Data handling

Knowing what data lives where is the foundation of every conversation about privacy, compliance, deletion, and incident response. This page catalogs the Engine’s data footprint and the policies around it.

Where data lives

| Data | Where | Encryption at rest |
| --- | --- | --- |
| Conversation history | Brain SQLite, `conversations` + memory tables | Volume-level (file system) |
| Episodes, knowledge facts, social graph | Brain SQLite, `episodes_`, `knowledge_store_`, `social_graph_*` | Volume-level |
| Working memory | Brain SQLite, `working_memory_log`, TTL-bound | Volume-level |
| Soul | Brain SQLite | Volume-level |
| Trajectories | Brain SQLite, `trajectories` | Volume-level |
| MCP credentials | Brain SQLite, `connections.credentials` | Application-level (Fernet via `AD_ENCRYPTION_KEY`) |
| Channel state snapshots | Brain SQLite, `channel_state_snapshots`, TTL-bound | Volume-level |
| Observability events | Brain SQLite, `observability_events` | Volume-level |
| Engine logs | stdout → log shipping pipeline | Pipeline-dependent |
| Config (incl. secrets) | Container env, secret manager | Secret-manager-dependent |

The brain SQLite file is the single most important storage element. Volume-level encryption is provided by the underlying storage (EBS, PD-SSD, or whatever your infrastructure offers). MCP credentials are additionally encrypted at the application layer with Fernet, so they remain encrypted even if the volume’s at-rest encryption is compromised, provided `AD_ENCRYPTION_KEY` itself is not exposed.

What we store

For each user:

Identity

A soul object summarizing the user’s core identity, values, and patterns.
Social graph nodes for people the user has mentioned.

Activity

Every /execute call’s message and the agent’s response, in conversation history.
Tool calls and results.
Permission decisions.

Inferences

Episodes summarizing past events.
Knowledge facts inferred from observations.
Social graph edges inferred from conversations.

External access

MCP server connections and their (encrypted) credentials.
Active scopes per connection.

What we don’t store

The raw data passing through MCP servers (e.g. the body of every Slack message the agent reads). Only what gets surfaced into context during a turn is stored.
LLM provider responses beyond what’s in conversation history. We don’t keep a separate “model call log.”
User credentials for the Engine itself (we keep only the hash, if using ENGINE_KEY_HASH).
Plaintext API keys for LLM providers (those live in env, not in data).

Retention

| Data | Default retention | Configurable |
| --- | --- | --- |
| Conversation history | Indefinite | Yes (manual cleanup) |
| Long-term memory | Indefinite (with confidence aging on knowledge facts) | Yes |
| Working memory | TTL (typically a few hours) | Yes (`CHANNEL_STATE_TTL_*`) |
| Trajectories | Indefinite by default | Recommended cleanup at 30 days |
| Channel state | TTL (3 h active, 72 h HITL) | Yes (`CHANNEL_STATE_TTL_*`) |
| Observability events | Indefinite by default | Recommended cleanup at 90 days |
| MCP credentials | Until disconnected or expired | Manual revocation via `DELETE /assets/connections/{ref_id}` |
| Engine logs | Pipeline-dependent | Yes |

The Engine doesn’t auto-clean today. A periodic cleanup script is recommended for production deployments. See Database.
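A periodic cleanup along these lines could apply the recommended windows from the retention table. This is a minimal sketch, not the Engine's own tooling: it assumes both tables carry a `created_at` column holding Unix epoch seconds, which is a hypothetical column name. Check the actual schema (see Database) before adapting it.

```python
import sqlite3
import time
from typing import Dict, Optional

# Recommended retention windows from the table above, in days.
RETENTION_DAYS = {
    "trajectories": 30,
    "observability_events": 90,
}


def cleanup(conn: sqlite3.Connection, now: Optional[float] = None) -> Dict[str, int]:
    """Delete rows older than each table's retention window.

    Assumes a hypothetical `created_at` column (Unix epoch seconds).
    Returns the number of rows deleted per table.
    """
    now = time.time() if now is None else now
    deleted: Dict[str, int] = {}
    with conn:  # single transaction: commits on success, rolls back on error
        for table, days in RETENTION_DAYS.items():
            cutoff = now - days * 86400
            # Table names come from the fixed dict above, never from user input.
            cur = conn.execute(
                f"DELETE FROM {table} WHERE created_at < ?", (cutoff,)
            )
            deleted[table] = cur.rowcount
    return deleted
```

Run it from cron or a scheduled job against the brain file, for example `cleanup(sqlite3.connect(path_to_brain))`, where `path_to_brain` is wherever your deployment mounts the brain volume.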

Per-user isolation

The Engine is single-tenant: each instance has exactly one brain, and a brain never holds data for more than one user. Multi-user products achieve isolation by running a separate Engine process (and a separate brain) per user. The upstream router (BAP) holds the user-to-engine mapping and ensures each request reaches the correct Engine.

If the routing layer routes user A’s request to user B’s Engine, user A will see user B’s memory. This is a routing-layer bug, and the Engine has no defense against being addressed incorrectly: it trusts that any request authenticated with its API key belongs to it.
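Because the Engine cannot detect misrouting, the lookup in the routing layer should fail closed. The sketch below illustrates that idea only; it is not BAP's actual implementation, and `EngineTarget`, `EngineRouter`, `UnknownUserError`, and the example URLs are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EngineTarget:
    base_url: str  # where this user's Engine listens (hypothetical field)
    api_key: str   # the key that this specific Engine accepts


class UnknownUserError(LookupError):
    """Raised instead of falling back to a shared or default Engine."""


class EngineRouter:
    def __init__(self, mapping: Dict[str, EngineTarget]) -> None:
        # Enforce one Engine per user: two users must never share a target.
        urls = [target.base_url for target in mapping.values()]
        if len(urls) != len(set(urls)):
            raise ValueError("two users map to the same Engine")
        self._mapping = dict(mapping)

    def resolve(self, user_id: str) -> EngineTarget:
        # Fail closed: an unmapped user gets an error, never someone else's Engine.
        try:
            return self._mapping[user_id]
        except KeyError:
            raise UnknownUserError(user_id) from None
```

The two checks cover the failure modes described above: a shared target is rejected when the mapping is loaded, and an unknown user raises rather than being sent to a default Engine.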

PII

Memory unavoidably contains PII:

The user’s name, in the soul and in social-graph nodes.
Names of people the user mentions, in social-graph nodes.
Email addresses, phone numbers, addresses if the user shares them.
Quotes from emails, messages, or documents the user processed through the agent.

For deployments handling regulated data:

Encrypt the volume at rest.
Restrict access to the brain volume to the Engine’s service account only.
Don’t ship logs containing user content to third-party log pipelines.
Consider redaction at the log level for sensitive fields.
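Log-level redaction can be sketched with a standard `logging.Filter`. This is a best-effort illustration, not something the Engine ships: `RedactingFilter` and both regular expressions are assumptions, and pattern matching will miss some PII while also catching other long digit runs such as dates.

```python
import logging
import re

# Illustrative patterns only; tune them for the data you actually handle.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


class RedactingFilter(logging.Filter):
    """Redact the fully formatted message before any handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format msg % args first so PII passed as arguments is also covered.
        record.msg = redact(record.getMessage())
        record.args = None
        return True  # keep the record, just with a redacted message
```

Attach the filter to the handler that feeds the log shipping pipeline, for example `handler.addFilter(RedactingFilter())`, so that records are redacted before they leave the process.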

Deletion

One user’s data

Wipe their Engine’s brain volume:

docker compose down --volumes  # if running per-user-engine in compose
# or
kubectl delete pvc engine-data-<user>  # if running in k8s

Then bring up a fresh Engine. The user starts with empty memory.

One conversation thread

There’s no API for this today. Direct SQL:

DELETE FROM conversations WHERE thread_id = '...';
DELETE FROM working_memory_log WHERE thread_id = '...';
DELETE FROM trajectories WHERE thread_id = '...';
DELETE FROM channel_state_snapshots WHERE thread_id = '...';
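If you script this, running the four statements in one transaction avoids leaving a thread half-deleted when one statement fails. A minimal sketch using Python's standard `sqlite3` module follows; `delete_thread` is a hypothetical helper, and it assumes the same four tables and `thread_id` columns as the SQL above.

```python
import sqlite3
from typing import Dict

# The tables touched by the statements above.
THREAD_TABLES = (
    "conversations",
    "working_memory_log",
    "trajectories",
    "channel_state_snapshots",
)


def delete_thread(conn: sqlite3.Connection, thread_id: str) -> Dict[str, int]:
    """Delete one thread's rows from every table, all or nothing."""
    deleted: Dict[str, int] = {}
    with conn:  # commits if every DELETE succeeds, rolls back otherwise
        for table in THREAD_TABLES:
            # The table name comes from the fixed tuple; the id is parameterized.
            cur = conn.execute(
                f"DELETE FROM {table} WHERE thread_id = ?", (thread_id,)
            )
            deleted[table] = cur.rowcount
    return deleted
```

The returned per-table counts give you a record of what was removed, which is useful when confirming a deletion request.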

One specific memory item

Direct SQL on `episodes`, `knowledge_store`, or `social_graph_*`. The embeddings tables (`*_vec`) cascade delete via foreign-key triggers configured in migrations.

Right-to-erasure (GDPR / DPDP)

For a complete deletion, including:

The user’s brain.
Backups containing their data.
Log lines containing their data.

Run the procedure:

Wipe the brain volume.
Mark backup snapshots covering the user’s data for deletion at the next lifecycle cycle.
Submit a redaction request to the log pipeline (most providers support this; it typically takes hours to days to complete).
Confirm deletion with the user via your application’s user-facing compliance flow.

Incident response

If the brain is compromised:

Take the affected Engine offline. Stop the container; preserve the volume for forensics.
Revoke MCP credentials. Delete every connection in the affected brain so the credentials in the volume become inactive.
Rotate all secrets that touched the affected environment.
Investigate. What was compromised? What was extracted?
Communicate. Notify the user and (if required by jurisdiction) regulators.
Restore from a clean snapshot if appropriate, or stand up a fresh Engine with a clean brain.
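The credential revocation step can be scripted against the `DELETE /assets/connections/{ref_id}` endpoint listed in the retention table. The sketch below is an assumption-laden illustration: how you obtain the list of `ref_id` values, the `Authorization: Bearer` header, and the helper names are all hypothetical, and it presumes the Engine API is reachable when you run it.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Iterable, List, Tuple


def build_revocation_requests(
    base_url: str, api_key: str, ref_ids: Iterable[str]
) -> List[urllib.request.Request]:
    """Build one DELETE request per connection without sending anything."""
    requests = []
    for ref_id in ref_ids:
        path = urllib.parse.quote(ref_id, safe="")
        url = f"{base_url.rstrip('/')}/assets/connections/{path}"
        req = urllib.request.Request(url, method="DELETE")
        # Hypothetical auth scheme; use whatever your Engine actually expects.
        req.add_header("Authorization", f"Bearer {api_key}")
        requests.append(req)
    return requests


def revoke_all(
    requests: Iterable[urllib.request.Request], timeout: float = 10.0
) -> List[Tuple[str, str]]:
    """Send each request; return (url, error) pairs for the ones that failed."""
    failures = []
    for req in requests:
        try:
            with urllib.request.urlopen(req, timeout=timeout):
                pass
        except urllib.error.URLError as exc:  # also covers HTTPError
            failures.append((req.full_url, str(exc)))
    return failures
```

Separating request building from sending lets you review the exact URLs before anything is revoked, and the returned failure list shows which connections still need manual attention.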

See Threat model and the Engine’s SECURITY.md.

Compliance posture

The Engine itself is a building block. Compliance is achieved at the deployment level:

GDPR. Right-to-erasure flows above. Data minimization via retention policy. Encryption at rest.
India DPDP. Same as GDPR plus data localization for Indian users (host the Engine in an Indian region).
SOC 2. Volume encryption + secret management + audit logging (observability_events).
HIPAA. Not a default fit; the Engine is not designed for PHI. If HIPAA coverage is required, additional controls would need to be added (BAA with the LLM provider, additional encryption, audit retention).
