This page captures the Engine’s threat model: who we’re defending against, what they can do, what stops them, and what we accept as residual risk. It’s a living document. When the system changes, the threat model changes; update this page in the same PR.

Trust boundaries

Three boundaries: the Engine process, the sandbox, and the host.

Adversary model

We consider three adversaries.

1. Malicious end user

A user calls /execute with crafted prompts trying to make the agent do something it shouldn’t (exfiltrate other users’ data, persist a backdoor, escape the sandbox). What stops them:
  • Each Engine instance is single-tenant. There’s no other user’s data on this brain to exfiltrate.
  • The sandbox prevents most filesystem-escape attempts.
  • Permission prompts surface dangerous operations to the user, who is also the principal — so the attack model degenerates to “user attacks themselves,” which is mostly self-harm. The gate is sketched below.
Residual risk:
  • The user can instruct the agent to call MCP connectors they own. We don’t try to prevent self-harm at this layer.
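
The permission gate is the load-bearing control for this adversary. A minimal sketch of the decision, with illustrative names (the real rules live in src/sandbox/permissions.py and are more nuanced):

    # Illustrative only; the real classification lives in src/sandbox/permissions.py.
    DESTRUCTIVE_COMMANDS = {"rm", "shred"}

    def needs_user_approval(argv: list[str], write_path: str | None,
                            allowed_roots: list[str]) -> bool:
        """True when the operation must be surfaced to the user (HITL)."""
        if argv and argv[0] in DESTRUCTIVE_COMMANDS:
            return True
        if write_path is not None and not any(
            write_path.startswith(root) for root in allowed_roots
        ):
            return True  # write outside the allowed roots
        return False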

2. Malicious model output

The model — possibly steered by a prompt injection in retrieved content or in tool output — emits tool calls trying to do something dangerous. What stops them:
  • The sandbox confines tool execution to the brain’s data directory and pre-approved paths in ALLOWED_ROOTS.
  • seccomp blocks dangerous syscalls.
  • Landlock enforces filesystem ACLs.
  • Permission prompts halt destructive operations (rm, shred, writes outside allowed roots) and surface them as HITL.
  • Secret scanning (src/security/secret_scanner.py) scrubs likely secrets out of model output before it streams to the client (sketched below).
Residual risk:
  • A clever model could chain allowed operations to produce a result we consider harmful. We rely on permission prompts at the human boundary for this class.
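
A minimal sketch of the scrubbing step; the patterns below are illustrative placeholders, not the actual list in src/security/secret_scanner.py:

    import re

    # Illustrative patterns only; the real list is more thorough.
    SECRET_PATTERNS = [
        re.compile(r"sk-[A-Za-z0-9]{20,}"),                 # OpenAI-style API key
        re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID
        re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private-key header
    ]

    def scrub(chunk: str) -> str:
        """Redact likely secrets from model output before it streams out."""
        for pattern in SECRET_PATTERNS:
            chunk = pattern.sub("[REDACTED]", chunk)
        return chunk

One caveat: with streamed output a secret can straddle chunk boundaries, so a scanner in this position has to carry over the tail of each chunk rather than scan chunks independently.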

3. Malicious MCP server

A connected MCP server returns crafted output trying to inject into the prompt or steal credentials. What stops them:
  • Tool output is plain text from the agent’s perspective; it carries no authority.
  • Credentials are encrypted at rest with AD_ENCRYPTION_KEY and only decrypted to make outbound calls. They never appear in the prompt or the SSE stream.
  • The model is instructed (in the system prompt) to treat tool output as data, not instructions. This is best-effort, not a guarantee.
  • A circuit breaker disconnects MCP servers that error repeatedly (sketched below).
Residual risk:
  • Prompt injection from tool output is an open problem industry-wide. We mitigate, we don’t eliminate.
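
The breaker follows the standard circuit-breaker pattern. A minimal sketch, with an illustrative class name and thresholds:

    import time

    class McpCircuitBreaker:
        """Stop calling an MCP server after repeated errors; probe again after
        a cooldown. Threshold and cooldown values are illustrative."""

        def __init__(self, max_failures: int = 5, cooldown_s: float = 60.0):
            self.max_failures = max_failures
            self.cooldown_s = cooldown_s
            self.failures = 0
            self.opened_at: float | None = None

        def record_success(self) -> None:
            self.failures = 0
            self.opened_at = None

        def record_failure(self) -> None:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # open: disconnect the server

        def allow_call(self) -> bool:
            if self.opened_at is None:
                return True
            if time.monotonic() - self.opened_at >= self.cooldown_s:
                self.opened_at = None              # half-open: permit one probe
                self.failures = self.max_failures - 1
                return True
            return False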

Defenses by layer

Network

  • HTTPS terminated upstream (BAP / load balancer). The Engine itself serves HTTP and assumes TLS is already done.
  • Outbound: the Engine talks to LLM providers and (per-user) MCP servers. Outbound destinations are not currently allowlisted at the network layer in the default deployment; if your environment requires egress allowlisting, enforce it at the upstream firewall.

Process

  • The Engine process runs as a non-root user inside its container.
  • The Engine never execs untrusted binaries. Tool execution happens inside the sandbox, in a separate process.

Sandbox

  • bubblewrap provides namespaced isolation: separate filesystem, network, IPC, and PID namespaces (an illustrative invocation is sketched after this list).
  • seccomp filter blocks dangerous syscalls (mount, pivot_root, most module operations).
  • Landlock enforces filesystem ACLs at the kernel layer; even if a sandboxed process bypasses bwrap, Landlock holds.
  • Permission prompts (src/sandbox/permissions.py) gate operations the static rules can’t classify as safe — destructive shell commands, writes to high-value paths.
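
To make the layering concrete, here is roughly what launching a tool under bubblewrap looks like. The flags are a plausible subset, not the Engine’s exact command line; the seccomp filter and Landlock ruleset are applied inside the sandboxed process and are not shown:

    import subprocess

    def run_in_sandbox(argv: list[str], brain_dir: str, allowed_roots: list[str]):
        # An illustrative subset of bubblewrap flags, not the Engine's exact
        # invocation. seccomp and Landlock are applied separately.
        cmd = [
            "bwrap",
            "--unshare-all",                 # fresh user/net/ipc/pid namespaces
            "--die-with-parent",             # kill the tool if the Engine dies
            "--ro-bind", "/usr", "/usr",     # read-only system binaries
            "--proc", "/proc",
            "--dev", "/dev",
            "--bind", brain_dir, brain_dir,  # the brain's data dir, writable
        ]
        for root in allowed_roots:           # pre-approved paths (ALLOWED_ROOTS)
            cmd += ["--ro-bind", root, root]
        return subprocess.run(cmd + ["--"] + argv, capture_output=True, text=True)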

Storage

  • Credentials encrypted at rest with Fernet using AD_ENCRYPTION_KEY (sketched below).
  • The brain database is single-user; a process gets exactly one brain.
  • Migrations are versioned and run on startup; mismatched versions fail fast.
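
Fernet here is the standard implementation from the cryptography package. A minimal sketch of the pattern, with illustrative function names:

    import os

    from cryptography.fernet import Fernet

    # AD_ENCRYPTION_KEY must be a urlsafe-base64-encoded 32-byte key, the
    # format produced by Fernet.generate_key().
    fernet = Fernet(os.environ["AD_ENCRYPTION_KEY"])

    def encrypt_credential(plaintext: str) -> bytes:
        """The form that gets persisted to the brain database."""
        return fernet.encrypt(plaintext.encode())

    def decrypt_credential(token: bytes) -> str:
        """Decrypted only at the moment of an outbound MCP call."""
        return fernet.decrypt(token).decode()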

Auth

  • API key validated on every request (X-Engine-Key).
  • Hash-only deployment supported via ENGINE_KEY_HASH so plaintext never appears in the Engine’s environment (see the sketch after this list).
  • Key rotation with overlap window prevents in-flight requests from failing across a rotation.
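
A minimal sketch of the hash-only check. The digest algorithm and the name of the previous-key variable are assumptions for illustration, not confirmed configuration:

    import hashlib
    import hmac
    import os

    def key_is_valid(presented: str) -> bool:
        """Check X-Engine-Key against ENGINE_KEY_HASH; the plaintext key is
        never present in the Engine's environment."""
        digest = hashlib.sha256(presented.encode()).hexdigest()  # assumed SHA-256
        valid = [os.environ["ENGINE_KEY_HASH"]]
        # Hypothetical variable covering the rotation overlap window.
        if prev := os.environ.get("ENGINE_KEY_HASH_PREVIOUS"):
            valid.append(prev)
        # Constant-time comparison avoids leaking prefix matches via timing.
        return any(hmac.compare_digest(digest, h) for h in valid)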

What we don’t defend

  • The host kernel. If the host is compromised, the Engine is.
  • Side-channel attacks against the model provider (e.g. timing leaks from streaming).
  • DoS by paying customers. The Engine doesn’t impose its own rate limit beyond what the upstream LLM provider does.
  • Determined social engineering of the human in the HITL loop.

Change process

Threat-model changes happen through PRs to this file. Anything that materially expands the attack surface (a new public endpoint, a new external integration, a new credential type) needs a security review before merge.

See also

  • Authentication — how the API key enforces the auth boundary.
  • Sandbox — what’s inside the sandbox boundary in code.
  • Data handling — what we keep, where we keep it, and for how long.