This page captures the Engine’s threat model: who we’re defending against, what they can do, what stops them, and what we accept as residual risk. It’s a living document. When the system changes, the threat model changes; update this page in the same PR.

Trust boundaries

Three boundaries: the Engine process, the sandbox, and the host.

Adversary model

We consider three adversaries.

1. Malicious end user

A user calls /execute with crafted prompts trying to make the agent do something it shouldn’t (exfiltrate other users’ data, persist a backdoor, escape the sandbox). What stops them:
  • Each Engine instance is single-tenant. There’s no other user’s data on this brain to exfiltrate.
  • The sandbox prevents most filesystem-escape attempts.
  • Permission prompts surface dangerous operations to the user, who is also the principal — so the attack model degenerates to “user attacks themselves,” which is mostly self-harm. The gate is sketched below.
Residual risk:
  • The user can instruct the agent to call MCP connectors they own. We don’t try to prevent self-harm at this layer.
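
The permission gate is the load-bearing control for this adversary. A minimal sketch of the decision, with illustrative names (the real rules live in src/sandbox/permissions.py and are more nuanced):

    # Illustrative only; the real classification lives in src/sandbox/permissions.py.
    DESTRUCTIVE_COMMANDS = {"rm", "shred"}

    def needs_user_approval(argv: list[str], write_path: str | None,
                            allowed_roots: list[str]) -> bool:
        """True when the operation must be surfaced to the user (HITL)."""
        if argv and argv[0] in DESTRUCTIVE_COMMANDS:
            return True
        if write_path is not None and not any(
            write_path.startswith(root) for root in allowed_roots
        ):
            return True  # write outside the allowed roots
        return False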

2. Malicious model output

The model — possibly steered by a prompt injection in retrieved content or in tool output — emits tool calls trying to do something dangerous. What stops them:
  • The sandbox confines tool execution to the brain’s data directory and pre-approved paths in ALLOWED_ROOTS.
  • seccomp blocks dangerous syscalls.
  • Landlock enforces filesystem ACLs.
  • Permission prompts halt destructive operations (rm, shred, writes outside allowed roots) and surface them as HITL.
  • Secret scanning (src/security/secret_scanner.py) scrubs likely secrets out of model output before it streams to the client (sketched below).
Residual risk:
  • A clever model could chain allowed operations to produce a result we consider harmful. We rely on permission prompts at the human boundary for this class.
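
A minimal sketch of the scrubbing step; the patterns below are illustrative placeholders, not the actual list in src/security/secret_scanner.py:

    import re

    # Illustrative patterns only; the real list is more thorough.
    SECRET_PATTERNS = [
        re.compile(r"sk-[A-Za-z0-9]{20,}"),                 # OpenAI-style API key
        re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID
        re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private-key header
    ]

    def scrub(chunk: str) -> str:
        """Redact likely secrets from model output before it streams out."""
        for pattern in SECRET_PATTERNS:
            chunk = pattern.sub("[REDACTED]", chunk)
        return chunk

One caveat: with streamed output a secret can straddle chunk boundaries, so a scanner in this position has to carry over the tail of each chunk rather than scan chunks independently.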

3. Malicious MCP server

A connected MCP server returns crafted output trying to inject into the prompt or steal credentials. What stops them:
  • Tool output is plain text from the agent’s perspective; it carries no authority.
  • Credentials are encrypted at rest with AD_ENCRYPTION_KEY and only decrypted to make outbound calls. They never appear in the prompt or the SSE stream.
  • The model is instructed (in the system prompt) to treat tool output as data, not instructions. This is best-effort, not a guarantee.
  • A circuit breaker disconnects MCP servers that error repeatedly (sketched below).
Residual risk:
  • Prompt injection from tool output is an open problem industry-wide. We mitigate, we don’t eliminate.
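
The breaker follows the standard circuit-breaker pattern. A minimal sketch, with an illustrative class name and thresholds:

    import time

    class McpCircuitBreaker:
        """Stop calling an MCP server after repeated errors; probe again after
        a cooldown. Threshold and cooldown values are illustrative."""

        def __init__(self, max_failures: int = 5, cooldown_s: float = 60.0):
            self.max_failures = max_failures
            self.cooldown_s = cooldown_s
            self.failures = 0
            self.opened_at: float | None = None

        def record_success(self) -> None:
            self.failures = 0
            self.opened_at = None

        def record_failure(self) -> None:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # open: disconnect the server

        def allow_call(self) -> bool:
            if self.opened_at is None:
                return True
            if time.monotonic() - self.opened_at >= self.cooldown_s:
                self.opened_at = None              # half-open: permit one probe
                self.failures = self.max_failures - 1
                return True
            return False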

Defenses by layer

Network

  • HTTPS terminated upstream (BAP / load balancer). The Engine itself serves HTTP and assumes TLS is already done.
  • Outbound: the Engine talks to LLM providers and (per-user) MCP servers. Outbound destinations are not currently allowlisted at the network layer in the default deployment; if your environment requires egress allowlisting, enforce it at the upstream firewall.

Process

  • The Engine process runs as a non-root user inside its container.
  • The Engine never execs untrusted binaries. Tool execution happens inside the sandbox, in a separate process.

Sandbox

  • bubblewrap provides namespaced isolation: separate filesystem, network, IPC, and PID namespaces (an illustrative invocation is sketched after this list).
  • seccomp filter blocks dangerous syscalls (mount, pivot_root, most module operations).
  • Landlock enforces filesystem ACLs at the kernel layer; even if a sandboxed process bypasses bwrap, Landlock holds.
  • Permission prompts (src/sandbox/permissions.py) gate operations the static rules can’t classify as safe — destructive shell commands, writes to high-value paths.
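
To make the layering concrete, here is roughly what launching a tool under bubblewrap looks like. The flags are a plausible subset, not the Engine’s exact command line; the seccomp filter and Landlock ruleset are applied inside the sandboxed process and are not shown:

    import subprocess

    def run_in_sandbox(argv: list[str], brain_dir: str, allowed_roots: list[str]):
        # An illustrative subset of bubblewrap flags, not the Engine's exact
        # invocation. seccomp and Landlock are applied separately.
        cmd = [
            "bwrap",
            "--unshare-all",                 # fresh user/net/ipc/pid namespaces
            "--die-with-parent",             # kill the tool if the Engine dies
            "--ro-bind", "/usr", "/usr",     # read-only system binaries
            "--proc", "/proc",
            "--dev", "/dev",
            "--bind", brain_dir, brain_dir,  # the brain's data dir, writable
        ]
        for root in allowed_roots:           # pre-approved paths (ALLOWED_ROOTS)
            cmd += ["--ro-bind", root, root]
        return subprocess.run(cmd + ["--"] + argv, capture_output=True, text=True)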

Storage

  • Credentials encrypted at rest with Fernet using AD_ENCRYPTION_KEY (sketched below).
  • The brain database is single-user; a process gets exactly one brain.
  • Migrations are versioned and run on startup; mismatched versions fail fast.
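
Fernet here is the standard implementation from the cryptography package. A minimal sketch of the pattern, with illustrative function names:

    import os

    from cryptography.fernet import Fernet

    # AD_ENCRYPTION_KEY must be a urlsafe-base64-encoded 32-byte key, the
    # format produced by Fernet.generate_key().
    fernet = Fernet(os.environ["AD_ENCRYPTION_KEY"])

    def encrypt_credential(plaintext: str) -> bytes:
        """The form that gets persisted to the brain database."""
        return fernet.encrypt(plaintext.encode())

    def decrypt_credential(token: bytes) -> str:
        """Decrypted only at the moment of an outbound MCP call."""
        return fernet.decrypt(token).decode()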

Auth

  • API key validated on every request (X-Engine-Key).
  • Hash-only deployment supported via ENGINE_KEY_HASH so plaintext never appears in the Engine’s environment (see the sketch after this list).
  • Key rotation with overlap window prevents in-flight requests from failing across a rotation.
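
A minimal sketch of the hash-only check. The digest algorithm and the name of the previous-key variable are assumptions for illustration, not confirmed configuration:

    import hashlib
    import hmac
    import os

    def key_is_valid(presented: str) -> bool:
        """Check X-Engine-Key against ENGINE_KEY_HASH; the plaintext key is
        never present in the Engine's environment."""
        digest = hashlib.sha256(presented.encode()).hexdigest()  # assumed SHA-256
        valid = [os.environ["ENGINE_KEY_HASH"]]
        # Hypothetical variable covering the rotation overlap window.
        if prev := os.environ.get("ENGINE_KEY_HASH_PREVIOUS"):
            valid.append(prev)
        # Constant-time comparison avoids leaking prefix matches via timing.
        return any(hmac.compare_digest(digest, h) for h in valid)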

What we don’t defend

  • The host kernel. If the host is compromised, the Engine is.
  • Side-channel attacks against the model provider (e.g. timing leaks from streaming).
  • DoS by paying customers. The Engine doesn’t impose its own rate limit beyond what the upstream LLM provider does.
  • Determined social engineering of the human in the HITL loop.

Change process

Threat-model changes happen through PRs to this file. Anything that materially expands the attack surface (a new public endpoint, a new external integration, a new credential type) needs a security review before merge.

See also

  • Authentication — how the API key enforces the auth boundary.
  • Sandbox — what’s inside the sandbox boundary in code.
  • Data handling — what we keep, where we keep it, and for how long.