The Engine is a headless, single-tenant agentic runtime. It exposes an HTTP API, runs an agent loop against a configurable LLM provider, executes tools inside a hardened sandbox, and remembers what happened in a per-user SQLite brain. Everything else — the UI, the user accounts, the billing, the routing across users — lives upstream of the Engine, not inside it. This page is the C4 Level 1 view: the Engine as a black box and what’s around it. For Level 2 (containers and the boundaries between them), see Containers.
The one-paragraph version
A client (a chat product, a CLI, a backend service) sends an HTTP request with a user message, an agent definition, and a session identifier. The Engine streams back a server-sent event stream of reasoning, tool calls, permission prompts, and final output. While that stream is flowing, the Engine talks to an LLM provider, possibly invokes external tools through the Asset Directory (MCP), reads and writes the user’s memory, and runs sandboxed commands. When the conversation ends, an asynchronous Learning Centre batch distills the trajectory into episodes and knowledge facts that make the next conversation cheaper and smarter.
System context
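From the client side, the contract above is an SSE stream over HTTP. The sketch below parses SSE frames from a raw response body; the event names (`text_delta`, `tool_call`) are illustrative assumptions, not the Engine's actual wire format.

```python
# Minimal SSE frame parser: a sketch of how a client might consume the
# Engine's event stream. Event names here are assumptions, not the real
# wire format.
def parse_sse(raw: str) -> list[dict]:
    """Split a raw SSE body into {"event": ..., "data": ...} frames."""
    frames = []
    for block in raw.strip().split("\n\n"):
        frame = {"event": "message", "data": ""}
        for line in block.splitlines():
            if line.startswith("event:"):
                frame["event"] = line[len("event:"):].strip()
            elif line.startswith("data:"):
                frame["data"] += line[len("data:"):].strip()
        frames.append(frame)
    return frames

sample = (
    "event: text_delta\ndata: Hello\n\n"
    "event: tool_call\ndata: {\"name\": \"bash\"}\n\n"
)
events = parse_sse(sample)
```

A real client would read these frames incrementally off the open connection rather than from a string.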
The four primitives
The Engine is organized around four primitives. If you understand these, everything else is detail.
1. The agent loop
The core loop. It receives a message, sends it to the LLM, receives a response (with possible tool calls), executes the tools, feeds the results back, and repeats until the model decides it’s done or asks for human input. The loop is implemented in engine_core/coordinator.py and engine_core/agent_loop.py. Everything else exists to feed this loop better, faster, or cheaper.
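As a toy sketch (not the real coordinator), the loop's shape looks like this, with a stubbed provider and a plain tool table standing in for the LLM and the sandbox:

```python
# Toy rendering of the agent loop's shape: call model -> execute tool
# calls -> append results -> repeat until the model stops. The real loop
# in engine_core/agent_loop.py also streams, sandboxes, and compacts.
def agent_loop(llm, tools: dict, messages: list[dict], max_turns: int = 8):
    for _ in range(max_turns):
        reply = llm(messages)                  # provider call (stubbed here)
        messages.append(reply)
        calls = reply.get("tool_calls", [])
        if not calls:                          # no tool calls: final answer
            return reply["content"]
        for call in calls:                     # execute tools, feed results back
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"],
                             "content": result})
    raise RuntimeError("max turns exceeded")

# Stub provider: asks for one tool, then answers.
def fake_llm(messages):
    if not any(m.get("role") == "tool" for m in messages):
        return {"role": "assistant", "content": "",
                "tool_calls": [{"name": "echo", "args": {"text": "hi"}}]}
    return {"role": "assistant", "content": "done", "tool_calls": []}

answer = agent_loop(fake_llm, {"echo": lambda text: text.upper()},
                    [{"role": "user", "content": "say hi"}])
```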
2. Memory
Three memory stores, all persisted in a per-user SQLite brain database, all queried through the same hybrid search (vector + keyword):
- Episodes — what happened (events, conversations, outcomes).
- Knowledge — what’s true (facts, rules, heuristics, with confidence and validity windows).
- Social graph — who’s who (people, relationships).
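One common way to implement the hybrid search that serves all three stores is to fuse the vector ranking with the keyword ranking via reciprocal rank fusion (RRF); whether the brain actually uses RRF, weighted scores, or something else is an assumption of this sketch.

```python
# Reciprocal rank fusion: merge two ranked ID lists into one. A document
# that ranks well in either list floats up; one that ranks well in both
# floats higher. k=60 is the conventional damping constant.
def rrf_merge(vector_hits: list[str], keyword_hits: list[str], k: int = 60):
    scores: dict[str, float] = {}
    for ranking in (vector_hits, keyword_hits):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge(
    vector_hits=["ep_42", "ep_07", "fact_3"],   # semantic neighbours
    keyword_hits=["fact_3", "ep_42", "ep_99"],  # keyword matches
)
```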
3. Tools
Tools come from three places:
- Platform tools — built-in operations (read file, write file, bash, etc.) registered in the Utility Directory.
- Asset Directory connectors — external services reached via MCP (Slack, Gmail, Notion, etc.), with per-user OAuth and encrypted credential storage.
- Skills — composed prompts and small scripts that wrap a recurring task (summarize, fact-check, format).
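A unified registry across the three sources might look like the sketch below; `ToolSource`, `register`, and the example tool names are hypothetical, not the Engine's actual API.

```python
# Hypothetical unified tool registry spanning the three tool sources.
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class ToolSource(Enum):
    PLATFORM = "platform"   # built-ins from the Utility Directory
    MCP = "mcp"             # Asset Directory connectors
    SKILL = "skill"         # composed prompt/script wrappers

@dataclass
class Tool:
    name: str
    source: ToolSource
    fn: Callable[..., str]

REGISTRY: dict[str, Tool] = {}

def register(name: str, source: ToolSource, fn: Callable[..., str]) -> None:
    REGISTRY[name] = Tool(name, source, fn)

register("read_file", ToolSource.PLATFORM, lambda path: f"<contents of {path}>")
register("slack.post", ToolSource.MCP, lambda text: f"posted: {text}")

result = REGISTRY["slack.post"].fn(text="hello")
```

Keeping the source on each entry matters because dispatch differs: platform tools run in the sandbox, MCP tools go over the wire with per-user credentials, skills expand into prompts.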
4. The sandbox
Every tool that touches the system runs inside a sandbox. The sandbox combines bwrap (filesystem isolation), seccomp (syscall filtering), landlock (filesystem ACLs), and a permission prompt layer that can ask the user before doing anything dangerous. The sandbox is implemented under src/sandbox/ and is the difference between “AI agent that helps” and “AI agent that wrecks your machine.”
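The bwrap layer can be illustrated by constructing an argument vector for a sandboxed command. The flags below are real bwrap options, but the specific policy (read-only system dirs, private /tmp, no network) is an assumed example, not the Engine's actual profile.

```python
# Build a bwrap argv for a sandboxed command. The policy shown is an
# illustrative example, not the Engine's real sandbox profile.
def bwrap_argv(command: list[str], workdir: str) -> list[str]:
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",    # read-only system directories
        "--ro-bind", "/lib", "/lib",
        "--bind", workdir, workdir,     # writable workspace only
        "--tmpfs", "/tmp",              # fresh private /tmp per run
        "--unshare-net",                # no network access
        "--die-with-parent",            # kill sandbox if the Engine dies
        "--",
        *command,
    ]

argv = bwrap_argv(["ls", "-la"], "/home/user/project")
```

In practice this argv would be handed to a subprocess call, with seccomp and landlock applied inside the sandboxed process before the command runs.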
The execution lifecycle
A single user turn moves through these phases:
- Receive. POST /execute arrives with a message and a task_id. The server validates the API key, finds or creates the channel state for this task, and starts the SSE stream.
- Plan. If the request needs planning, the Engine runs a planner sub-agent against a charter — the user’s request expanded into a set of sub-goals.
- Loop. The coordinator drives the agent loop: build context, call the LLM, parse tool calls, execute tools (in sandbox or MCP), append results, repeat.
- Compact. When context grows past safe limits, the compaction orchestrator collapses older turns into summaries while preserving structured artifacts (file contents, tool results worth keeping).
- Stream. Throughout, the server emits SSE events — text deltas, thinking blocks, tool calls, tool results, HITL prompts.
- Persist. Working memory updates as the loop runs. Channel state snapshots periodically so the stream can be resumed if the client reconnects.
- Close. When the model stops or the user pauses, the Engine writes a trajectory record. The Learning Centre will pick it up on its next batch to consolidate into long-term memory.
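The compaction phase above can be sketched as follows; the message shape and the `artifact` flag are assumptions of this illustration, not the orchestrator's real data model.

```python
# Sketch of context compaction: collapse older turns into one summary
# message while preserving recent turns and tagged artifacts verbatim.
def compact(messages: list[dict], keep_recent: int = 4) -> list[dict]:
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    artifacts = [m for m in old if m.get("artifact")]   # kept verbatim
    summary = {"role": "system",
               "content": f"[summary of {len(old) - len(artifacts)} earlier turns]"}
    return [summary, *artifacts, *recent]

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
history[2]["artifact"] = True   # e.g. file contents the model must keep seeing
compacted = compact(history)
```

A real orchestrator would generate the summary with the LLM rather than a placeholder string, but the shape is the same: the context shrinks while structured artifacts survive intact.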
What the Engine is not
It helps to be explicit about boundaries.
- Not multi-tenant. The Engine runs against one brain at a time. Multi-user isolation happens upstream — each user gets their own Engine instance (or their own data directory).
- Not a UI. No HTML, no web interface, no admin panel. The HTTP API is the entire surface.
- Not a model. The Engine runs against external LLM providers. It does not host weights.
- Not a chat app. Conversation threading state is exposed, but the Engine doesn’t render chats, manage rooms, or handle user accounts.
- Not a job scheduler. The Engine doesn’t queue background work for arbitrary callers. It does run a Learning Centre batch on its own schedule.
Where to go next
- Containers — the C4 L2 view: services, queues, databases, the boundaries between them.
- Components — module-level structure inside each container.
- Data flow — what calls what, end to end, for a single request.
- Shared types — the contracts between modules.

