
The Engine deploys as a small set of cooperating containers. This page is the C4 Level 2 view — what runs, what they share, and what crosses the process boundary between them.

The picture

(Container diagram: the client talks to the engine process, which owns brain.sqlite and makes outbound calls to the LLM provider, MCP servers, and the sandbox.)

The containers

The Engine ships as one process and one database file. The “containers” in the C4 sense are the logical units that could be split apart, even though we run them as a single process today.

engine — the API server

A single Uvicorn process running src.server:app (FastAPI), listening on port 8000 by default. Stateless from the request’s point of view: all durable state lives in the brain database. Every request is authenticated with X-Engine-Key (see the client sketch after this list). Inside this process, the major subsystems are:
  • HTTP layer — FastAPI routes, CORS middleware, Unicode sanitization.
  • Agent loop / coordinator — drives execution, manages task slots, handles compaction.
  • Herald — unified LLM abstraction across Anthropic, OpenAI, and Gemini. Adds streaming, prompt caching, quota tracking, token budgets.
  • Memory subsystem — episodes, knowledge, social graph, working memory, soul.
  • Asset Directory — MCP connector registry, credential vault (Fernet), OAuth flow, refresh scheduling.
  • Sandbox — process isolation for tool execution.
  • Learning Centre — background batch scheduler that consolidates trajectories into knowledge.
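
A minimal client sketch under the assumptions this page states (default port 8000, the X-Engine-Key header). The /health path and the ENGINE_URL / ENGINE_KEY variables are illustrative assumptions, not a documented endpoint:

```python
import os

import httpx

# Hypothetical health check against a local Engine. Only the default
# port and the X-Engine-Key header come from this page; the /health
# path is assumed for illustration.
engine_url = os.environ.get("ENGINE_URL", "http://localhost:8000")

resp = httpx.get(
    f"{engine_url}/health",
    headers={"X-Engine-Key": os.environ["ENGINE_KEY"]},
)
resp.raise_for_status()
print(resp.text)
```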

brain.sqlite — the database

A single SQLite file (default path: /data/brain.sqlite) loaded with two extensions:
  • sqlite-vec for vector search (1536-dim embeddings).
  • FTS5 for full-text keyword search.
The file holds 20+ tables managed by the migration runner in src/infra/migrations.py. Migrations are SQL files in migrations/ and run on startup. A single Engine process holds an async connection pool against this file (see src/infra/sqlite_pool.py). The brain is single-user: multi-user deployments run multiple Engine processes, each with its own brain file.
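
For ad-hoc inspection, the same file can be opened with the same extensions. A sketch, assuming the sqlite-vec Python package; the table name in the commented query is illustrative, not the Engine's actual schema:

```python
import sqlite3

import sqlite_vec  # pip install sqlite-vec

# Open the brain file with the vector extension the Engine loads.
db = sqlite3.connect("/data/brain.sqlite")
db.enable_load_extension(True)
sqlite_vec.load(db)  # vec0 virtual tables, 1536-dim embeddings
db.enable_load_extension(False)

# FTS5 is compiled into most modern SQLite builds, so keyword search
# needs no extra loading. A hypothetical query (table name assumed):
# db.execute(
#     "SELECT rowid FROM knowledge_fts WHERE knowledge_fts MATCH ?",
#     ("compaction",),
# )
```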

External LLM provider

The Engine never embeds model weights. All inference is a network call to one of:
  • Anthropic (api.anthropic.com), used when LLM_PROVIDER=anthropic.
  • OpenAI (api.openai.com), used when LLM_PROVIDER=openai. Also used for embeddings regardless of the chat provider.
  • Google Gemini (generativelanguage.googleapis.com), used when LLM_PROVIDER=gemini.
Provider selection is a deploy-time decision, not a per-request one. Embedding calls always go to OpenAI’s text-embedding model in the current implementation.
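
A sketch of what deploy-time selection implies. The host mapping comes from the list above; the functions, the fallback default, and the names are illustrative assumptions, not the Engine's config code:

```python
import os

# Hosts from the list above. LLM_PROVIDER is read once at startup;
# there is no per-request provider switching.
PROVIDER_HOSTS = {
    "anthropic": "api.anthropic.com",
    "openai": "api.openai.com",
    "gemini": "generativelanguage.googleapis.com",
}

def chat_host() -> str:
    # Default assumed for illustration; the real default is not stated here.
    return PROVIDER_HOSTS[os.environ.get("LLM_PROVIDER", "anthropic")]

def embedding_host() -> str:
    # Embeddings always go to OpenAI in the current implementation.
    return "api.openai.com"
```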

External MCP servers

Optional. When the user connects a service (Slack, Gmail, etc.), the Asset Directory stores the OAuth credentials and proxies tool calls to the remote MCP server during execution. Connections are per-user, and credentials are encrypted at rest with Fernet using AD_ENCRYPTION_KEY.
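
A minimal sketch of the at-rest encryption pattern. Only the use of Fernet keyed by AD_ENCRYPTION_KEY comes from this page; the payload shape is an assumption:

```python
import os

from cryptography.fernet import Fernet

# AD_ENCRYPTION_KEY must be a urlsafe base64-encoded 32-byte key,
# e.g. generated once with Fernet.generate_key().
fernet = Fernet(os.environ["AD_ENCRYPTION_KEY"])

ciphertext = fernet.encrypt(b'{"access_token": "..."}')  # stored at rest
credentials = fernet.decrypt(ciphertext)                 # recovered at call time
```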

How they talk

| From | To | Protocol | Notes |
| --- | --- | --- | --- |
| Client | engine | HTTPS + SSE | All requests carry X-Engine-Key |
| engine | brain.sqlite | aiosqlite + sqlite-vec | Async pool |
| engine | LLM provider | HTTPS streaming | Streaming SSE / chunked |
| engine | MCP server | varies (HTTP / WebSocket / stdio) | OAuth or static credentials |
| engine | sandbox | unix pipes / signals | Subprocess with bwrap |

Internal to the engine process, subsystems talk through:
  • Direct Python function calls.
  • An in-process event bus (engine_core/events.py) for the SSE stream; see the sketch after this list.
  • The brain database for any state that has to survive a restart.
There is no internal queue, broker, or RPC system. Every internal call is in-process. This is a deliberate constraint: it keeps the Engine small, simple to reason about, and easy to deploy as a single container.
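
The event bus is worth a sketch because it is the only indirection inside the process. The code below illustrates the pattern, not the actual API of engine_core/events.py: asyncio queues fanned out to SSE subscribers, with no external broker.

```python
import asyncio

class EventBus:
    """In-process pub/sub: each SSE connection gets its own queue."""

    def __init__(self) -> None:
        self._subscribers: list[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._subscribers.remove(q)

    async def publish(self, event: dict) -> None:
        # Fan out to every open SSE stream; order per subscriber is FIFO.
        for q in self._subscribers:
            await q.put(event)
```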

Deployment topologies

Single-node (default)

One Engine process, one brain file, one local sandbox. This is the model for local development and for solo users running their own Engine. The docker-compose.yml in the engine repo deploys this configuration.

Hosted, per-user

A control plane upstream creates one Engine process per active user. The upstream router (BAP, the Backend Application Platform) holds the user-to-engine mapping and routes API calls. Each Engine sees only its assigned brain file. This is how the public Engine product runs.
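
The routing contract is small enough to sketch. Everything below is an illustrative assumption, not BAP's actual API; it only encodes the one-Engine-per-user invariant described above:

```python
# One Engine base URL per active user; each Engine owns one brain file.
routes: dict[str, str] = {
    "user-123": "http://engine-user-123:8000",
}

def engine_for(user_id: str) -> str:
    # In the real system the control plane provisions Engines on demand;
    # this sketch just looks up an existing mapping.
    return routes[user_id]
```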

Hybrid (in production)

A shared coordinator — bap-engine — provisions and manages many Engine workers. The coordinator handles lifecycle, health, quotas, and routing. Engines remain stateless from the coordinator’s perspective. See BAP Engine for the full docs; the source lives at bap-engine.

What lives where

Here’s the mapping from “subsystem” to “code path” so the rest of the docs can refer to a canonical location.
| Subsystem | Path |
| --- | --- |
| HTTP server | src/server.py |
| Bootstrap & state | src/bootstrap.py |
| Configuration | src/config.py |
| Authentication | src/auth.py, src/key_rotation.py |
| Agent loop | src/engine_core/coordinator.py, src/engine_core/agent_loop.py |
| Compaction | src/engine_core/compaction_orchestrator.py |
| Channel state | src/engine_core/channel_state.py |
| Events | src/engine_core/events.py |
| Memory | src/memory/ |
| Asset Directory | src/asset_directory/ |
| Utility Directory | src/utility_directory/ |
| Herald (LLM) | src/herald/ |
| Sandbox | src/sandbox/ |
| Learning Centre | src/learning_centre/ |
| Infrastructure | src/infra/ |
| Observability | src/raven/ |
| Security | src/security/ |

Component-level detail for each of these lives in Components.