bap-engine is a single FastAPI process backed by PostgreSQL. It speaks HTTP to products on one side and the Docker API to Engine containers on the other. This page maps what’s inside.
The picture
The components
HTTP API (server.py)
A FastAPI app with five route groups:
- Engine lifecycle: POST /engines/provision, DELETE /engines/{user_id}, POST /engines/{user_id}/{start,stop,rotate-key}.
- Discovery + admission: GET /engines/{user_id}, POST /engines/{user_id}/admit.
- Fleet: GET /engines, GET /status, GET /metrics.
- Admin: POST /products/register, PUT /products/{product_id}/policy. Requires X-Admin-Key.
- Health: GET /health (no auth).
Product-facing routes authenticate through _get_product(), which hashes X-Platform-Key, looks up the matching product, and rejects unknown keys.
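The key lookup described above can be sketched in a few lines. This is a minimal illustration, not the code in server.py: the SHA-256 hash, the in-memory `PRODUCTS_BY_KEY_HASH` dict (standing in for the products table), and the helper names `hash_key`, `register_product`, and `get_product` are all assumptions.

```python
import hashlib
from typing import Optional

# Hypothetical stand-in for the products table: key hash -> product slug.
PRODUCTS_BY_KEY_HASH: dict[str, str] = {}


def hash_key(raw_key: str) -> str:
    """Hex digest of the raw key. SHA-256 is an assumption, not confirmed by the docs."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def register_product(slug: str, raw_key: str) -> None:
    """Store only the hash of the platform key, never the key itself."""
    PRODUCTS_BY_KEY_HASH[hash_key(raw_key)] = slug


def get_product(platform_key: Optional[str]) -> str:
    """Resolve an X-Platform-Key header value to a product slug, or reject it."""
    if not platform_key:
        raise PermissionError("missing X-Platform-Key")
    slug = PRODUCTS_BY_KEY_HASH.get(hash_key(platform_key))
    if slug is None:
        raise PermissionError("unknown platform key")
    return slug
```

Because only the hash is stored, a leaked products table does not reveal usable platform keys.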
Registry (registry.py)
The CRUD layer over Postgres. Two main types:
- Product: a tenant on the orchestrator. Has a slug, a hashed platform API key, and a JSON policy.
- Engine: one user’s Engine instance. Has a status, a port, a hashed engine API key, and a Fernet-encrypted copy of the plaintext key.
Lifecycle (lifecycle.py)
The state machine for an Engine. Valid states:
Discovery (discovery.py)
The product’s main entry point goes through here:
POST /engines/{user_id}/admit calls discovery.admit(), which:
- Asks policy.check_admit() (rate limit + quota).
- Looks up the engine in the registry.
- If none and auto_provision=true and policy allows, provisions.
- If sleeping and auto_wake=true, wakes.
- Returns {admitted: true, engine: {url, api_key, ...}} or {admitted: false, ...}.
Direct calls to /engines/provision are explicit and skip the auto-policy.
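The admit steps can be sketched as a single function. This is an illustrative outline under stated assumptions, not the code in discovery.py: the `Engine` dataclass, the `provision` and `wake` callables, and the `reason` field in the refusal payload are hypothetical, and refusing a non-running engine at the end is a guess at behaviour the page does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Engine:
    url: str
    api_key: str
    status: str  # e.g. "running", "sleeping", "failed"


def admit(
    policy_ok: bool,
    engine: Optional[Engine],
    auto_provision: bool,
    auto_wake: bool,
    provision: Callable[[], Optional[Engine]],
    wake: Callable[[Engine], Engine],
) -> dict:
    """Follow the admit flow: policy check, lookup, provision, wake, respond."""
    if not policy_ok:  # result of the rate-limit + quota check
        return {"admitted": False, "reason": "policy"}
    if engine is None:
        if not auto_provision:
            return {"admitted": False, "reason": "no_engine"}
        engine = provision()  # returns None when policy refuses a new engine
        if engine is None:
            return {"admitted": False, "reason": "quota"}
    if engine.status == "sleeping" and auto_wake:
        engine = wake(engine)
    if engine.status != "running":  # assumption: only running engines are admitted
        return {"admitted": False, "reason": engine.status}
    return {
        "admitted": True,
        "engine": {"url": engine.url, "api_key": engine.api_key},
    }
```

Passing the provision and wake actions in as callables keeps the decision logic testable without Docker or a database.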
Policy (policy.py)
Two checks:
- Quota. A product’s policy carries max_engines: int. Provision fails if the count would exceed it.
- Rate limit. A product’s policy carries rate_limit_rpm: int. Admit fails if the per-product per-minute rate would exceed it.
Both are implemented by a PolicyEngine that reads the product’s policy JSON. No external store; no Redis. The state lives in the registry plus an in-memory rate counter.
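A minimal sketch of the two checks, assuming a fixed one-minute window for the rate counter. The windowing strategy, the method names `check_quota` and `check_rate`, and the injectable `clock` are assumptions for illustration; only `max_engines` and `rate_limit_rpm` come from the policy described above.

```python
import time
from typing import Callable


class PolicyEngine:
    """Per-product quota and rate checks backed by an in-memory counter."""

    def __init__(self, policy: dict, clock: Callable[[], float] = time.monotonic):
        self.max_engines = int(policy["max_engines"])
        self.rate_limit_rpm = int(policy["rate_limit_rpm"])
        self._clock = clock
        self._window_start = clock()
        self._count = 0

    def check_quota(self, current_engines: int) -> bool:
        """True if provisioning one more engine stays within max_engines."""
        return current_engines + 1 <= self.max_engines

    def check_rate(self) -> bool:
        """True if this request fits in the current one-minute window."""
        now = self._clock()
        if now - self._window_start >= 60.0:
            # Start a fresh window; the counter is lost on process restart.
            self._window_start = now
            self._count = 0
        if self._count >= self.rate_limit_rpm:
            return False
        self._count += 1
        return True
```

Because the counter lives in process memory, it resets whenever the orchestrator restarts, which matches the "Process lifetime" entry for rate-limit counters.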
Health (health.py)
A background loop that runs every ORCH_HEALTH_CHECK_INTERVAL_S
(default 30 s). For each running engine, in parallel up to a
semaphore of 50:
- GET /health against the engine’s URL with ORCH_HEALTH_CHECK_TIMEOUT_S (default 10 s).
- On 200, reset health_failures, update last_health_at.
- On failure, increment health_failures. After ORCH_HEALTH_MAX_FAILURES (default 3), mark the engine failed.
- Schedule auto-restart with exponential backoff: min(ORCH_RESTART_BACKOFF_BASE_S * 2^attempt, ORCH_RESTART_BACKOFF_MAX_S).
- After ORCH_RESTART_MAX_ATTEMPTS (default 8), give up. The engine stays failed until ops intervenes.
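The backoff formula is easy to check by hand. The sketch below computes the delay for each restart attempt; the base of 5 s and cap of 300 s in the usage note are illustrative values only (this page does not state their defaults), and counting attempts from 0 is an assumption.

```python
def restart_delay_s(attempt: int, base_s: float, max_s: float) -> float:
    """Delay before restart number `attempt`: min(base * 2^attempt, max)."""
    return min(base_s * (2 ** attempt), max_s)


def restart_schedule(max_attempts: int, base_s: float, max_s: float) -> list[float]:
    """Delays for attempts 0 .. max_attempts - 1, after which the loop gives up."""
    return [restart_delay_s(a, base_s, max_s) for a in range(max_attempts)]
```

With 8 attempts, a hypothetical 5 s base, and a 300 s cap, `restart_schedule(8, 5.0, 300.0)` yields `[5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]`: the delay doubles until it hits the cap, then stays there.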
Backends (backends/)
Two implementations of EngineBackend (an abstract base):
- docker_backend.py: production. Creates Docker containers, mounts the data volume, attaches to the Docker network, runs the configured engine image.
- subprocess_backend.py: dev/test. Spawns python -m uvicorn src.server:app as a local subprocess. No Docker.

The backend is selected with ORCH_ENGINE_BACKEND. See Backends.
Audit (audit.py)
Every lifecycle action writes a row to audit_log: action, actor,
product/user/engine IDs, metadata, duration. Used for incident
investigation and compliance reporting.
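One way to capture the fields listed above, including duration, is to wrap each lifecycle action in a context manager. This is a sketch only: the `audited` helper, the in-memory `AUDIT_LOG` list (standing in for the audit_log table), and the `duration_ms` field name are hypothetical.

```python
import time
from contextlib import contextmanager

# Hypothetical stand-in for the audit_log table.
AUDIT_LOG: list[dict] = []


@contextmanager
def audited(action: str, actor: str, **ids: str):
    """Time the wrapped block and append one audit row, even if it raises."""
    metadata: dict = {}
    started = time.monotonic()
    try:
        yield metadata  # the caller may attach extra details here
    finally:
        AUDIT_LOG.append({
            "action": action,
            "actor": actor,
            **ids,  # e.g. product_id, user_id, engine_id
            "metadata": metadata,
            "duration_ms": round((time.monotonic() - started) * 1000, 3),
        })
```

A caller would write `with audited("provision", "product:acme", user_id="u-123") as meta:` around the action and set `meta["port"]` or similar inside the block.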
Metrics (metrics.py)
Aggregates over the registry and audit log:
- GET /status: count by state, unhealthy count, overall health.
- GET /metrics: provisions, crashes, restarts in the last hour; average boot time; lifetime totals.
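The /status aggregation amounts to a group-by over engine rows. In the sketch below, the response field names (`by_state`, `unhealthy`, `healthy`) are hypothetical, and treating an engine as unhealthy when `health_failures` is above zero is an assumption about how the count is defined.

```python
from collections import Counter


def fleet_status(engines: list[dict]) -> dict:
    """Summarise engine rows into counts by state plus a health flag."""
    by_state = Counter(e["status"] for e in engines)
    # Assumption: any engine with recorded health failures counts as unhealthy.
    unhealthy = sum(1 for e in engines if e.get("health_failures", 0) > 0)
    return {
        "by_state": dict(by_state),
        "unhealthy": unhealthy,
        "healthy": unhealthy == 0,
    }
```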
What state lives where
| State | Where | Lifetime |
|---|---|---|
| Products | products table | Forever |
| Engines | engines table | Until destroyed |
| Port allocations | port_allocations table | Until engine destroyed |
| Audit | audit_log table | Forever (until pruned) |
| Rate-limit counters | In-memory | Process lifetime |
| Engine API keys (plaintext) | Never stored | Returned to caller, then encrypted |
| Engine API keys (encrypted) | engines.engine_key_enc | Until destroyed |
| Engine API keys (hashed) | engines.engine_key_hash | Until destroyed |
Engine keys are encrypted at rest with the orchestrator master key (ORCH_MASTER_KEY).
What runs where
| Container | Image | Purpose |
|---|---|---|
| postgres | postgres:16-alpine | Orchestrator state. |
| orchestrator | bap-engine prod image | The FastAPI process. |
| engine containers | september-engine:<version> | Per-user Engine instances. Started/stopped on demand. |
The orchestrator and engine containers share a Docker network (engine_net) so the orchestrator can reach engine /health endpoints by container hostname.
Where it doesn’t go
Some choices the orchestrator deliberately doesn’t make:
- Multi-region. All engines run on one host. For multi-region, you run multiple bap-engine deployments, one per region.
- Migration. No live migration of an engine from one host to another. To move a user, destroy the source engine and provision a fresh one (their brain is in the volume; mount it on the new host).
- Cross-product user reuse. A user_id is per-product. The orchestrator does not know that user u-123 in product A is the same human as user u-123 in product B.
See also
- Engine contract — the API surface between orchestrator and engines.
- Lifecycle — the state machine in detail.
- Backends — Docker vs subprocess.
- Health — auto-restart and the loop.

