

The orchestrator’s lifecycle layer doesn’t know how to start an engine. It delegates to a backend that implements EngineBackend. Two backends exist today: Docker (production) and subprocess (dev / testing). You pick which to run via ORCH_ENGINE_BACKEND.

EngineBackend interface

Every backend implements:
import abc
from uuid import UUID

class EngineBackend(abc.ABC):
    async def create(self, engine_id: UUID, port: int,
                     env: dict, image: str) -> EngineEndpoint: ...
    async def start(self, engine_id: UUID) -> None: ...
    async def stop(self, engine_id: UUID, timeout: int = 30) -> None: ...
    async def destroy(self, engine_id: UUID) -> None: ...
    async def is_running(self, engine_id: UUID) -> bool: ...
Where EngineEndpoint is {host: str, port: int, container_id: str | None}. Lifecycle calls these in order: create() during provisioning, then start()/stop()/destroy() when the corresponding HTTP endpoint is called.
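For concreteness, here is one way EngineEndpoint could be written as a dataclass; this is a sketch based on the shape above, not necessarily the orchestrator's actual definition:

from dataclasses import dataclass

@dataclass
class EngineEndpoint:
    host: str                 # where the engine listens, e.g. 127.0.0.1
    port: int                 # host port allocated by the orchestrator
    container_id: str | None  # Docker container ID; None under the subprocess backend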

Docker backend

File: orchestrator/backends/docker_backend.py
Use when: running in production. Selected with ORCH_ENGINE_BACKEND=docker (the default).

What it does

Each engine becomes a Docker container. On create() (see the sketch after this list), it:
  1. Pulls the image if missing (ORCH_ENGINE_IMAGE).
  2. Creates a named volume for the brain: engine-data-{engine_id[:12]}.
  3. Starts a container with:
    • The named volume mounted at /data (rw).
    • ORCH_CATALOG_MOUNT_PATH (host) mounted at /catalog (ro).
    • The Docker network ORCH_ENGINE_NETWORK (default engine_net).
    • Environment: ENGINE_KEY_HASH, SQLITE_DB_PATH=/data/brain.sqlite, CATALOG_DIR=/catalog, plus any vars in ORCH_ENGINE_ENV_PASSTHROUGH.
    • Restart policy: unless-stopped.
    • Container port mapped to host port (the one allocated by the orchestrator), bound to 127.0.0.1.
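A minimal sketch of that create() sequence using the docker Python SDK; the function name, the 8000/tcp container port, and the exact keyword choices are illustrative assumptions, not the backend's actual code (docker-py pulls the image itself if it is missing locally):

import docker

def create_engine_container(engine_id, port, env, image,
                            catalog_mount_path, network="engine_net"):
    """Hypothetical sketch of the Docker backend's create() step."""
    client = docker.from_env()
    # Named volume that holds the engine's brain database.
    volume = client.volumes.create(name=f"engine-data-{str(engine_id)[:12]}")
    return client.containers.run(
        image,                                       # ORCH_ENGINE_IMAGE
        detach=True,
        network=network,                             # ORCH_ENGINE_NETWORK
        volumes={
            volume.name: {"bind": "/data", "mode": "rw"},
            catalog_mount_path: {"bind": "/catalog", "mode": "ro"},  # ORCH_CATALOG_MOUNT_PATH
        },
        environment={
            "SQLITE_DB_PATH": "/data/brain.sqlite",
            "CATALOG_DIR": "/catalog",
            **env,                                   # ENGINE_KEY_HASH + passthrough vars
        },
        restart_policy={"Name": "unless-stopped"},
        ports={"8000/tcp": ("127.0.0.1", port)},     # assumed container port, bound to loopback
    )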

Why Docker

  • Isolation. One user’s Engine can’t see another’s filesystem.
  • Resource limits. CPU and memory ceilings per container (see the snippet after this list); one rogue user can’t starve the rest.
  • Image pinning. ORCH_ENGINE_IMAGE is a specific tag; upgrades are deliberate.
  • Standard tooling. Logs, exec-into, restart, ps — all the usual Docker affordances.
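For reference, the docker SDK exposes those ceilings as keyword arguments to containers.run; the values below are placeholders, not the limits the orchestrator actually applies:

import docker

client = docker.from_env()
# Placeholder ceilings for illustration only.
client.containers.run(
    "example-engine-image:latest",   # placeholder tag, not ORCH_ENGINE_IMAGE's real value
    detach=True,
    mem_limit="1g",                  # hard memory cap for the container
    nano_cpus=1_000_000_000,         # 1.0 CPU, in units of 1e-9 CPUs
)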

Requirements

The orchestrator container must have:
  • /var/run/docker.sock mounted (or a remote Docker daemon configured via DOCKER_HOST).
  • The docker Python SDK installed (already included in the orchestrator image).
  • Permission to pull images from the configured registry.
Your host needs:
  • Docker Engine running.
  • Enough disk for the engine-data-* named volumes.
  • Enough RAM to hold N concurrent engine containers.

What you lose

  • Cold-start cost. Each engine takes a few seconds to boot (migrations + catalog load). The orchestrator’s auto-wake helps amortize this.
  • Per-engine memory overhead. ~200 MB per container at idle. For thousands of users, this adds up; consider ORCH_IDLE_SLEEP_THRESHOLD_S to automatically put idle engines to sleep if RAM is tight.

Subprocess backend

File: orchestrator/backends/subprocess_backend.py
Use when: developing or running tests on a single machine without Docker. Set ORCH_ENGINE_BACKEND=subprocess.

What it does

Each engine becomes a local Python subprocess running:
python -m uvicorn src.server:app --host 0.0.0.0 --port {port} --workers 1
with environment variables passed in directly.
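A minimal sketch of how that launch could look with asyncio; the function name and signature are illustrative, not the backend's actual code:

import asyncio
import os
import sys

async def create_engine_process(port: int, env: dict):
    """Hypothetical sketch of the subprocess backend's create() step."""
    return await asyncio.create_subprocess_exec(
        sys.executable, "-m", "uvicorn", "src.server:app",
        "--host", "0.0.0.0", "--port", str(port), "--workers", "1",
        env={**os.environ, **env},   # engine env vars passed in directly
    )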

Why it exists

  • Faster iteration. No image build, no container start. Useful during engine development.
  • CI without Docker-in-Docker. Tests can run subprocess engines inside a single CI runner.
  • Debugger access. pdb works; container barriers don’t get in the way.

Limitations

  • Not isolated. Every subprocess shares the host filesystem and network. Don’t run multi-tenant.
  • No restart support. start() is a no-op (warning logged); to restart, destroy and re-create.
  • No resource limits. A runaway subprocess can take down the host.
  • Different sandbox availability. The engine’s bash sandbox needs bubblewrap, which is harder to set up correctly outside a container. Some engine features may not work outside Docker.
Use subprocess for unit tests and quick orchestrator-side development. Use Docker for everything else.

Switching

ORCH_ENGINE_BACKEND is read at orchestrator startup. To switch:
  1. Stop the orchestrator.
  2. Stop and destroy any in-flight engines.
  3. Update ORCH_ENGINE_BACKEND in the env.
  4. Restart.
You can’t mix backends in one orchestrator instance.
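For illustration, startup-time selection could look like the sketch below; DockerBackend and SubprocessBackend are assumed class names (only the module paths are documented above):

import os

def load_backend():
    """Hypothetical sketch: choose the engine backend once, at orchestrator startup."""
    name = os.environ.get("ORCH_ENGINE_BACKEND", "docker")
    if name == "docker":
        from orchestrator.backends.docker_backend import DockerBackend          # assumed name
        return DockerBackend()
    if name == "subprocess":
        from orchestrator.backends.subprocess_backend import SubprocessBackend  # assumed name
        return SubprocessBackend()
    raise ValueError(f"unknown ORCH_ENGINE_BACKEND: {name!r}")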

Future backends

The interface is stable, so additional backends are possible. Likely candidates if we need them:
  • Kubernetes Pod — for cluster-native deployments. Each engine becomes a Pod managed via the K8s API.
  • Firecracker microVM — for stronger isolation than containers.
  • Remote Docker — running engines on a different host than the orchestrator.
None are implemented today.

See also