This page collects the failure modes we’ve seen during local development, with the cause and the fix. If your problem isn’t here, the engine repo’s GitHub issues are the next stop.

Engine won’t start

Symptom: KeyError: 'LLM_API_KEY'

The Engine couldn’t find a required environment variable. Fix: confirm .env exists and has the required keys:
LLM_PROVIDER=anthropic
LLM_API_KEY=sk-ant-...
LLM_MODEL=claude-sonnet-4-7
OPENAI_API_KEY=sk-...
The OpenAI key is required even if LLM_PROVIDER is not OpenAI — embeddings always route through OpenAI today.
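You can check this before starting the container. The following minimal Python sketch reads .env and reports which of the keys above are missing or empty. It assumes simple KEY=VALUE lines and is not the Engine's own loader. The helper names (parse_env, missing_keys, check_env_file) are hypothetical.

```python
from pathlib import Path

# Keys listed on this page. OPENAI_API_KEY is needed for embeddings
# regardless of LLM_PROVIDER.
REQUIRED_KEYS = ("LLM_PROVIDER", "LLM_API_KEY", "LLM_MODEL", "OPENAI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Hypothetical helper: it does not implement quoting, `export`
    prefixes, or multi-line values the way python-dotenv does.
    """
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or set to an empty value."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]


def check_env_file(path: str = ".env") -> list[str]:
    """Read a .env file and return its missing required keys."""
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"{path} not found")
    return missing_keys(parse_env(p.read_text()))
```

Run check_env_file() from the directory that holds docker-compose.yml. An empty list means every required key is present.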

Symptom: container exits immediately with no output

Almost always a docker-compose env issue. Fix:
docker compose config | head -50
Check that the engine service has the env you expect. If .env is missing or malformed, compose silently falls back to defaults and the Engine bails on startup.

Symptom: migration error on startup

A migration file failed to apply, usually because the brain is at a schema version that the running Engine doesn’t recognize. Fix: if you’re on a fresh dev environment with nothing to keep, wipe the volumes (this deletes the brain) and start again:
docker compose down --volumes
docker compose up engine
If you have data you care about, inspect the brain’s schema instead:
docker compose exec engine sqlite3 /data/brain.sqlite '.schema'
Compare against the migrations the Engine ships. If the brain is ahead of the Engine (you downgraded), you’ll need to either upgrade the Engine or restore the brain from a backup taken before the downgrade.

Symptom: bind: address already in use

Port 8000 is already bound by another process. Fix: stop the other process, or map a different host port to the Engine’s container port 8000:
# docker-compose.yml
services:
  engine:
    ports:
      - "8001:8000"
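To confirm the port is actually taken before you edit the compose file, try binding it yourself. This is a minimal Python sketch with a hypothetical function name. By default it checks all interfaces, on the assumption that Docker is publishing the port with its default all-interfaces binding.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now.

    A failed bind (typically EADDRINUSE) means another process holds the
    port. On Linux this can also report False briefly after a server
    exits, while old connections sit in TIME_WAIT.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True
```

If port_is_free(8000) returns False, another process holds the port. Run lsof -i :8000 to find out which one.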

Health check fails

Symptom: /health returns 503

A subsystem is unhealthy. Fix: look at the response body, where the subsystems field tells you which one. Leave curl’s -f flag off here, because --fail discards the body of a 503:
curl -sS "$ENGINE_URL/health" | jq .
Common subsystem failures:
  • database: error (SQLite can’t be opened). Check SQLITE_DB_PATH and the volume mount.
  • llm_provider: error (the configured provider is unreachable or rejecting auth). Send a request straight to the provider’s API with the same key to tell the two cases apart.
  • asset_directory: error (the Asset Directory encryption key is missing or wrong). Set AD_ENCRYPTION_KEY correctly, or remove the variable if you don’t use MCP.
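You can also script this check. Most HTTP clients treat a 503 as an error, just as curl -f does, so you have to read the body from the error response. This sketch uses only the Python standard library, and the function names are hypothetical. The body shape is an assumption based on the field names above: a subsystems object mapping names either to "ok"/"error" or to objects with a status field. Adjust it to what your Engine actually returns.

```python
import json
import urllib.error
import urllib.request


def fetch_health(base_url: str, timeout: float = 5.0) -> tuple[int, dict]:
    """GET {base_url}/health and return (HTTP status, parsed JSON body).

    urllib raises HTTPError for a 503, but the error object still carries
    the response body, which is where the subsystem details live.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read() or b"{}")


def unhealthy_subsystems(body: dict) -> list[str]:
    """Names of subsystems whose status is anything other than "ok".

    Assumed shape: {"subsystems": {"database": "ok", ...}} or
    {"subsystems": {"database": {"status": "ok"}, ...}}.
    """
    unhealthy = []
    for name, value in body.get("subsystems", {}).items():
        status = value.get("status") if isinstance(value, dict) else value
        if status != "ok":
            unhealthy.append(name)
    return unhealthy
```

For example, call fetch_health with the value of ENGINE_URL, then pass the returned body to unhealthy_subsystems to list the failing subsystems.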

Symptom: /health returns nothing, connection refused

The Engine isn’t running or isn’t bound to the port you’re hitting. Fix:
docker compose ps              # is the engine container up?
docker compose logs engine     # what does it say?

Authentication failures

Symptom: every request returns 401 INVALID_KEY

Mismatch between ENGINE_API_KEY and the value you’re sending. Fix:
# what does the engine think the key is?
docker compose exec engine env | grep ENGINE_API_KEY
# what are you sending?
echo "$ENGINE_KEY"
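If you’re using ENGINE_KEY_HASH (production-style), make sure the hash actually matches your key. The -n matters, because a trailing newline changes the hash. Where sha256sum isn’t installed, use shasum -a 256 instead: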
echo -n "$ENGINE_KEY" | sha256sum
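The same comparison can be done in Python. This sketch assumes ENGINE_KEY_HASH holds the lowercase hex SHA-256 of the raw key, which is what the sha256sum check above implies. The helper names are hypothetical.

```python
import hashlib
import hmac


def key_hash(key: str) -> str:
    """Hex SHA-256 of the raw key, equivalent to `echo -n "$KEY" | sha256sum`."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def key_matches_hash(key: str, expected_hash: str) -> bool:
    """Compare a key against a stored hex digest in constant time."""
    return hmac.compare_digest(key_hash(key), expected_hash.strip().lower())
```

sha256sum prints the digest followed by "  -", so compare only the hex part. A key with a trailing newline (for example, from echo without -n) hashes to a different value.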

Streaming issues

Symptom: SSE stream shows nothing until the turn ends

Buffering. Either curl or your reverse proxy is holding the response until it’s complete. Fix:
  • For curl: add -N.
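  • For requests (Python): set stream=True and read the body incrementally, for example with iter_lines().
  • For nginx: set proxy_buffering off. For other reverse proxies and load balancers, disable response buffering with their equivalent setting.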
  • For corporate proxies: investigate; sometimes they’re hard-coded to buffer.
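Once buffering is off, the client still has to split the stream into events. The following minimal Python sketch parses text/event-stream input from lines that have already been decoded. It follows the basic SSE framing rules and assumes nothing about the Engine's payloads beyond event names such as thread_lifecycle. parse_sse is a hypothetical helper, not part of any Engine client.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from decoded SSE lines.

    Basic text/event-stream rules: `field: value` lines, a blank line
    dispatches the event, lines starting with ':' are comments, and
    multiple data lines are joined with newlines. An event left
    unterminated at end of stream is discarded.
    """
    event, data = "message", []
    for line in lines:
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

With requests, pass it resp.iter_lines(decode_unicode=True) from a request made with stream=True. Without stream=True, requests downloads the entire body before you can iterate, which looks exactly like the buffering symptom above.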

Symptom: stream ends abruptly with no thread_lifecycle: completed

The Engine crashed or was killed mid-turn. Fix: check docker compose logs engine for the crash, then reproduce the request in isolation. If it reproduces, file an issue with the trace. In the meantime, you can resume with /execute/replay. However, if the Engine truly crashed, the channel state may be stale.

Sandbox issues

Symptom: bash tool calls all fail with bwrap: ...

Bubblewrap isn’t available, or the kernel doesn’t support the namespace configuration. Fix: the default Engine image includes bubblewrap, so if you customized the image, make sure it’s still installed:
docker compose exec engine which bwrap
For Linux kernels older than 4.18, some namespace configurations aren’t available. Update the kernel or run on a newer host.

Symptom: every command needs permission

ALLOWED_ROOTS is set too narrowly. Fix: widen it to cover every directory the agent works in (comma-separated):
ALLOWED_ROOTS=/data/brain,/data/workspace,/tmp/work
Operations inside these paths don’t trigger permission prompts.
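To illustrate what "inside these paths" means, here is a hypothetical Python check that resolves symlinks and .. segments before comparing path components. It is not the Engine's actual permission logic, which may differ. It does show why a plain string-prefix test is not enough: /data/brainstorm is not inside /data/brain.

```python
import os


def parse_allowed_roots(value: str) -> list[str]:
    """Split a comma-separated ALLOWED_ROOTS value into resolved paths."""
    return [os.path.realpath(p.strip()) for p in value.split(",") if p.strip()]


def is_inside_allowed_roots(path: str, roots: list[str]) -> bool:
    """True if `path` resolves to one of the roots or to something beneath one.

    Hypothetical illustration: compares whole path components via
    os.path.commonpath, so /data/brainstorm does not match /data/brain.
    """
    target = os.path.realpath(path)
    return any(os.path.commonpath([root, target]) == root for root in roots)
```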

Apple Silicon issues

Symptom: tests fail with seccomp probe errors

Some sandbox tests require x86 syscall semantics. The default test container forces linux/amd64:
test:
  platform: linux/amd64
QEMU emulation is slower but the tests pass.

Symptom: image build is glacial

Docker Desktop on Apple Silicon runs x86 (linux/amd64) builds under emulation. To speed things up:
  • Use the arm64 image when possible (the default Dockerfile builds natively on arm64 for the prod target).
  • For the test target, accept the slowness.

Performance

Symptom: turns are slow even with a fast model

Common causes:
  1. No prompt caching. Check usage.cache_hit_tokens. If it’s always 0, something in the system prompt is likely changing between calls, which defeats the cache.
  2. Compaction firing every turn. Check for compaction_event events. If they fire constantly, the task’s context is too long.
  3. Slow tools. Compare tool_call and tool_result timestamps to see which tool is taking the time.
  4. Container CPU limit. Docker Desktop’s default resource limits on Mac are tight; raise them in Docker Desktop settings.
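If you've captured a turn's events, a small script can check signals 1 to 3 at once. This Python sketch assumes a hypothetical event shape: a type field, call_id and ts on tool events, and a usage object carrying cache_hit_tokens. Check your actual stream for the real field names before relying on it.

```python
from collections import Counter


def summarize_turns(events: list[dict]) -> dict:
    """Summarize slow-turn signals from decoded stream events.

    Hypothetical event shape (verify against your real stream):
      {"type": "tool_call", "call_id": "c1", "ts": 12.0}
      {"type": "tool_result", "call_id": "c1", "ts": 15.5}
      {"type": "compaction_event", "ts": 20.0}
      {"type": "...", "usage": {"cache_hit_tokens": 0}}
    """
    counts = Counter(e.get("type") for e in events)
    cache_hits = [
        e["usage"]["cache_hit_tokens"]
        for e in events
        if "cache_hit_tokens" in e.get("usage", {})
    ]
    # Pair each tool_call with its tool_result to get a duration in seconds.
    started: dict[str, float] = {}
    tool_seconds: dict[str, float] = {}
    for e in events:
        if e.get("type") == "tool_call":
            started[e["call_id"]] = e["ts"]
        elif e.get("type") == "tool_result" and e.get("call_id") in started:
            tool_seconds[e["call_id"]] = e["ts"] - started[e["call_id"]]
    return {
        "compactions": counts["compaction_event"],
        "cache_always_zero": bool(cache_hits) and all(h == 0 for h in cache_hits),
        "slowest_tool": max(tool_seconds.items(), key=lambda kv: kv[1], default=None),
    }
```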

Symptom: brain queries are slow

Memory tables grow unbounded. Periodic cleanup helps. Back up /data/brain.sqlite first, because this deletes rows permanently:
docker compose exec engine sqlite3 /data/brain.sqlite \
  "DELETE FROM trajectories WHERE created_at < datetime('now','-30 days');
   VACUUM;"
The Engine doesn’t auto-clean today.
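For scheduled cleanup, you can script the same prune with Python's built-in sqlite3 module. This sketch assumes created_at is stored as SQLite datetime text (YYYY-MM-DD HH:MM:SS, UTC), so comparing it against datetime() is valid. The function name is hypothetical. Point it at a path that reaches the brain file. Take a backup first, and ideally run it while the Engine is idle, because a concurrent writer can cause "database is locked" errors.

```python
import sqlite3


def prune_trajectories(db_path: str, days: int = 30, vacuum: bool = True) -> int:
    """Delete trajectories older than `days`, optionally VACUUM, return rows deleted.

    Assumes created_at holds SQLite datetime text in UTC.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits the DELETE on success, rolls back on error
            cur = conn.execute(
                "DELETE FROM trajectories WHERE created_at < datetime('now', ?)",
                (f"-{int(days)} days",),
            )
            deleted = cur.rowcount
        if vacuum:
            # VACUUM cannot run inside a transaction; the block above has committed.
            conn.execute("VACUUM")
        return deleted
    finally:
        conn.close()
```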

When all else fails

  1. Check the logs. docker compose logs engine. The error is usually there.
  2. Reproduce on a clean brain. docker compose down --volumes && docker compose up engine. This deletes the brain volume, so only do it when you can afford to lose that data; it rules out stale state as a variable.
  3. Bisect. If something used to work and now doesn’t, what changed? Last config change? Last code change? Last upgrade?
  4. File an issue. Include the version, the env (sanitized), the request, the response, and the relevant log lines.

See also