Upgrade

Upgrading the Engine is straightforward when you do it deliberately and catastrophic when you don’t. This page covers the standard upgrade path plus what to do when something doesn’t go to plan.

Pre-flight

Before any upgrade:

Read the Changelog. Identify breaking changes.
Read the relevant migration guide.
Snapshot the brain volume. This is your rollback insurance.
Run regression evals on the current version to record a baseline.
Plan a rollout window. Even patch releases can shift behavior.

If the changelog flags a breaking change, the migration guide is required reading. Skipping it because “it’s just a patch bump” is how production goes sideways.

Standard upgrade — patch and minor versions

Patch (X.Y.Z → X.Y.Z+1) and minor (X.Y → X.Y+1) versions are backwards-compatible. Behavior may shift; the API and brain schema don’t break.
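To decide which upgrade path applies, it can help to classify the version bump before touching anything. The following is a minimal sketch, assuming plain X.Y.Z image tags; `bump_kind` is a hypothetical helper, not part of the Engine tooling.

```shell
# bump_kind FROM TO -> prints "none", "patch", "minor", or "major"
# Hypothetical helper; assumes both arguments are plain X.Y.Z versions.
bump_kind() {
  # Peel off the major and minor fields with parameter expansion.
  from_major=${1%%.*}; from_rest=${1#*.}; from_minor=${from_rest%%.*}
  to_major=${2%%.*};   to_rest=${2#*.};   to_minor=${to_rest%%.*}

  if [ "$1" = "$2" ]; then
    echo none
  elif [ "$from_major" != "$to_major" ]; then
    echo major
  elif [ "$from_minor" != "$to_minor" ]; then
    echo minor
  else
    echo patch
  fi
}
```

For example, `bump_kind 2.3.0 2.3.1` prints `patch`, and a result of `major` is the signal to follow the major-version steps instead of this section.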

Steps

# 1. snapshot the brain
docker compose exec engine sqlite3 /data/brain.sqlite \
  ".backup /data/brain-$(date +%s).db"

# 2. update the engine image tag in docker-compose.yml or your manifest
# from: image: engine:2.3.0
# to:   image: engine:2.3.1

# 3. pull the new image
docker compose pull engine

# 4. restart with the new image
docker compose up -d engine

# 5. verify
curl -fsS "$ENGINE_URL/health" | jq .

# 6. run regression evals
make eval

If /health returns ok and evals are green, you’re done.
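Right after a restart the Engine may need a few seconds before /health responds, so a single curl can fail spuriously. A minimal sketch of a retry loop follows, assuming that any successful HTTP response from /health means the Engine is up; `wait_for_health` and its defaults are hypothetical.

```shell
# wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: polls URL until curl succeeds or attempts run out.
# Returns 0 on the first success, 1 if every attempt failed.
wait_for_health() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP error statuses (4xx/5xx).
    if curl -fsS "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Usage might look like `wait_for_health "$ENGINE_URL/health" 30 2 && make eval`, so evals only run once the new version is answering.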

Standard upgrade — major versions

Major versions (vN → vN+1) include breaking changes. You’ll typically need code changes (new request shapes, new env vars) plus the upgrade.

Steps

Read the migration guide. All of it.
Update your application code to match the new API.
Test in staging. Bring up the new Engine in a staging environment with a copy of production’s brain. Confirm /health, send representative requests, watch for regressions.
Run full evals on staging.
Deploy with a rollout strategy. Canary first; flip the rest if the canary is clean.
Monitor for the first hour. Look for elevated errors, latency spikes, unusual log patterns.

If the canary is clean for an hour, complete the rollout. If anything is off, roll back.

Canary rollout

For deployments that serve real users, deploy to a small percentage of users first:

Per-user-engine model

If you run one Engine per user (the typical hosted model), pick 1–5% of users and route them to the new Engine version. Watch their session traces; if behavior is normal, expand.
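One way to pick a stable 1–5% cohort is to hash each user ID into a bucket from 0 to 99 and route the lowest buckets to the new version. This is a sketch under that assumption, not the Engine's own routing; `canary_bucket` and `in_canary` are hypothetical names.

```shell
# canary_bucket USER_ID -> prints a stable bucket in 0..99
# Hypothetical helper: the same user always lands in the same bucket.
canary_bucket() {
  # cksum prints "<crc> <byte count>"; keep the CRC and reduce it modulo 100.
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $((crc % 100))
}

# in_canary USER_ID PERCENT -> exit 0 if the user belongs to the canary cohort
in_canary() {
  [ "$(canary_bucket "$1")" -lt "$2" ]
}
```

Because the bucket depends only on the user ID, expanding from 5% to 25% keeps the original canary users on the new version rather than reshuffling them.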

Single Engine

If you run a single Engine for shared internal use, you don’t have canary granularity. Deploy at a low-traffic time, watch closely, roll back if needed.

Rolling back

When the new version is bad, roll back fast.

Image rollback

# revert docker-compose.yml to the previous image tag
docker compose pull engine
docker compose up -d engine

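An image rollback on its own only works if the brain schema hasn’t changed. If the new version applied a forward-only migration, the brain is now at a schema the old version doesn’t recognize, and you’ll see MIGRATION_REQUIRED errors when you try to start the old version. In that case, restore the brain from the snapshot as well.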

Brain rollback

If migrations broke the brain or the new version corrupted data:

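# stop engine
docker compose stop engine

# revert docker-compose.yml to the previous image tag

# restore from snapshot in a one-off container: `exec` needs a running
# container, and the engine is stopped (assumes the service mounts /data)
docker compose run --rm --no-deps --entrypoint sqlite3 engine \
  /data/brain.sqlite ".restore /data/brain-1714200000.db"

# start the old version
docker compose up -d engine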

You’ll lose any data written between the snapshot and the rollback — typically minutes to an hour. For details, see Rollback.
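To find the snapshot to restore and estimate how much data the rollback will cost, you can read the epoch out of the file name. This sketch assumes snapshots follow the `brain-<epoch>.db` naming used in the upgrade steps; `latest_snapshot` and `snapshot_age` are hypothetical helpers.

```shell
# latest_snapshot DIR -> prints the newest brain-<epoch>.db in DIR
# Compares epochs numerically, so brain-999.db does not beat brain-1714200000.db.
latest_snapshot() {
  best=
  best_ts=-1
  for f in "$1"/brain-*.db; do
    [ -e "$f" ] || continue                     # glob matched nothing
    ts=${f##*brain-}; ts=${ts%.db}              # epoch part of the file name
    case $ts in ''|*[!0-9]*) continue ;; esac   # skip non-numeric suffixes
    if [ "$ts" -gt "$best_ts" ]; then
      best=$f
      best_ts=$ts
    fi
  done
  [ -n "$best" ] && echo "$best"
}

# snapshot_age SNAPSHOT NOW_EPOCH -> seconds of writes a restore would discard
snapshot_age() {
  ts=${1##*brain-}; ts=${ts%.db}
  echo $(( $2 - ts ))
}
```

For instance, `snapshot_age "$(latest_snapshot /data)" "$(date +%s)"` gives the size of the data-loss window in seconds before you commit to the restore.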

Migrations that aren’t reversible

Most Engine migrations are forward-only. They add tables, add columns with defaults, populate caches. They don’t drop or rename anything critical. If a migration is destructive (drops a column, transforms data), the migration guide will say so. For destructive migrations:

Always snapshot first. Treat the snapshot as irreplaceable and keep it for at least a week post-upgrade.
Don’t rely on rollback. A destructive migration’s reverse is “restore from snapshot,” which loses recent data.

Upgrade across many Engines

In a per-user-engine deployment with hundreds of Engines:

Sequential rollout

Upgrade Engines one at a time, in a controlled order. Slow but safe. Use this for the first few hours of a new release.

Parallel rollout

Upgrade in batches (10%, 50%, 100%). Faster, slightly riskier.
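The batch boundaries for a cumulative 10% / 50% / 100% rollout can be computed up front. A minimal sketch, assuming Engines are numbered 1 to TOTAL in rollout order; `batch_end` and `plan_batches` are hypothetical helpers.

```shell
# batch_end TOTAL PERCENT -> how many Engines are upgraded once the
# cumulative PERCENT stage completes (rounded up)
batch_end() {
  echo $(( ($1 * $2 + 99) / 100 ))
}

# plan_batches TOTAL PCT... -> prints one "start-end" index range per batch
plan_batches() {
  total=$1; shift
  start=1
  for pct in "$@"; do
    end=$(batch_end "$total" "$pct")
    # Skip stages that add no new Engines (possible with very small fleets).
    [ "$end" -ge "$start" ] && echo "$start-$end"
    start=$((end + 1))
  done
}
```

With 200 Engines, `plan_batches 200 10 50 100` prints `1-20`, `21-100`, and `101-200`, one range per line. Rounding up means even a small fleet gets at least one Engine in the first batch.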

Drain and restart

For each Engine: drain in-flight turns, restart with the new image. The graceful-shutdown path drains existing requests up to a deadline before exiting.

docker compose stop --timeout 60 engine  # 60s drain window
docker compose up -d engine              # restart with the new image

Post-upgrade checklist

After every upgrade:

/health returns ok.
Run regression evals, compare to baseline.
Check error rate over the first hour.
Check P99 latency over the first hour.
Check cache hit ratio. A sudden drop suggests prompt structure changed.
Check the logs for warnings the previous version didn’t emit.
Spot-check a couple of high-value workflows manually.

If any of these go red, decide quickly whether to roll back or fix forward. Don’t let a bad upgrade live in production overnight.
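The error-rate and latency checks are easier to act on with an explicit threshold. The sketch below flags a metric as regressed when it exceeds the baseline by more than a tolerance. It assumes you can read each metric as a single number where higher is worse; `regressed` and the 10% default are hypothetical, not Engine defaults.

```shell
# regressed BASELINE CURRENT [TOLERANCE_PCT]
# Hypothetical helper for "higher is worse" metrics (error rate, P99 latency).
# Exit 0 if CURRENT exceeds BASELINE by more than TOLERANCE_PCT percent.
regressed() {
  # awk handles the floating-point comparison that sh arithmetic cannot.
  awk -v b="$1" -v c="$2" -v t="${3:-10}" \
    'BEGIN { exit !(c > b * (1 + t / 100)) }'
}
```

For example, `regressed 200 250 10 && echo "P99 regressed, consider rolling back"` fires because 250 ms is more than 10% above a 200 ms baseline, while `regressed 200 210 10` stays quiet.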


See also

Local development

Deploy

Configuration

Infrastructure

On-call

Incidents

SLOs

Security
