Pre-flight
Before any upgrade:
- Read the Changelog. Identify breaking changes.
- Read the relevant migration guide.
- Snapshot the brain volume. This is your rollback insurance.
- Run regression evals on the current version to establish a baseline.
- Plan a rollout window. Even patch releases can shift behavior.
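The baseline step above can be sketched as follows. This is a minimal illustration, not part of the Engine tooling: save_baseline, load_baseline, find_regressions, the JSON layout, and the eval names are all hypothetical, and it assumes each eval produces a numeric score where higher is better.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_baseline(path, engine_version, eval_scores):
    """Record the current version's eval scores before upgrading."""
    record = {
        "engine_version": engine_version,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "eval_scores": dict(eval_scores),
    }
    Path(path).write_text(
        json.dumps(record, indent=2, sort_keys=True), encoding="utf-8"
    )
    return record


def load_baseline(path):
    """Read a baseline previously written by save_baseline."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def find_regressions(baseline_scores, current_scores, tolerance=0.0):
    """Return eval names whose score dropped by more than `tolerance`.

    An eval that is missing from `current_scores` counts as a regression.
    Assumes higher scores are better.
    """
    regressed = []
    for name, old_score in sorted(baseline_scores.items()):
        new_score = current_scores.get(name)
        if new_score is None or old_score - new_score > tolerance:
            regressed.append(name)
    return regressed
```

Saving the baseline alongside the version string makes it harder to compare post-upgrade results against the wrong release.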
Standard upgrade — patch and minor versions
Patch (X.Y.Z → X.Y.Z+1) and minor (X.Y → X.Y+1) versions are backwards-compatible. Behavior may shift; the API and brain schema don’t break.
Steps
If /health returns ok and evals are green, you’re done.
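Waiting for /health to report ok after a restart can be scripted. The sketch below is one possible approach under stated assumptions: wait_for_health is a hypothetical helper, and check is a caller-supplied function that queries /health and returns its status as a string. The actual response format of /health is not specified here.

```python
import time


def wait_for_health(check, timeout_s=60.0, interval_s=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll `check()` until it returns "ok" or the timeout expires.

    `check` is a caller-supplied callable (an assumption of this sketch)
    that returns the health status as a string. Exceptions are treated as
    "not healthy yet", since the Engine may refuse connections while it
    is restarting.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if check() == "ok":
                return True
        except Exception:
            pass  # still starting up; retry until the deadline
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing sleep and clock as parameters keeps the function testable without real delays. A green health check is only half of the exit condition; the regression evals still need to pass.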
Standard upgrade — major versions
Major versions (vN → vN+1) include breaking changes. You’ll typically need code changes (new request shapes, new env vars) plus the upgrade.
Steps
- Read the migration guide. All of it.
- Update your application code to match the new API.
- Test in staging. Bring up the new Engine in a staging environment with a copy of production’s brain. Confirm /health, send representative requests, watch for regressions.
- Run full evals on staging.
- Deploy with a rollout strategy. Canary first; flip the rest if the canary is clean.
- Monitor for the first hour. Look for elevated errors, latency spikes, unusual log patterns.
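One way to implement the "canary first" step above is deterministic bucketing by user ID, so the same users stay on the canary for the whole observation window and the canary set only grows as the percentage is raised. This is a sketch under assumptions: canary_bucket, is_canary, and the salt value are hypothetical names, and how users are actually routed to an Engine version depends on your deployment.

```python
import hashlib


def canary_bucket(user_id, salt="engine-upgrade"):
    """Map a user ID to a stable bucket in the range 0..9999.

    The salt (a hypothetical value here) can be changed per release so
    that the same users are not always the first to get new versions.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_canary(user_id, percent, salt="engine-upgrade"):
    """Return True if this user falls inside the canary percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Buckets have 0.01% resolution, so 5% covers buckets 0..499.
    return canary_bucket(user_id, salt) < percent * 100
```

Because each user's bucket is fixed, raising percent from 1 to 5 keeps the original canary users on the new version and adds more, which avoids flipping anyone back and forth mid-rollout.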
Canary rollout
For deployments that serve real users, deploy to a small percentage of users first:
Per-user-engine model
If you run one Engine per user (the typical hosted model), pick 1–5% of users and route them to the new Engine version. Watch their session traces; if behavior is normal, expand.
Single Engine
If you run a single Engine for shared internal use, you don’t have canary granularity. Deploy at a low-traffic time, watch closely, roll back if needed.
Rolling back
When the new version is bad, roll back fast.
Image rollback
If the new version has already migrated the brain, you may see MIGRATION_REQUIRED errors when you try to start the old version.
Brain rollback
If migrations broke the brain or the new version corrupted data, restore the brain volume from the pre-upgrade snapshot.
Migrations that aren’t reversible
Most Engine migrations are forward-only. They add tables, add columns with defaults, populate caches. They don’t drop or rename anything critical. If a migration is destructive (drops a column, transforms data), the migration guide will say so. For destructive migrations:
- Always snapshot first. Treat the snapshot as irreplaceable: keep it for at least a week post-upgrade.
- Don’t rely on rollback. A destructive migration’s reverse is “restore from snapshot,” which loses recent data.
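The two rules above come down to simple date arithmetic. The sketch below shows both, using hypothetical helper names (snapshot_deletable, restore_data_loss). The seven-day default mirrors the "at least a week" guidance and can be lengthened.

```python
from datetime import datetime, timedelta, timezone

# "At least a week post-upgrade"; lengthen this if your risk tolerance is lower.
MIN_RETENTION = timedelta(days=7)


def snapshot_deletable(upgraded_at, now, min_retention=MIN_RETENTION):
    """True once the minimum retention period since the upgrade has passed."""
    return now - upgraded_at >= min_retention


def restore_data_loss(snapshot_taken_at, now):
    """How much history a restore would discard.

    Restoring from a snapshot drops everything written after the snapshot
    was taken, so the loss window is simply the snapshot's age.
    """
    return now - snapshot_taken_at


upgrade_time = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
three_days_later = upgrade_time + timedelta(days=3)
print(snapshot_deletable(upgrade_time, three_days_later))  # False
print(restore_data_loss(upgrade_time, three_days_later))   # 3 days, 0:00:00
```

The second function shows why rollback is a poor plan for destructive migrations: the longer the new version has been running, the more data a restore discards.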
Upgrade across many Engines
In a per-user-engine deployment with hundreds of Engines:
Sequential rollout
Upgrade Engines one at a time, in a controlled order. Slow but safe. Use this for the first few hours of a new release.
Parallel rollout
Upgrade in batches (10%, 50%, 100%). Faster, slightly riskier.
Drain and restart
For each Engine: drain in-flight turns, restart with the new image. The graceful-shutdown path drains existing requests up to a deadline before exiting.
Post-upgrade checklist
After every upgrade:
- /health returns ok.
- Run regression evals, compare to baseline.
- Check error rate over the first hour.
- Check P99 latency over the first hour.
- Check cache hit ratio. A sudden drop suggests prompt structure changed.
- Check the logs for warnings the previous version didn’t emit.
- Spot-check a couple of high-value workflows manually.
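Several of the checks above can be automated by comparing a pre-upgrade snapshot of metrics with the first hour after the upgrade. The following is a rough sketch: post_upgrade_findings, the metric key names, and every threshold are assumptions to be tuned for your workload, and the warning strings stand in for whatever normalized log messages you collect.

```python
def post_upgrade_findings(before, after, old_warnings=(), new_warnings=(),
                          max_error_rate_increase=0.01,
                          max_p99_ratio=1.2,
                          max_cache_hit_drop=0.10):
    """Compare first-hour metrics with the pre-upgrade values.

    `before` and `after` are dicts with the (hypothetical) keys
    "error_rate", "p99_latency_ms", and "cache_hit_ratio".
    Returns a list of findings; an empty list means nothing was flagged.
    """
    findings = []
    if after["error_rate"] - before["error_rate"] > max_error_rate_increase:
        findings.append("error_rate")
    if after["p99_latency_ms"] > before["p99_latency_ms"] * max_p99_ratio:
        findings.append("p99_latency_ms")
    # A sudden drop in cache hits suggests the prompt structure changed.
    if before["cache_hit_ratio"] - after["cache_hit_ratio"] > max_cache_hit_drop:
        findings.append("cache_hit_ratio")
    # Warnings the previous version never emitted.
    unseen = sorted(set(new_warnings) - set(old_warnings))
    findings.extend(f"new warning: {message}" for message in unseen)
    return findings
```

An empty result does not replace the manual spot-check of high-value workflows; it only covers the checks that reduce to numbers and log diffs.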
See also
- Rollback — when an upgrade goes wrong.
- Migration guides — per-version specifics.
- Production deploy — the initial deploy this upgrade is changing.

