A bad handoff is when the in-coming on-call gets paged for something the off-going was already working on, doesn’t know about it, and re-discovers everything from scratch. A good handoff prevents that.

The 5-minute handoff

Every Monday at 10:00 IST, the off-going and in-coming on-calls spend 5–15 minutes going through this list. The handoff is async by default (post in the rotation channel) and synchronous only if something is genuinely complicated.

1. Open incidents

- INC-2026-04-25 (Sev2) — `/execute` p99 elevated for org_xyz
  - Status: mitigated by routing to fallback engine
  - Owner: @primary
  - Next step: root-cause investigation, postmortem due Friday
For each open incident: ID, severity, status, owner, next step.

2. Recent alert noise

- engine-cache-hit-ratio fired 4 times last week, all auto-resolved
  - Likely cause: catalog reload during regression eval runs
  - Action: ignored unless > 10 fires in a week
For each alert that fired more than once: name, count, suspected cause, action policy.
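
As a rough illustration of the "fired more than once" rule, the sketch below counts fires per alert name and keeps only the repeats. It assumes you can export last week's fired alerts as a flat list of names; `noisy_alerts`, `exceeds_policy`, and the `disk-usage-high` alert are hypothetical names for this example, not part of any existing tooling. The threshold of 10 comes from the example policy above.

```python
from collections import Counter


def noisy_alerts(fired_alert_names, min_fires=2):
    """Return (name, count) pairs for alerts that fired at least
    min_fires times, most frequent first."""
    counts = Counter(fired_alert_names)
    return [(name, n) for name, n in counts.most_common() if n >= min_fires]


def exceeds_policy(count, threshold=10):
    """True when an alert fired more often than its 'ignore unless' threshold."""
    return count > threshold


# Hypothetical export of last week's fired alerts.
week = ["engine-cache-hit-ratio"] * 4 + ["disk-usage-high"]

for name, count in noisy_alerts(week):
    action = "investigate" if exceeds_policy(count) else "ignore for now"
    print(f"- {name} fired {count} times last week ({action})")
```

The suspected cause and the action policy still have to be written by a human; the script only tells you which alerts deserve a line in the handoff.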

3. In-flight changes

- engine v2.3.2 rolling out tonight (Tuesday 22:00 IST)
  - New CHANNEL_STATE_TTL behavior
  - Rollback plan: docker tag swap to v2.3.1
  - Owner: @releases
- BAP migration to new orchestrator scheduled Friday
  - Affects: routing layer, not engine itself
  - Owner: @bap-team
Anything deploying or changing infrastructure during the in-coming week.

4. Pending action items

From recent postmortems:
- AI-2026-04-12-3: add p99 autorollback (in progress, due Apr 30)
- AI-2026-04-08-1: improve compaction observability (done, verify)
Action items the in-coming should be aware of. Especially “done, verify” items — confirm they haven’t regressed.

5. Recent weirdness

- Customer cust_abc reported intermittent 503s on Tuesday — no
  pattern found yet. Watch for repeat.
- Brain volume on engine-007 grew 2 GB last week, suspicious but
  within retention. Might warrant cleanup script tuning.
Things you noticed but didn’t fully investigate. The in-coming might see them again or might catch the pattern you missed.

6. Pager test

- Pager test: yes/no
- Last test: 2026-04-21 (passed)
Confirm the pager actually pages. Run a synthetic test page if it’s been more than a week.
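
The "more than a week" rule is simple date arithmetic. A minimal sketch, assuming the date of the last test is recorded somewhere you can read it; `pager_test_due` is a hypothetical helper, not an existing command:

```python
from datetime import date, timedelta


def pager_test_due(last_test: date, today: date,
                   max_age: timedelta = timedelta(days=7)) -> bool:
    """True when the last pager test is older than max_age,
    meaning a synthetic test page should be sent."""
    return today - last_test > max_age


last_test = date(2026, 4, 21)  # from the example above
print(pager_test_due(last_test, today=date(2026, 4, 27)))  # 6 days old
print(pager_test_due(last_test, today=date(2026, 4, 29)))  # 8 days old
```

A test that is exactly seven days old is treated as still fresh here; tighten the comparison if your rotation prefers to test at every handoff.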

After handoff

The in-coming acknowledges in the rotation channel:
Acknowledged handoff for week of Apr 28. Working: INC-2026-04-25 root cause; watching engine v2.3.2 rollout.
After acknowledgement, the off-going is off the hook. Their old incidents become the in-coming’s responsibility.

What goes wrong

Silent handoff

The off-going just disappears at 17:00 Friday and the in-coming wakes up Monday with no context. They get paged at 10:30, have no idea what INC-2026-04-25 is, and fumble for an hour. The fix: the handoff post is mandatory. No post, no end of rotation. The off-going stays on call until they post.

Verbose handoff

A 3000-word handoff that nobody reads is worse than a brief one that gets read. The 5-minute handoff structure exists because it fits. The fix: stick to the structure. Anything longer goes in the relevant postmortem; link from the handoff.
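
One way to hold a post to the structure is a small check run before posting. The sketch below is an assumption, not existing tooling: it looks for the six section titles from this page and flags posts over a word limit. The 400-word limit and the `check_handoff` name are illustrative choices; pick a limit that suits your team.

```python
# Section titles taken from the handoff list on this page.
SECTIONS = [
    "Open incidents",
    "Recent alert noise",
    "In-flight changes",
    "Pending action items",
    "Recent weirdness",
    "Pager test",
]


def check_handoff(post: str, max_words: int = 400) -> list[str]:
    """Return a list of problems with a handoff post; empty means it passes."""
    lowered = post.lower()
    problems = [f"missing section: {s}" for s in SECTIONS
                if s.lower() not in lowered]
    words = len(post.split())
    if words > max_words:
        problems.append(f"too long: {words} words (limit {max_words})")
    return problems


draft = "1. Open incidents\n- none\n6. Pager test\n- Pager test: yes"
for problem in check_handoff(draft):
    print(problem)
```

The check only matches section titles as substrings, so it catches a skipped section but says nothing about whether the content under it is useful.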

Handoff during an active incident

If a Sev1 is live during the rotation boundary, the off-going stays on until it’s resolved. The in-coming joins them. Don’t try to hot-swap leadership in the middle of a fire.

Going-on-vacation handoff

When you’ll be off for more than a normal weekend (vacation, conference):
  1. Update the rotation schedule in advance.
  2. Brief the engineer covering for you, especially if there’s active context.
  3. Set a Slack status that says when you’re back.
Same handoff structure, longer-form.

Handoff to a new team

If the rotation is changing teams (e.g., we’re spinning up a US-based night-shift rotation), do a multi-week overlap. The new team shadows for two weeks, takes secondary for two weeks, takes primary for two weeks while the old team stands down to backup. Three months later, the transition is complete.
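
The overlap phases can be laid out on a calendar ahead of time. A minimal sketch, assuming three consecutive two-week phases as described above; the start date and the `transition_schedule` helper are hypothetical:

```python
from datetime import date, timedelta

# (phase description, length in weeks), in the order described above.
PHASES = [
    ("new team shadows", 2),
    ("new team takes secondary", 2),
    ("new team takes primary, old team on backup", 2),
]


def transition_schedule(start: date):
    """Return (phase, start, end) tuples; each end date is exclusive
    and doubles as the start of the next phase."""
    schedule = []
    cursor = start
    for name, weeks in PHASES:
        end = cursor + timedelta(weeks=weeks)
        schedule.append((name, cursor, end))
        cursor = end
    return schedule


# Hypothetical start date for the overlap.
for name, begin, end in transition_schedule(date(2026, 5, 4)):
    print(f"{begin} to {end}: {name}")
```

Posting the resulting dates in the rotation channel gives both teams the same picture of who holds primary in any given week.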
