The context window is your most expensive resource. Every token in context costs money, eats latency, and competes with every other token for the model’s attention. This page explains how the Engine assembles context, what you can do to keep it sharp, and what to do when it grows beyond what one round-trip can hold.

What the model sees

When the agent loop calls the model on a turn, it assembles context in this order:
  1. The agent’s system prompt. The persistent instructions for this agent.
  2. The soul. The user’s core identity and learned patterns.
  3. Retrieved long-term memory. Relevant episodes, knowledge facts, and social-graph entries pulled by similarity to the latest message.
  4. Conversation history. Past turns in this task, including the tool calls and results.
  5. Working memory. Notes, observations, and summaries the agent has accumulated within this task.
  6. Tool definitions. The catalog of tools the agent can call.
  7. The current user message.
The order matters for prompt caching. Stable parts go first; per-turn changes go last. This is what the cached_snip heuristics do behind the scenes.
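As a mental model, here is a minimal sketch of that ordering in Python. The function and parameter names are illustrative, not an Engine API; the loop performs this assembly for you on every turn.
from typing import Any

def assemble_context(
    system_prompt: str,
    soul: str,
    retrieved_memory: str,
    history: list[dict[str, Any]],
    working_memory: str,
    user_message: str,
) -> list[dict[str, Any]]:
    # Stable segments first (cacheable), per-turn segments last.
    return [
        {"role": "system", "content": system_prompt},     # 1. persistent instructions
        {"role": "system", "content": soul},               # 2. identity, changes rarely
        {"role": "system", "content": retrieved_memory},   # 3. retrieved for this turn
        *history,                                           # 4. past turns, tool calls, results
        {"role": "system", "content": working_memory},      # 5. notes from this task
        # 6. tool definitions travel alongside the messages in the model call
        {"role": "user", "content": user_message},           # 7. the newest content goes last
    ]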

Long-term vs. working memory

Two different stores serve two different purposes:
             Long-term                                    Working
Scope        Across tasks, across sessions                Within one task
Lifetime     Indefinite (with confidence decay)           TTL-bound
Population   Learning Centre batches; explicit writes     Agent writes during execution
Retrieval    Semantic + keyword search                    Direct read
Cost         One memory call per turn                     Inline in context
The agent fills working memory automatically as it runs. Long-term memory grows from trajectories the Learning Centre processes asynchronously.

Designing for retrieval

The agent’s long-term memory only helps if the right thing comes back when the agent searches it. A few patterns:

Write structured episodes

When you write to episodic memory, give the entry a clear key, a description, and a value:
{
  "key": "user-prefers-pgsql-over-mysql",
  "description": "User prefers Postgres for new projects (mentioned 2026-04-12).",
  "value": "Use Postgres unless a project specifically requires MySQL.",
  "tier": "preference"
}
Bad episodes are vague: "description": "User said something about databases." They don’t surface in search.

Use knowledge facts for invariants

Knowledge facts are typed assertions with confidence and validity. Useful for things that are true now and may not be true forever:
{
  "subject": "deployment",
  "type": "rule",
  "content": "Production deploys go out Tuesday and Thursday only.",
  "confidence": 0.95,
  "valid_from": "2026-01-01",
  "valid_until": null
}
When the rule changes, supersede the old fact (set valid_until) and write a new one. Don’t edit in place.
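A sketch of that supersession flow, assuming a hypothetical memory client; the method names, the fact id, and the replacement rule below are illustrative, not a documented API:
from datetime import date

def supersede_deploy_rule(memory, old_fact_id: str) -> None:
    # Close out the old fact rather than editing it in place.
    memory.update_fact(old_fact_id, valid_until=date.today().isoformat())
    # Write the replacement as a new fact with its own validity window.
    memory.write_fact(
        subject="deployment",
        type="rule",
        content="Production deploys go out Monday through Thursday.",  # example replacement rule
        confidence=0.95,
        valid_from=date.today().isoformat(),
        valid_until=None,
    )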

Don’t write low-value entries

Memory you write but never retrieve is overhead. The Learning Centre batches are designed to filter noise, but explicit writes from your application bypass that filter. Be selective.

Compaction

When context grows beyond the safe limit, the compaction orchestrator (compaction_orchestrator.py) collapses earlier turns. Compaction is on by default (REACTIVE_COMPACT_ENABLED=true). Two strategies:

Summary collapse

The default. Older turns get replaced with a summary block: “Earlier in this task, we read 5 files, ran tests (passed), and discussed the authentication redesign.” The model knows roughly what happened without the full transcript.

Structured file extraction

When FEATURE_STRUCTURED_FILE_EXTRACTION=true (the default), file contents the agent has read are extracted and preserved separately so the summary collapse doesn’t drop them. The agent can re-reference a file’s content without re-reading the file.
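Conceptually, the extraction looks something like the toy sketch below. This is not the orchestrator's code; the turn shape and the read_file tool name are assumptions.
def extract_file_reads(turns: list[dict]) -> tuple[dict[str, str], list[dict]]:
    # Pull file contents out of read-tool results so the summary collapse
    # can drop the raw turns without losing what was read.
    files: dict[str, str] = {}
    remaining: list[dict] = []
    for turn in turns:
        if turn.get("tool") == "read_file":       # hypothetical tool name
            files[turn["path"]] = turn["result"]  # content stays keyed by its path
        else:
            remaining.append(turn)
    return files, remaining  # the files dict survives compaction intact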

Designing for compaction

You don’t usually call compaction yourself, but you do design for it:
  • Short tool outputs win. A tool that returns 50 KB of HTML survives compaction less gracefully than one that returns a 2 KB summary. Prefer summarizing at the tool layer (see the sketch after this list).
  • Stable file references. When the agent reads a file, the path becomes the durable reference. Renaming files mid-task breaks the re-reference pattern.
  • Working-memory writes. When the agent decides “this is important,” it writes to working memory. Working-memory entries survive compaction better than raw turn content.
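For the first point, here is one way a tool could summarize at the tool layer before anything reaches the context. The tool name and character budget are illustrative; only the stdlib HTML parser is real.
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    # Collects visible text, skipping script and style blocks.
    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self._chunks.append(data.strip())
    def text(self) -> str:
        return " ".join(self._chunks)

def fetch_page_tool(html: str, max_chars: int = 2000) -> str:
    # Return a bounded text digest instead of raw HTML so the tool result
    # stays small in context and collapses gracefully later.
    parser = _TextExtractor()
    parser.feed(html)
    text = parser.text()
    return text if len(text) <= max_chars else text[:max_chars] + " [truncated]"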

Cache-friendly structure

Prompt caching is the single biggest cost lever. To benefit:
  1. Keep the system prompt stable. Don’t include per-call data.
  2. Order context oldest → newest. The Engine does this; don’t fight it.
  3. Avoid unnecessary tool catalog churn. Reloading the catalog invalidates cache entries.
Cache hits typically save 70–90% of the input cost on the cached prefix. On a long agent loop, this is the difference between a 5¢ turn and a 1¢ turn. The Engine emits a usage event with cache_hit_tokens for every model call. Watch it.
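One way to watch it: aggregate the usage events for a task and track the cached fraction. Only cache_hit_tokens is named above; the input_tokens field in this sketch is an assumption.
def cache_hit_rate(usage_events: list[dict]) -> float:
    # Fraction of input tokens served from the prompt cache across a task.
    cached = sum(e.get("cache_hit_tokens", 0) for e in usage_events)
    total = sum(e.get("input_tokens", 0) for e in usage_events)
    return cached / total if total else 0.0
A falling hit rate usually points at churn early in the context: a mutating system prompt or a reloaded tool catalog.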

Handling very long tasks

Some tasks naturally run for hundreds of turns — coding sessions, research projects, customer support threads. Patterns:

Compact at task boundaries

When a logical phase ends (“we finished the diagnosis, now we’re moving to the fix”), trigger an explicit summarization. This gives the agent a clean handoff between phases.
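A sketch of what that handoff might look like, assuming a hypothetical client; the working-memory and compaction calls here are illustrative, not a documented API.
def end_phase(engine, task_id: str, phase_summary: str) -> None:
    # Persist the handoff so it outlives the turns behind it.
    engine.working_memory.write(task_id=task_id, note=f"Phase complete: {phase_summary}")
    # Ask for a collapse now rather than waiting for the safe limit.
    engine.compact(task_id=task_id, strategy="summary_collapse")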

Split tasks

If a “task” is really three different jobs in sequence, give them different task_ids. Memory carries forward via long-term memory; the context for each task stays focused.
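A sketch, with a hypothetical engine.run call and made-up task_ids:
def run_billing_audit(engine) -> None:
    # Three sequential jobs, three task_ids: each context stays narrow,
    # while long-term memory carries the relevant facts across them.
    engine.run(task_id="billing-audit-diagnose", message="Find the source of the billing drift.")
    engine.run(task_id="billing-audit-fix", message="Implement the fix agreed in the diagnosis.")
    engine.run(task_id="billing-audit-verify", message="Verify the last three months of invoices.")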

Use sub-agents

A planner agent that delegates to sub-agents (coder, tester, etc.) can keep each sub-agent’s context narrow. The planner has the high-level view; sub-agents see only what they need. See Multi-agent patterns.
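In outline, with illustrative names only (see Multi-agent patterns for the real wiring):
def plan_and_delegate(planner, subagents: dict, goal: str):
    # The planner keeps the high-level view; each sub-agent sees only its step.
    plan = planner.run(message=f"Break this goal into steps: {goal}")
    results = []
    for step in plan.steps:                  # hypothetical plan shape
        worker = subagents[step.kind]        # e.g. "coder", "tester"
        results.append(worker.run(message=step.instructions))
    return planner.run(message=f"Synthesize these step results: {results}")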

What not to do

  • Don’t put long static content in the system prompt. Use retrieval or tool output. The system prompt should be instructions, not knowledge.
  • Don’t pass user history in message. The Engine maintains it. Sending past turns in message doubles them.
  • Don’t disable compaction in production. When context overflows, the model returns nothing useful. Compaction is the recovery path.
  • Don’t write to memory on every turn. The Learning Centre handles the firehose. Explicit writes should be deliberate, not reflexive.

See also