This page traces a single /execute request from the moment it arrives
at the Engine until the final SSE event is flushed. If you want to know
“who calls whom” inside the process, this is the page.
The journey of one request
Per-step detail
1–2. Receive and validate
src/server.py is a thin FastAPI app. It validates the API key against
ENGINE_KEY_HASH (or ENGINE_API_KEY), parses the body into an
ExecuteRequest, and finds or creates the channel state for the
task_id. If channel state already exists for this task and is
mid-execution, the request is rejected with 409.
The SSE stream opens here. Events emitted later in the lifecycle write
into this stream.
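A minimal sketch of both checks, assuming ENGINE_KEY_HASH is a hex SHA-256 digest of the key; the real hashing scheme and channel-state shape in src/server.py may differ:

```python
import hashlib
import hmac
import os
from http import HTTPStatus

def check_api_key(presented: str) -> bool:
    # Assumed scheme: ENGINE_KEY_HASH holds a hex SHA-256 of the key,
    # with ENGINE_API_KEY as a plaintext fallback.
    key_hash = os.environ.get("ENGINE_KEY_HASH")
    if key_hash:
        digest = hashlib.sha256(presented.encode()).hexdigest()
        return hmac.compare_digest(digest, key_hash)
    plain = os.environ.get("ENGINE_API_KEY")
    return plain is not None and hmac.compare_digest(presented, plain)

channels: dict[str, dict] = {}  # task_id -> channel state (illustrative shape)

def find_or_create_channel(task_id: str):
    # Reject with 409 if this task already has a run in flight.
    state = channels.setdefault(task_id, {"executing": False})
    if state["executing"]:
        return HTTPStatus.CONFLICT, None
    state["executing"] = True
    return HTTPStatus.OK, state
```

In the real app these would run inside FastAPI dependencies before the SSE stream opens.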
3. Coordinator allocates a slot
engine_core/coordinator.py manages a fixed pool of TaskSlots. Each
slot tracks one in-flight execution. The coordinator emits the initial
thread_lifecycle: started event into the SSE stream so the client
knows the run is live.
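One way the slot pool could look, sketched with asyncio primitives; the TaskSlot fields and Coordinator API here are illustrative, not the real engine_core/coordinator.py:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TaskSlot:
    task_id: Optional[str] = None
    events: list = field(default_factory=list)  # stand-in for the SSE stream

class Coordinator:
    def __init__(self, pool_size: int):
        self.slots = [TaskSlot() for _ in range(pool_size)]
        self._free = asyncio.Semaphore(pool_size)

    async def allocate(self, task_id: str) -> TaskSlot:
        await self._free.acquire()  # blocks when the fixed pool is exhausted
        slot = next(s for s in self.slots if s.task_id is None)
        slot.task_id = task_id
        slot.events.append("thread_lifecycle: started")  # first SSE event
        return slot

    def release(self, slot: TaskSlot):
        slot.task_id = None
        self._free.release()
```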
4. Context assembly
The Engine’s most subtle work happens here. context_engineering reads:
- The system prompt for the active agent.
- The user’s soul (one row from the brain).
- Recent conversation history for this task (conversations, working_memory_log).
- Relevant retrieved memory — episodes, knowledge facts, social-graph entries — pulled by hybrid search keyed on the latest message.
- Tool definitions from the utility directory, asset directory, and user’s skill catalog.
cached_snip heuristics arrange the content so older, stable parts hit the prompt cache; per-turn deltas trail behind.
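The cache-friendly ordering can be illustrated with a toy arrangement pass; the real heuristics are richer, and the `stable` flag here is an assumed representation:

```python
def arrange_for_cache(segments):
    """Order context segments so stable content forms a fixed prefix
    (eligible for provider prompt caching) and per-turn deltas trail."""
    stable = [s for s in segments if s["stable"]]
    volatile = [s for s in segments if not s["stable"]]
    return stable + volatile  # relative order within each group is preserved
```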
5. The agent loop
agent_loop.run() calls herald.complete() with the assembled context.
Herald routes to the configured provider, opens a streaming connection,
and emits structured chunks back to the loop:
- Text deltas → SSE text_delta.
- Thinking deltas → SSE thinking_delta.
- Block boundaries → SSE content_block_start/content_block_stop.
- Tool use blocks → handed to the streaming tool executor.
- Stop reasons → end of iteration.
Each iteration either ends the run (the model returned stop) or feeds back into itself (the model called tools and the loop continues).
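The loop’s control flow can be sketched as follows; `complete` and `execute_tools` stand in for herald.complete and the streaming tool executor, and the reply shape is an assumption:

```python
def run_agent_loop(complete, execute_tools, context):
    """Call the model until it returns a terminal stop reason; on tool_use,
    execute the tools and feed results back into the context."""
    while True:
        reply = complete(context)  # one streaming model turn
        context.append({"role": "assistant", "content": reply["content"]})
        if reply["stop_reason"] != "tool_use":
            return reply  # end_turn / stop_sequence / max_tokens
        results = execute_tools(reply["tool_calls"])
        context.append({"role": "tool", "content": results})
```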
6. Tool dispatch
streaming_tool_executor runs tool calls in parallel up to CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY. It routes by kind:
- Platform tool. Runs through utility_directory.tool_runtime, which executes the actual operation (read file, run bash, etc.) inside the sandbox. The sandbox may emit a permission_prompt HITL event before running.
- MCP tool. Looks up the connection in asset_directory, decrypts credentials, posts the call to the remote MCP server, and surfaces the result.
- Skill. Runs through the skill runner, which is typically a sub-call to herald with a different prompt template. Some skills invoke platform tools internally.
Each tool call emits tool_call when issued and tool_result when complete. Results feed back into the context for the next iteration.
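A hedged sketch of bounded parallel dispatch; the runner table and call shape are assumptions, and the real executor also streams tool_call/tool_result events as it goes:

```python
import asyncio

async def run_tools(calls, runners, max_concurrency=4):
    """Dispatch tool calls in parallel, bounded by a concurrency cap,
    routing each call to a runner by kind (platform / mcp / skill)."""
    gate = asyncio.Semaphore(max_concurrency)

    async def one(call):
        async with gate:  # no more than max_concurrency in flight
            return await runners[call["kind"]](call)

    # gather preserves the order of calls in its results
    return await asyncio.gather(*(one(c) for c in calls))
```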
7. Compaction (sometimes)
If the context approaches the safe token ceiling, the compaction_orchestrator runs in-line. It collapses earlier turns into
summaries, preserves structured artifacts (file contents, tool outputs
worth keeping), and emits a compaction_event. The model sees a fresh
shorter context on the next iteration.
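A toy version of the compaction decision, assuming a token_count callback and a summarize callback; the real orchestrator also preserves structured artifacts rather than summarizing everything:

```python
def maybe_compact(turns, token_count, ceiling, summarize, keep_last=2):
    """If the context nears the ceiling, collapse older turns into one
    summary turn and keep the most recent turns verbatim."""
    if token_count(turns) < ceiling:
        return turns, None  # under the ceiling: leave the context alone
    head, tail = turns[:-keep_last], turns[-keep_last:]
    summary = {"role": "system", "content": summarize(head)}
    return [summary] + tail, "compaction_event"
```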
8. Termination
When the model returns a stop reason the loop accepts (end_turn,
stop_sequence, max_tokens), the loop concludes. The coordinator
writes a trajectories row with the full execution record, finalizes
working memory, and emits thread_lifecycle: completed.
If the model issued a hitl_request instead, the loop suspends. The
SSE stream remains open and emits heartbeats. When POST /hitl/respond
arrives, the loop resumes from the suspended state.
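The suspend/resume handshake can be sketched with an asyncio.Event; HitlGate and its payload shape are hypothetical names, not the Engine’s actual API:

```python
import asyncio

class HitlGate:
    """Suspend the loop on a hitl_request; resume on POST /hitl/respond."""

    def __init__(self):
        self._event = asyncio.Event()
        self.response = None

    async def wait(self):
        # Called by the loop when it suspends; parks until a response lands.
        await self._event.wait()
        return self.response

    def respond(self, payload):
        # Called by the /hitl/respond handler; wakes the suspended loop.
        self.response = payload
        self._event.set()
```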
9. Stream close
The SSE stream closes. The coordinator releases the task slot. Channel state remains in the brain for the configured TTL so that a reconnecting client can replay events.
10. Asynchronous learning
Later — on the schedule set by LC_BATCH_INTERVAL_HOURS — the
learning_centre.scheduler wakes up and processes recent trajectories.
It generates new episodes, consolidates knowledge, updates the soul,
and adds nodes/edges to the social graph. None of this happens in-line;
the next conversation benefits from yesterday’s batch.
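The scheduling check reduces to a simple interval comparison; the function name here is illustrative:

```python
def due_for_batch(last_run: float, now: float, interval_hours: float) -> bool:
    """True when the learning batch should run again, i.e. at least
    LC_BATCH_INTERVAL_HOURS have elapsed since the last run."""
    return now - last_run >= interval_hours * 3600
```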
Cross-cutting concerns
A few things touch many of the steps above:
- Logging. Every component emits structured logs through logging_config.py with a request ID propagated from server.py.
- Observability events. raven.store writes events to observability_events for later querying.
- Quota and rate limiting. Herald’s quota.py short-circuits on rate-limit errors and surfaces them as LLM_RATE_LIMITED.
- Graceful shutdown. infra/graceful_shutdown.py registers drain callbacks. On SIGTERM, the Engine finishes in-flight requests up to a deadline, then exits.
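A minimal sketch of the drain pattern; class and method names are illustrative rather than the real infra/graceful_shutdown.py, and the real version also enforces a deadline:

```python
import signal

class GracefulShutdown:
    """Register drain callbacks; on SIGTERM, run them before exiting."""

    def __init__(self):
        self.draining = False
        self._callbacks = []

    def register(self, cb):
        self._callbacks.append(cb)

    def install(self):
        # Route SIGTERM into the drain sequence.
        signal.signal(signal.SIGTERM, lambda *_: self.drain())

    def drain(self):
        self.draining = True  # new requests should be refused from here on
        for cb in self._callbacks:
            cb()  # e.g. finish in-flight executions, flush SSE streams
```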
See also
- Components for what each subsystem does in detail.
- Streaming events for the events flowing through the SSE stream.
- The agent loop for the user-facing version of step 5.

