
This page traces a single /execute request from the moment it arrives at the Engine until the final SSE event is flushed. If you want to know “who calls whom” inside the process, this is the page.

The journey of one request

Per-step detail

1–2. Receive and validate

src/server.py is a thin FastAPI app. It validates the API key against ENGINE_KEY_HASH (or ENGINE_API_KEY), parses the body into an ExecuteRequest, and finds or creates the channel state for the task_id. If channel state already exists for this task and is mid-execution, the request is rejected with a 409 Conflict. The SSE stream opens here; events emitted later in the lifecycle write into this stream.
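A minimal sketch of this validate-and-open step. The env var names, the 409 rejection, and the find-or-create semantics come from the description above; the `ExecuteRequest` fields, the sha256 hashing scheme, and the channel-state helpers are assumptions for illustration, not the real server code.

```python
# Sketch only: hashing scheme and channel-state shape are assumptions.
import hashlib
import os

class ExecuteRequest:
    def __init__(self, task_id: str, message: str):
        self.task_id = task_id
        self.message = message

def check_api_key(presented: str) -> bool:
    """Compare against ENGINE_KEY_HASH (assumed sha256) or raw ENGINE_API_KEY."""
    key_hash = os.environ.get("ENGINE_KEY_HASH")
    if key_hash:
        return hashlib.sha256(presented.encode()).hexdigest() == key_hash
    return presented == os.environ.get("ENGINE_API_KEY")

_channels: dict[str, dict] = {}

def open_channel(task_id: str) -> dict:
    """Find or create channel state; reject mid-execution tasks with a 409."""
    state = _channels.setdefault(task_id, {"executing": False, "events": []})
    if state["executing"]:
        raise RuntimeError("409: task already executing")
    state["executing"] = True
    return state
```

In the real server the 409 would surface as an HTTP response rather than an exception, and the channel state would live in the brain rather than a process-local dict.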

3. Coordinator allocates a slot

engine_core/coordinator.py manages a fixed pool of TaskSlots. Each slot tracks one in-flight execution. The coordinator emits the initial thread_lifecycle: started event into the SSE stream so the client knows the run is live.
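The fixed-pool behavior can be sketched with an asyncio semaphore; the real engine_core/coordinator.py may be structured differently, and the `TaskSlot`/`Coordinator` shapes here are assumptions.

```python
# Sketch of a fixed TaskSlot pool; structure is assumed, not the real code.
import asyncio
from dataclasses import dataclass, field

@dataclass
class TaskSlot:
    task_id: str
    events: list = field(default_factory=list)

class Coordinator:
    def __init__(self, pool_size: int):
        self._sem = asyncio.Semaphore(pool_size)   # caps in-flight executions
        self.active: dict[str, TaskSlot] = {}

    async def allocate(self, task_id: str) -> TaskSlot:
        await self._sem.acquire()                  # block until a slot frees up
        slot = TaskSlot(task_id)
        self.active[task_id] = slot
        # First SSE event tells the client the run is live.
        slot.events.append({"event": "thread_lifecycle", "data": "started"})
        return slot

    def release(self, task_id: str) -> None:
        self.active.pop(task_id, None)
        self._sem.release()
```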

4. Context assembly

The Engine’s most subtle work happens here. context_engineering reads:
  • The system prompt for the active agent.
  • The user’s soul (one row from the brain).
  • Recent conversation history for this task (conversations, working_memory_log).
  • Relevant retrieved memory — episodes, knowledge facts, social-graph entries — pulled by hybrid search keyed on the latest message.
  • Tool definitions from the utility directory, asset directory, and user’s skill catalog.
It composes them into a single ordered list of messages. cached_snip heuristics arrange the content so older, stable parts hit the prompt cache; per-turn deltas trail behind.
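The cache-friendly ordering above can be illustrated with a toy assembler: stable content first so it can hit the provider's prompt cache, per-turn deltas last. Function and field names here are hypothetical, not the real context_engineering or cached_snip API.

```python
# Illustrative ordering only; names are stand-ins for the real modules.
def assemble_context(system_prompt, soul, history, retrieved, tools, latest):
    messages = []
    # Stable prefix: changes rarely, so it is prompt-cache friendly.
    messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "system", "content": f"soul: {soul}"})
    messages.append({"role": "system", "content": f"tools: {tools}"})
    # Semi-stable: memory retrieved by hybrid search on the latest message.
    for item in retrieved:
        messages.append({"role": "system", "content": f"memory: {item}"})
    # Volatile tail: conversation history, then the newest turn.
    messages.extend(history)
    messages.append({"role": "user", "content": latest})
    return messages
```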

5. The agent loop

agent_loop.run() calls herald.complete() with the assembled context. Herald routes to the configured provider, opens a streaming connection, and emits structured chunks back to the loop:
  • Text deltas → SSE text_delta.
  • Thinking deltas → SSE thinking_delta.
  • Block boundaries → SSE content_block_start / content_block_stop.
  • Tool use blocks → handed to the streaming tool executor.
  • Stop reasons → end of iteration.
Each iteration may end the turn (model said stop) or feed back into itself (model called tools and the loop continues).
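The iterate-until-stop shape of that loop can be sketched as follows; the chunk kinds mirror the list above, but the `(kind, payload)` interface and the callback signatures are assumptions, not herald's real API.

```python
# Sketch of the agent loop's control flow; interfaces are assumed.
def run_agent_loop(complete, context, dispatch_tools, max_iters=10):
    """complete(context) yields (kind, payload) chunks; dispatch_tools
    runs tool calls and returns result messages to append to context."""
    for _ in range(max_iters):
        tool_calls = []
        stop_reason = None
        for kind, payload in complete(context):
            if kind == "text_delta":
                pass                       # would stream out as SSE text_delta
            elif kind == "tool_use":
                tool_calls.append(payload) # handed to the tool executor
            elif kind == "stop":
                stop_reason = payload      # end of this iteration
        if tool_calls:
            # Model called tools: run them, feed results back, loop again.
            context = context + dispatch_tools(tool_calls)
            continue
        return stop_reason                 # model ended the turn
    return "max_iterations"
```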

6. Tool dispatch

streaming_tool_executor runs tool calls in parallel up to CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY. It routes by kind:
  • Platform tool. Runs through utility_directory.tool_runtime, which executes the actual operation (read file, run bash, etc.) inside the sandbox. The sandbox may emit a permission_prompt HITL event before running.
  • MCP tool. Looks up the connection in asset_directory, decrypts credentials, posts the call to the remote MCP server, surfaces the result.
  • Skill. Runs through the skill runner, which is typically a sub-call to herald with a different prompt template. Some skills invoke platform tools internally.
Each call emits tool_call when issued and tool_result when complete. Results feed back into the context for the next iteration.
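Bounded-parallel dispatch routed by kind can be sketched like this. The concurrency cap (CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY) and the three kinds follow the description above; the handler table and call shape are stand-ins.

```python
# Sketch of bounded-parallel tool dispatch; handlers are stand-ins.
import asyncio

async def dispatch_tools(calls, handlers, max_concurrency: int):
    sem = asyncio.Semaphore(max_concurrency)

    async def run_one(call):
        async with sem:
            # A tool_call event would be emitted here, then the call is
            # routed by kind: platform / mcp / skill.
            handler = handlers[call["kind"]]
            result = await handler(call)
            # A tool_result event would be emitted here; the result feeds
            # back into the context for the next iteration.
            return result

    return await asyncio.gather(*(run_one(c) for c in calls))
```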

7. Compaction (sometimes)

If the context approaches the safe token ceiling, the compaction_orchestrator runs in-line. It collapses earlier turns into summaries, preserves structured artifacts (file contents, tool outputs worth keeping), and emits a compaction_event. The model sees a fresh shorter context on the next iteration.
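A toy version of that collapse: when estimated tokens exceed the ceiling, fold the oldest turns into one summary while keeping recent turns and structured artifacts. The threshold-then-summarize logic mirrors the description; `summarize()` and the `artifact` flag are stand-ins for the real compaction_orchestrator.

```python
# Toy compaction; summarize() and the artifact flag are stand-ins.
def compact(messages, token_ceiling, estimate, summarize, keep_recent=4):
    if sum(estimate(m) for m in messages) <= token_ceiling:
        return messages                    # under the ceiling: no-op
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Preserve structured artifacts (file contents, kept tool outputs).
    artifacts = [m for m in old if m.get("artifact")]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + artifacts + recent  # fresh, shorter context
```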

8. Termination

When the model returns a stop reason the loop accepts (end_turn, stop_sequence, max_tokens), the loop concludes. The coordinator writes a trajectories row with the full execution record, finalizes working memory, and emits thread_lifecycle: completed. If the model issued a hitl_request instead, the loop suspends. The SSE stream remains open and emits heartbeats. When POST /hitl/respond arrives, the loop resumes from the suspended state.
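The terminal branch behaves like a small state machine: accepted stop reasons complete the run, a hitl_request suspends it until /hitl/respond resumes it. Stop reasons and event names follow the description above; the state shape is illustrative.

```python
# Illustrative state machine for termination vs. HITL suspension.
ACCEPTED_STOPS = {"end_turn", "stop_sequence", "max_tokens"}

def on_loop_exit(stop_reason, state):
    if stop_reason in ACCEPTED_STOPS:
        state["status"] = "completed"
        state["events"].append({"event": "thread_lifecycle",
                                "data": "completed"})
        # Also: write the trajectories row, finalize working memory.
    elif stop_reason == "hitl_request":
        state["status"] = "suspended"   # SSE stays open, heartbeats flow
    return state

def on_hitl_respond(state, response):
    """POST /hitl/respond arrives: resume from the suspended state."""
    assert state["status"] == "suspended"
    state["status"] = "running"
    state["events"].append({"event": "hitl_response", "data": response})
    return state
```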

9. Stream close

The SSE stream closes. The coordinator releases the task slot. Channel state remains in the brain for the configured TTL so that a reconnecting client can replay events.

10. Asynchronous learning

Later — on the schedule set by LC_BATCH_INTERVAL_HOURS — the learning_centre.scheduler wakes up and processes recent trajectories. It generates new episodes, consolidates knowledge, updates the soul, and adds nodes/edges to the social graph. None of this happens in-line; the next conversation benefits from yesterday’s batch.
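The batch cadence can be sketched as a wake-and-filter pass: fire when LC_BATCH_INTERVAL_HOURS has elapsed, then process only trajectories newer than the previous run. The `process` callback stands in for the learning_centre steps (episodes, knowledge, soul, social graph).

```python
# Sketch of the batch scheduler's cadence; callback is a stand-in.
from datetime import datetime, timedelta

def due_for_batch(last_run, now, interval_hours):
    """True once the configured interval has elapsed since the last batch."""
    return (now - last_run).total_seconds() >= interval_hours * 3600

def run_batch(trajectories, last_run, process):
    """Process only trajectories recorded since the previous batch."""
    fresh = [t for t in trajectories if t["recorded_at"] > last_run]
    for t in fresh:
        process(t)   # episodes, knowledge, soul, social-graph updates
    return len(fresh)
```

Because none of this runs in-line, a conversation never pays for learning; it only reads what earlier batches produced.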

Cross-cutting concerns

A few things touch many of the steps above:
  • Logging. Every component emits structured logs through logging_config.py with a request ID propagated from server.py.
  • Observability events. raven.store writes events to observability_events for later querying.
  • Quota and rate limiting. Herald’s quota.py short-circuits on rate-limit errors and surfaces them as LLM_RATE_LIMITED.
  • Graceful shutdown. infra/graceful_shutdown.py registers drain callbacks. On SIGTERM, the Engine finishes in-flight requests up to a deadline, then exits.
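The drain-then-exit behavior of the last bullet can be sketched as below: registered drain callbacks run on SIGTERM, in-flight requests get until a deadline, then the process exits regardless. The concrete class and method names are assumptions about infra/graceful_shutdown.py.

```python
# Sketch of drain-on-SIGTERM; API names are assumed, not the real module.
import asyncio

class GracefulShutdown:
    def __init__(self, deadline_seconds: float):
        self.deadline = deadline_seconds
        self._drain_callbacks = []

    def register(self, callback):
        self._drain_callbacks.append(callback)

    async def on_sigterm(self, in_flight: set):
        for cb in self._drain_callbacks:
            cb()                              # e.g. stop accepting requests
        try:
            # Let in-flight requests finish, but only up to the deadline.
            await asyncio.wait_for(asyncio.gather(*in_flight), self.deadline)
        except asyncio.TimeoutError:
            pass                              # deadline hit: exit anyway
```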

See also