When you call POST /execute, you’re starting an agent loop. The loop is
the part of the Engine that turns a model — which only knows how to predict
the next token — into something that can do work over many steps. This
page explains what the loop does, in order, so you can build a mental
model of what’s happening on the server while you watch the stream.
The shape of one turn
A “turn” is one user message and the agent’s response to it. Inside one turn, the loop runs as many iterations as the model needs. A simple question might be one iteration; a complex task with tool calls might be twenty.
Step by step
1. Receive
POST /execute arrives. The server validates the API key, finds or
creates channel state for the task_id, and starts the SSE stream. A
thread_lifecycle: started event goes out immediately so the client knows
the turn is live.
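To make the event framing concrete, here is a minimal sketch of what one SSE frame for that first lifecycle event could look like. The helper and the payload fields are illustrative assumptions, not the Engine’s actual wire format.

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame (hypothetical helper;
    the Engine's real serialization may differ)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# The first frame of a turn might look like:
frame = sse_event("thread_lifecycle", {"status": "started", "task_id": "t-123"})
```

Each frame ends with a blank line, which is how SSE clients detect event boundaries.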
2. Load context
Before calling the model, the Engine assembles the prompt. Context is built from:
- The system prompt — the agent definition (built-in or custom).
- The soul — the user’s core identity and learned patterns.
- Recent turns — the conversation history for this task.
- Working memory — transient context the agent has accumulated within this turn.
- Retrieved memory — relevant episodes, knowledge facts, and social graph entries pulled by similarity to the current message.
- Tool definitions — the catalog of tools the agent can call.
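The assembly step can be sketched as a function that merges these sources into one payload. This is an illustrative shape only; the argument names and the payload layout are assumptions, not the Engine’s internal types.

```python
def assemble_context(system_prompt, soul, recent_turns, working_memory,
                     retrieved, tool_defs):
    """Illustrative sketch of step 2, not the Engine's actual code.
    Combines the context sources into a single prompt payload."""
    system = "\n\n".join([system_prompt, soul] + retrieved + working_memory)
    return {
        "system": system,                # identity + retrieved memory
        "messages": list(recent_turns),  # conversation history for this task
        "tools": tool_defs,              # catalog the model may call
    }

ctx = assemble_context(
    system_prompt="You are a helpful agent.",
    soul="User prefers concise answers.",
    recent_turns=[{"role": "user", "content": "hi"}],
    working_memory=[],
    retrieved=["Episode: user asked about SSE yesterday."],
    tool_defs=[{"name": "bash"}],
)
```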
3. Call the model
The Engine sends the assembled context to the configured LLM provider (Anthropic, OpenAI, or Gemini) as a streaming call. Token deltas come back in real time and propagate to the SSE stream as text_delta and thinking_delta events.
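The translation from provider chunks to SSE events is roughly a rename. The chunk shape below is an assumption for illustration; real provider payloads differ by vendor.

```python
def to_sse_events(provider_deltas):
    """Map provider stream chunks to the Engine's SSE event names
    (hypothetical chunk shape; real provider payloads vary)."""
    for chunk in provider_deltas:
        if chunk["type"] == "text":
            yield ("text_delta", chunk["text"])
        elif chunk["type"] == "thinking":
            yield ("thinking_delta", chunk["text"])

events = list(to_sse_events([
    {"type": "thinking", "text": "planning"},
    {"type": "text", "text": "Hello"},
]))
```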
4. Handle the response
The model’s response is one or more content blocks:
- A text block — a chunk of the answer. Streamed as text_delta events; finalized as content_block_stop.
- A thinking block (if extended thinking is on) — internal reasoning, streamed and finalized the same way.
- A tool use block — a request to call a tool. The block contains the tool name and the input arguments.
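The dispatch over block types can be sketched as a small function. The block shapes are illustrative; only the three type names come from the step above.

```python
def classify_block(block: dict) -> str:
    """Dispatch on the content-block type from step 4
    (block dict shape is an assumption for illustration)."""
    kind = block["type"]
    if kind == "text":
        return "stream as text_delta, finalize on content_block_stop"
    if kind == "thinking":
        return "stream as thinking_delta"
    if kind == "tool_use":
        return f"execute tool {block['name']} with given input"
    raise ValueError(f"unknown block type: {kind}")
```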
5. Execute tools
Tool execution depends on what kind of tool the model asked for:
- Sandboxed tools — file read/write, bash, etc. Run inside bwrap+seccomp+landlock. May trigger a permission prompt (HITL) before running.
- Asset Directory tools — MCP connectors. The Engine looks up the user’s stored credentials, sends the request to the remote MCP server, and surfaces the result.
- Skills — composed prompts and small scripts. Some are pure prompts (run as a sub-call to the model); some run small scripts in the sandbox.
Every tool execution emits a tool_call event when the model issues it and a tool_result event when it completes.
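The routing decision can be sketched as a lookup against three registries. The registry sets and tool names below are assumptions; the Engine presumably keeps a real catalog.

```python
def route_tool(tool_name: str, sandboxed: set, connectors: set, skills: set) -> str:
    """Sketch of step 5's routing decision (registries are
    illustrative stand-ins, not the Engine's real catalog)."""
    if tool_name in sandboxed:
        return "sandbox"   # bwrap+seccomp+landlock; may need HITL approval
    if tool_name in connectors:
        return "mcp"       # Asset Directory: remote MCP server call
    if tool_name in skills:
        return "skill"     # prompt sub-call or small sandboxed script
    raise KeyError(tool_name)

kind = route_tool("bash", sandboxed={"bash", "file_read"},
                  connectors={"gmail"}, skills={"summarize"})
```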
6. Compact, if necessary
If context is approaching its safe limit, the compaction orchestrator runs in-line to collapse older turns into summaries. Compaction emits compaction_event SSE events and is invisible to the agent — the model sees a fresh, shorter context.
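The trigger condition amounts to a threshold check. The 80% safety ratio below is an assumed placeholder; the Engine’s actual safe limit is internal.

```python
def needs_compaction(context_tokens: int, limit: int, safety: float = 0.8) -> bool:
    """Illustrative threshold check for step 6; the real limit
    and safety ratio are internal to the Engine."""
    return context_tokens >= int(limit * safety)
```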
7. Decide
After tool results come back, the Engine feeds them to the model and loops to step 3. The model can:
- Continue with more text or another tool call → loop again.
- Emit a stop reason → the turn is over.
- Pause for HITL → the loop suspends until the user responds.
8. Finalize
When the model says it’s done, the Engine writes a trajectory record (used later by the Learning Centre), emits a thread_lifecycle: completed event, closes the SSE stream, and persists final working-memory updates.
Stop conditions
A turn ends when:
- The model returns a stop reason of stop_sequence, end_turn, or max_tokens.
- The model asks a HITL question and you don’t reply (the stream technically continues to emit heartbeats, but no progress is made).
- An error fires that the loop can’t recover from (error event).
- The user cancels the task externally.
Cost and latency
The cost of a turn is dominated by:
- The model call(s). The loop may run the model multiple times per turn.
- The tools called. Some tools (web search, code execution) take seconds.
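A back-of-envelope estimate makes the multiplier effect of iterations visible. The per-million-token prices below are illustrative placeholders, not the Engine’s actual rates.

```python
def turn_cost(iterations, in_tokens, out_tokens,
              in_price_per_mtok=3.00, out_price_per_mtok=15.00):
    """Rough cost of one turn in dollars, assuming every iteration
    sends roughly the same token counts (prices are illustrative)."""
    per_call = (in_tokens / 1e6 * in_price_per_mtok
                + out_tokens / 1e6 * out_price_per_mtok)
    return iterations * per_call

# Five iterations of 20k input / 1k output tokens each:
cost = turn_cost(5, 20_000, 1_000)
```

Because the loop re-sends context on every iteration, a twenty-iteration turn can cost an order of magnitude more than a one-iteration turn.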
See also
- POST /execute — the API surface.
- Streaming events — the events the loop emits.
- Permissions — how risky tool calls get gated.
- Defining tools — what’s in the toolbox.

