

When you call POST /execute, you’re starting an agent loop. The loop is the part of the Engine that turns a model — which only knows how to predict the next token — into something that can do work over many steps. This page explains what the loop does, in order, so you can build a mental model of what’s happening on the server while you watch the stream.

The shape of one turn

A “turn” is one user message and the agent’s response to it. Inside one turn, the loop runs as many iterations as the model needs. A simple question might be one iteration; a complex task with tool calls might be twenty.
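
The turn-with-iterations shape can be sketched as a loop. This is a minimal illustration, not the Engine's actual code: `call_model` and `run_tool` are hypothetical stand-ins for the model call and tool execution described in the steps below.

```python
def run_turn(messages, tools, call_model, run_tool, max_iterations=50):
    """One turn: iterate until the model stops asking for tools.

    call_model and run_tool are hypothetical stand-ins for the real
    Engine internals; names and message shapes are illustrative.
    """
    for _ in range(max_iterations):
        response = call_model(messages, tools)
        messages.append({"role": "assistant", "content": response["content"]})
        tool_uses = [b for b in response["content"] if b["type"] == "tool_use"]
        if not tool_uses:
            return response  # no tool requests: the model emitted a stop reason
        # Feed every tool result back and loop to another model call
        results = [
            {"type": "tool_result", "tool_use_id": b["id"],
             "content": run_tool(b["name"], b["input"])}
            for b in tool_uses
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("iteration cap reached")
```

A simple question exits on the first pass; a tool-heavy task goes around the loop once per batch of tool calls.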

Step by step

1. Receive

POST /execute arrives. The server validates the API key, finds or creates channel state for the task_id, and starts the SSE stream. A thread_lifecycle: started event goes out immediately so the client knows the turn is live.
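
On the client side, the stream arrives as standard `text/event-stream` frames. A minimal parser, assuming ordinary SSE framing (`event:` and `data:` fields, frames separated by blank lines) with JSON payloads:

```python
import json

def parse_sse(raw):
    """Parse a raw SSE stream into (event, data) pairs.

    Assumes standard text/event-stream framing; real clients should
    read incrementally rather than buffering the whole stream.
    """
    events = []
    for frame in raw.strip().split("\n\n"):
        name, data_lines = "message", []
        for line in frame.splitlines():
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        events.append((name, json.loads("\n".join(data_lines))))
    return events
```

The first frame you should see is the `thread_lifecycle` event confirming the turn is live.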

2. Load context

Before calling the model, the Engine assembles the prompt. Context is built from:
  • The system prompt — the agent definition (built-in or custom).
  • The soul — the user’s core identity and learned patterns.
  • Recent turns — the conversation history for this task.
  • Working memory — transient context the agent has accumulated within this turn.
  • Retrieved memory — relevant episodes, knowledge facts, and social graph entries pulled by similarity to the current message.
  • Tool definitions — the catalog of tools the agent can call.
This step is also where prompt caching kicks in. The Engine arranges the context to maximize cache hits across turns.
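
The cache-friendly arrangement amounts to ordering sections from most stable to most volatile, so the shared prefix survives across turns. A sketch of that idea (field names and ordering are illustrative, not the Engine's actual layout):

```python
def assemble_context(system_prompt, soul, tool_defs, history,
                     retrieved, working):
    """Order context sections from most stable to most volatile so the
    prompt-cache prefix is reused across turns. Illustrative only."""
    return [
        {"role": "system", "content": system_prompt},  # fixed per agent
        {"role": "system", "content": soul},           # changes slowly
        {"role": "system", "content": tool_defs},      # stable per agent
        *history,                                      # appended each turn
        {"role": "system", "content": retrieved},      # varies per message
        {"role": "system", "content": working},        # varies per iteration
    ]
```

Anything that changes every message goes last, so edits there never invalidate the cached prefix.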

3. Call the model

The Engine sends the assembled context to the configured LLM provider (Anthropic, OpenAI, or Gemini) as a streaming call. Token deltas come back in real time and propagate to the SSE stream as text_delta and thinking_delta events.
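
On the receiving end, deltas fold back into complete blocks. A minimal sketch; real streams also carry block indices and `content_block_stop` markers:

```python
def accumulate(deltas):
    """Fold text_delta / thinking_delta chunks into final strings.

    deltas is a sequence of (event_name, chunk) pairs, as a client
    might collect them off the SSE stream. Illustrative only.
    """
    blocks = {"text": [], "thinking": []}
    for kind, chunk in deltas:
        if kind == "text_delta":
            blocks["text"].append(chunk)
        elif kind == "thinking_delta":
            blocks["thinking"].append(chunk)
    return {k: "".join(v) for k, v in blocks.items()}
```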

4. Handle the response

The model’s response is one or more content blocks:
  • A text block — a chunk of the answer. Streamed as text_delta events; finalized as content_block_stop.
  • A thinking block (if extended thinking is on) — internal reasoning, streamed and finalized the same way.
  • A tool use block — a request to call a tool. The block contains the tool name and the input arguments.
If the model returns tool use blocks, the loop runs them.

5. Execute tools

Tool execution depends on what kind of tool the model asked for:
  • Sandboxed tools — file read/write, bash, etc. Run inside bwrap + seccomp + landlock. May trigger a permission prompt (HITL) before running.
  • Asset Directory tools — MCP connectors. The Engine looks up the user’s stored credentials, sends the request to the remote MCP server, and surfaces the result.
  • Skills — composed prompts and small scripts. Some are pure prompts (run as a sub-call to the model); some run small scripts in the sandbox.
Each tool call emits a tool_call event when the model issues it and a tool_result event when it completes.
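
The emit-execute-emit pattern around a single tool call can be sketched like this. The registry shape and function names are hypothetical; only the event names (`tool_call`, `tool_result`) come from the docs above:

```python
def run_tool_call(block, registry, emit):
    """Execute one tool-use block, emitting tool_call / tool_result
    events around it. registry maps tool name -> callable; the real
    Engine routes by kind (sandboxed / Asset Directory / skill)."""
    emit("tool_call", {"name": block["name"], "input": block["input"]})
    fn = registry[block["name"]]
    result = fn(block["input"])  # sandbox, MCP call, or skill sub-call
    emit("tool_result", {"name": block["name"], "output": result})
    return result
```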

6. Compact, if necessary

If context is approaching its safe limit, the compaction orchestrator runs inline to collapse older turns into summaries. Compaction emits compaction_event SSE events and is invisible to the agent — the model sees a fresh, shorter context.
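
A rough sketch of the trigger logic. The 0.8 threshold, the number of turns kept verbatim, and `summarize` (a sub-call to the model) are all illustrative assumptions, not the Engine's actual values:

```python
def maybe_compact(history, token_count, limit, summarize, threshold=0.8):
    """Collapse older turns into a summary once context nears its limit.

    token_count and summarize are hypothetical helpers; the threshold
    and keep-window are illustrative, not the Engine's real settings.
    """
    if token_count(history) < threshold * limit:
        return history                      # plenty of headroom; no-op
    keep = history[-4:]                     # keep the most recent turns
    summary = summarize(history[:-4])       # model sub-call, in the sketch
    return [{"role": "system", "content": summary}, *keep]
```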

7. Decide

After tool results come back, the Engine feeds them to the model and loops to step 3. The model can:
  • Continue with more text or another tool call → loop again.
  • Emit a stop reason → the turn is over.
  • Pause for HITL → the loop suspends until the user responds.
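
The three outcomes above reduce to a small decision function. The stop reasons are the ones named under Stop conditions below; the response shape and `pending_hitl` flag are illustrative:

```python
def decide(response):
    """Map a model response to the loop's next action.

    Response shape is illustrative; stop reasons follow the docs:
    stop_sequence, end_turn, and max_tokens end the turn.
    """
    if any(b["type"] == "tool_use" for b in response["content"]):
        return "loop"       # run tools, feed results back, call model again
    if response.get("pending_hitl"):
        return "suspend"    # wait for the user's answer
    if response["stop_reason"] in ("stop_sequence", "end_turn", "max_tokens"):
        return "finalize"
    return "loop"
```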

8. Finalize

When the model says it’s done, the Engine writes a trajectory record (used later by the Learning Centre), emits a thread_lifecycle: completed event, closes the SSE stream, and persists final working-memory updates.

Stop conditions

A turn ends when:
  • The model returns a stop reason of stop_sequence, end_turn, or max_tokens.
  • The model asks a HITL question and you don’t reply (the stream keeps emitting heartbeats, but no progress is made).
  • An error fires that the loop can’t recover from (error event).
  • The user cancels the task externally.

Cost and latency

The cost of a turn is dominated by:
  • The model call(s). The loop may run the model multiple times per turn.
  • The tools called. Some tools (web search, code execution) take seconds.
The Engine’s contribution to latency is small — typically tens of milliseconds for context assembly per iteration. Most of a turn’s wall time is upstream model inference plus tool execution. For optimization patterns, see Build with the Engine → Cost and latency.
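
A back-of-envelope cost model for the model-call side, since the loop may bill multiple calls per turn. Prices and the cache discount rate are placeholders; check your provider's actual pricing:

```python
def turn_cost(calls, price_in, price_out, cache_discount=0.1):
    """Rough per-turn cost: sum input and output tokens over all model
    calls in the loop. Cached input tokens billed at a discount; the
    0.1 rate and per-token prices are illustrative, not real pricing."""
    total = 0.0
    for c in calls:
        cached = c.get("cached_tokens", 0)
        fresh = c["input_tokens"] - cached
        total += fresh * price_in
        total += cached * price_in * cache_discount
        total += c["output_tokens"] * price_out
    return total
```

Because the stable context prefix repeats on every iteration, cache hits matter more the more times the loop runs.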

See also