The Engine streams everything. From the first text_delta to the final thread_lifecycle: completed, every meaningful state change is emitted as a Server-Sent Event. This page covers how to consume the stream well.

Why SSE

SSE is the right protocol for this surface because:
  • Server-pushed, like WebSocket, but over plain HTTP — works through every load balancer, proxy, and firewall that allows HTTPS.
  • Text-based, parseable with two regexes.
  • Reconnection is built into the spec; the browser’s EventSource handles it for you.
  • No back-channel needed. The agent loop runs on the server; you just read.
WebSocket would give bidirectional communication, but the Engine doesn’t need it: the agent’s only “incoming” message during a turn is the optional POST /hitl/respond, which is a separate request.
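
To make the "text-based, parseable with two regexes" point concrete, here is a minimal sketch of such a parser. It assumes the whole body is already in memory as a string, and the sample payload is hypothetical (only the text field appears elsewhere on this page). One regex splits messages on blank lines, the other splits each line into a field name and a value.

```python
import json
import re

# Regex 1: messages are separated by a blank line.
MESSAGE_SEP = re.compile(r"\r?\n\r?\n")
# Regex 2: each line is "name: value"; lines starting with ":" are comments and never match.
FIELD = re.compile(r"^(?P<name>[^:\r\n]+):[ ]?(?P<value>[^\r\n]*)\r?$", re.MULTILINE)

def parse_sse(raw: str):
    """Yield (event_type, payload) pairs from a complete SSE text body."""
    for chunk in MESSAGE_SEP.split(raw):
        event_type, data_lines = None, []
        for m in FIELD.finditer(chunk):
            if m["name"] == "event":
                event_type = m["value"]
            elif m["name"] == "data":
                data_lines.append(m["value"])
        if data_lines:
            # Multiple data: lines belong to one message and are joined with "\n".
            yield event_type, json.loads("\n".join(data_lines))

# Hypothetical wire sample; the payload shape is illustrative only.
SAMPLE = (
    'event: text_delta\n'
    'data: {"text": "Hel"}\n'
    '\n'
    'event: text_delta\n'
    'data: {"text": "lo"}\n'
    '\n'
)

events = list(parse_sse(SAMPLE))
```

A streaming consumer would not hold the whole body in memory; the line-by-line loops later on this page are the practical form of the same idea.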

The simplest consumer

import json
import httpx

with httpx.Client(timeout=None) as c:
    with c.stream("POST", "http://localhost:8000/execute",
                  headers={"X-Engine-Key": KEY},
                  json={"message": "hi", "task_id": "demo"}) as r:
        for line in r.iter_lines():
            if line.startswith("data:"):
                event = json.loads(line[5:].strip())
                if "text" in event:
                    print(event["text"], end="", flush=True)
Three lines of substance. That’s a working consumer.

A complete consumer

A real consumer routes by event type, handles disconnections, and surfaces meaningful state to the UI. Here’s the shape:
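import json

def consume_stream(stream):
    event_type = None
    buffer = []

    for line in stream.iter_lines():
        if not line:
            # A blank line ends an SSE message: dispatch what was collected.
            if event_type and buffer:
                # Per the SSE spec, multiple data: lines are joined with "\n".
                handle(event_type, json.loads("\n".join(buffer)))
            event_type = None
            buffer = []
            continue
        if line.startswith("event:"):
            event_type = line[6:].strip()
        elif line.startswith("data:"):
            buffer.append(line[5:].strip())
The data: field can span multiple lines. Accumulate them until a blank line, then join them with a newline before parsing. Here handle stands for whatever dispatch function your application provides.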

Event family routing

Group events into four families and handle them separately:

Lifecycle

thread_lifecycle, error — the meta-events. Use them to start, stop, and fail the UI’s “this turn is in progress” state.

Content

text_delta, thinking_delta, content_block_start, content_block_stop — the model’s output. Stream into your message UI.

Tools

tool_call, tool_result — agent activity. Surface as collapsed UI elements (“Searching the web…”, “Read 3 files”) that expand on click.

HITL

hitl_request, hitl_resolved — the agent is paused. Surface as a prompt the user must respond to before the turn continues.
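
One way to wire the four families together is a lookup table plus one handler per family. This is a sketch, not the Engine's own client: make_router and the handler signature are hypothetical names chosen for illustration, and only the event names come from the lists above.

```python
# Map each event name listed above to its family.
FAMILIES = {
    "thread_lifecycle": "lifecycle",
    "error": "lifecycle",
    "text_delta": "content",
    "thinking_delta": "content",
    "content_block_start": "content",
    "content_block_stop": "content",
    "tool_call": "tools",
    "tool_result": "tools",
    "hitl_request": "hitl",
    "hitl_resolved": "hitl",
}

def make_router(handlers):
    """Build a route(event_type, payload) function from {family: handler}."""
    def route(event_type, payload):
        family = FAMILIES.get(event_type)
        handler = handlers.get(family)
        if handler is None:
            # Unknown event, or a family nobody registered for: skip it
            # instead of crashing, so new event types don't break the client.
            return False
        handler(event_type, payload)
        return True
    return route
```

Events outside the four families (heartbeat, for example) fall through and return False, so they can be handled separately.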

Reconnection

If your client disconnects mid-stream, the Engine keeps running. To catch up:
GET /execute/replay?after=<unix_ms>
X-Engine-Key: <key>
The Engine replays every event emitted on the current task after the given timestamp. Pass 0 to replay everything available. The buffer’s lifetime is CHANNEL_STATE_TTL_ACTIVE (default 3 hours) for active turns and CHANNEL_STATE_TTL_HITL (default 72 hours) for HITL-paused turns.
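
As an illustration, the replay request can be assembled with the standard library alone. The helper name build_replay_request and the base URL are assumptions; the path, the after parameter, and the X-Engine-Key header are taken from the request shown above. How the Engine identifies the current task for this call is not shown here, so the sketch leaves it out.

```python
import urllib.parse
import urllib.request

def build_replay_request(base_url: str, key: str, after_ms: int = 0) -> urllib.request.Request:
    """Build (but do not send) a GET /execute/replay request.

    after_ms is a Unix timestamp in milliseconds; 0 asks for everything
    still held in the replay buffer.
    """
    query = urllib.parse.urlencode({"after": int(after_ms)})
    url = f"{base_url.rstrip('/')}/execute/replay?{query}"
    return urllib.request.Request(url, headers={"X-Engine-Key": key}, method="GET")

# Example: replay everything available from a local Engine (hypothetical URL and key).
req = build_replay_request("http://localhost:8000", "my-key", after_ms=0)
```

Passing the request to urllib.request.urlopen, or building the equivalent call in your HTTP client of choice, returns the same SSE format as the live stream.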

Implementing resilient streaming

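import time
import httpx

def stream_with_reconnect(task_id):
    # Kept local: assigning to a module-level name inside the function
    # without declaring it global would raise UnboundLocalError.
    last_event_ts = 0
    while True:
        try:
            # raw_stream and handle are placeholders for your own code.
            for event in raw_stream(task_id, after=last_event_ts):
                handle(event)
                last_event_ts = event["timestamp"]
        except httpx.TransportError:  # connect errors, read timeouts, dropped connections
            time.sleep(1)
            continue
        else:
            break  # stream ended cleanly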
The same task_id is the key. Each event has a timestamp field; track the last one you’ve seen. On reconnect, replay from there.
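
A fixed one-second sleep is fine for a demo. A variant with capped exponential backoff might look like the sketch below. The function name, the open_stream callable, and the injectable sleep are all assumptions made so the loop can be exercised without a network; only the timestamp field comes from the text above.

```python
import time

def stream_with_backoff(open_stream, handle, *, sleep=time.sleep, max_delay=30.0,
                        retry_on=(ConnectionError, TimeoutError)):
    """Consume events, reconnecting from the last seen timestamp on failure.

    open_stream(after) must return an iterable of event dicts that each
    carry a "timestamp" key. Returns the last timestamp seen.
    """
    last_ts = 0
    delay = 1.0
    while True:
        try:
            for event in open_stream(last_ts):
                handle(event)
                last_ts = event["timestamp"]
                delay = 1.0  # progress was made, so reset the backoff
        except retry_on:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to the cap
            continue
        return last_ts  # the stream ended cleanly
```

With httpx as the transport, passing retry_on=(httpx.TransportError,) would cover connect errors, read timeouts, and dropped connections.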

Heartbeats

The Engine emits a heartbeat event periodically — every 15 seconds by default. Use it to detect dead connections. If you don’t see one for 30+ seconds, your connection is probably gone; reconnect.
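import time

last_heartbeat = time.time()

def on_heartbeat(_):
    # global, not nonlocal: last_heartbeat lives at module level.
    global last_heartbeat
    last_heartbeat = time.time()

def watchdog():
    # Run this in a background thread; close_and_reconnect is your own function.
    while True:
        time.sleep(5)
        if time.time() - last_heartbeat > 30:
            close_and_reconnect()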
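
The same check can be packaged without module-level state. The class below is a sketch under two assumptions: the name HeartbeatWatchdog is made up for this example, and it uses time.monotonic rather than time.time so that wall-clock adjustments cannot trigger a false reconnect. The clock is injectable so the logic can be tested without waiting.

```python
import time

class HeartbeatWatchdog:
    """Tracks the time since the last heartbeat event."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout
        self._clock = clock
        self._last = clock()  # treat construction time as the first beat

    def beat(self):
        """Call this from the heartbeat event handler."""
        self._last = self._clock()

    def is_stale(self):
        """True once more than `timeout` seconds have passed without a beat."""
        return self._clock() - self._last > self.timeout
```

A background thread or timer can poll is_stale() every few seconds and trigger the reconnect path when it returns True.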

Client-side rendering

A few patterns that work well:

Accumulating text

For the simplest UX, concatenate text_delta.text and re-render on each event. Most UI frameworks batch updates, and the browser paints at most once per display frame (typically 60 per second), so even a chatty stream feels smooth.

Block-aware rendering

If the agent calls multiple tools and emits text between them, render each block as a separate UI element. content_block_start / content_block_stop give you the boundaries.
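
A small accumulator shows how the block boundaries can drive rendering. This is a sketch with stated assumptions: BlockAccumulator is a hypothetical name, blocks are assumed not to interleave (one open block at a time), and text_delta payloads are assumed to carry their text under a text key, as in the simplest consumer.

```python
class BlockAccumulator:
    """Groups streamed text into blocks, one UI element per block."""

    def __init__(self):
        self.blocks = []   # all blocks in arrival order, open or closed
        self._open = None  # the block currently receiving deltas, if any

    def on_event(self, event_type, payload):
        if event_type == "content_block_start":
            self._open = {"text": "", "closed": False}
            self.blocks.append(self._open)
        elif event_type == "text_delta":
            if self._open is None:
                # A delta arrived without a start event: open an implicit block.
                self.on_event("content_block_start", {})
            self._open["text"] += payload.get("text", "")
        elif event_type == "content_block_stop":
            if self._open is not None:
                self._open["closed"] = True
                self._open = None
```

The UI can then render self.blocks as a list and re-render only the last element while it is still open.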

Tool affordances

Don’t show raw tool input/output to the user. Format it:
| Event | UI surface |
| --- | --- |
| tool_call: web_search | “Searching for X…” |
| tool_result: web_search | “Found 5 results” (collapsed; expandable) |
| tool_call: read_file | “Reading report.md” |
| tool_call: bash | Show command, hide output by default |
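
The mapping above could be implemented as a single formatting function. Note that the payload field names used here (name, input, query, path, command, results) are assumptions for illustration; check them against the actual tool_call and tool_result payloads before relying on this.

```python
def tool_label(event_type, payload):
    """Return a short, user-facing label for a tool event.

    Field names in `payload` are hypothetical; adjust to the real schema.
    """
    name = payload.get("name", "tool")
    if event_type == "tool_call":
        args = payload.get("input", {})
        if name == "web_search":
            return f"Searching for {args.get('query', '')}…"
        if name == "read_file":
            return f"Reading {args.get('path', '')}"
        if name == "bash":
            # Show the command itself; the output stays hidden by default.
            return f"$ {args.get('command', '')}"
        return f"Running {name}…"
    if event_type == "tool_result":
        if name == "web_search":
            return f"Found {len(payload.get('results', []))} results"
        return f"{name} finished"
    return ""
```

Unknown tools fall back to a generic label, so a newly added tool still renders something sensible.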

HITL prompts

When hitl_request arrives, foreground a confirmation dialog. The stream is paused; the user must respond before anything else happens. After hitl_resolved, hide the dialog and resume the normal stream view.
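
The pause and resume behavior fits a small state function that the UI can switch on. This is a sketch: the state names are invented for this example, and the status key on thread_lifecycle is an assumption (the page only mentions a completed lifecycle event).

```python
def next_state(state, event_type, payload):
    """Return the UI state after one event.

    States used here: "running", "awaiting_user", "done", "failed".
    """
    if event_type == "hitl_request":
        return "awaiting_user"  # show the confirmation dialog
    if event_type == "hitl_resolved":
        return "running"        # hide the dialog, resume the stream view
    if event_type == "error":
        return "failed"
    # Assumption: the lifecycle payload exposes its phase under "status".
    if event_type == "thread_lifecycle" and payload.get("status") == "completed":
        return "done"
    return state  # content and tool events do not change the turn state
```

Keeping this logic in one pure function makes it easy to test the dialog behavior without a live stream.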

Timeouts

Set the HTTP client’s read timeout to None (or very high). The Engine holds the stream open for the duration of the turn, which can be minutes. Normal HTTP timeouts will close the connection mid-turn.
httpx.Client(timeout=None)              # Python
fetch(url, { signal: AbortSignal.timeout(10*60*1000) })  // JS, 10 min

Gotchas

  • Buffer flushing. Some HTTP clients buffer the body. With curl, use -N (no buffering). With Python’s requests, use stream=True. With fetch, the response body’s getReader() is unbuffered by default.
  • Proxy buffering. Reverse proxies (nginx, ALB) sometimes buffer responses. With nginx, set proxy_buffering off for the Engine path, or have the upstream send the response header X-Accel-Buffering: no.
  • Compression. Some proxies compress responses, which buffers them. Disable compression on the SSE path.
  • Wide writes. A single text_delta event can carry several tokens; others carry one. Don’t assume one delta = one token in your UI logic.
