

This page collects questions we hear repeatedly. The list is short on purpose — every recurring question is a sign the docs missed something, and we prefer to fix the docs. If your question isn’t here, the answer probably lives in the section linked at the bottom of this page.

Is the Engine multi-tenant?

No. Each Engine instance is single-tenant — one brain, one user. To serve many users, run many Engines and route at the upstream layer. This is a deliberate design choice. See Architecture overview for the reasoning.

Does the Engine host LLM models?

No. The Engine talks to external LLM providers (Anthropic, OpenAI, Google Gemini) over their public APIs. You provide the credentials.

Can I run the Engine without an internet connection?

You can run the Engine itself locally, but model calls need outbound HTTPS to reach the LLM provider. For an air-gapped deployment, you’d need a self-hosted LLM endpoint that implements one of the supported APIs.

Can I use a different LLM provider than the three you list?

If the provider implements OpenAI’s chat completions API, set LLM_PROVIDER=openai and configure the base URL. Otherwise, you’d need to add a provider adapter to src/herald/providers/.
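As a sketch, pointing the Engine at an OpenAI-compatible endpoint might look like this. LLM_PROVIDER comes from the answer above; the base-URL and key variable names are assumptions, so confirm them against the configuration reference.

```shell
# LLM_PROVIDER=openai is documented above; the other two variable
# names and the endpoint value are assumptions for illustration.
export LLM_PROVIDER=openai
export OPENAI_BASE_URL="https://llm.example.internal/v1"   # hypothetical self-hosted endpoint
export OPENAI_API_KEY="sk-local-placeholder"
```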

Why SQLite and not Postgres?

SQLite is single-file, single-process, and embedded — it matches the single-tenant Engine model exactly. No separate DB to provision, back up, or manage. For multi-tenant workloads, you run multiple Engines each with their own SQLite, not one Engine against a multi-tenant Postgres. The Postgres adapter that briefly existed (src/infra/pg.py) was removed in v2.3.0.

How do I update the agent’s prompt without restarting?

Edit the agent definition in the catalog, then call:

```shell
curl -X POST "$ENGINE_URL/admin/reload-catalog" \
  -H "X-Engine-Key: $KEY"
```

The new catalog is live for the next request.

Does the Engine remember things across sessions?

Yes. Memory persists in the brain database. Two pieces:
  • Same task_id — the conversation history is on disk; reuse the ID to continue the thread.
  • Different task_id — long-term memory (episodes, knowledge, soul, social graph) is shared across all tasks for this user. The agent retrieves relevant past memory at the start of each turn.
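To illustrate the first bullet: reusing a task_id continues the same thread. Only the /execute path and the task_id concept come from these docs — the JSON body shape below is an assumption.

```shell
ENGINE_URL="${ENGINE_URL:-http://localhost:8080}"
TASK_ID="report-123"   # illustrative ID

# Two turns with the same task_id: the second continues the first's
# conversation history. A fresh task_id would start a new thread, but
# would still see the shared long-term memory.
for msg in "Draft the weekly report" "Now shorten it to one page"; do
  payload=$(printf '{"task_id": "%s", "message": "%s"}' "$TASK_ID" "$msg")
  echo "$payload"
  # curl -s -X POST "$ENGINE_URL/execute" -H "X-Engine-Key: $KEY" \
  #   -H "Content-Type: application/json" -d "$payload"
done
```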

How do I make the agent forget something?

Four cases:
  • One episode or knowledge fact: delete the row directly via SQL (there’s no API for this today).
  • A whole task: start over with a new task_id.
  • Everything: wipe the brain volume and let the Learning Centre re-fill it as new conversations happen.
  • One user’s data in a multi-Engine deployment: delete that user’s Engine instance.
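For the first case, here is a sketch of the direct-SQL route, run against a scratch database. The table and column names (knowledge, id, content) are invented for illustration — inspect your actual schema first (e.g. with sqlite3’s .tables) and back up the brain volume before touching a live database.

```shell
# Scratch DB standing in for the brain database; this schema is
# invented for illustration — your real table/column names will differ.
DB=$(mktemp)
sqlite3 "$DB" "CREATE TABLE knowledge (id INTEGER PRIMARY KEY, content TEXT);
INSERT INTO knowledge VALUES (42, 'stale fact the agent should forget');"

# The forget operation itself: find the row, then delete it.
sqlite3 "$DB" "SELECT id, content FROM knowledge WHERE content LIKE '%stale%';"
sqlite3 "$DB" "DELETE FROM knowledge WHERE id = 42;"
sqlite3 "$DB" "SELECT count(*) FROM knowledge;"
```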

How long does context persist?

  • Conversation history — indefinitely; lives in conversations.
  • Working memory — TTL-bound (configurable; defaults to a few hours).
  • Long-term memory — indefinitely, with confidence-based aging on knowledge facts.
  • Channel state — TTL-bound: 3 hours while active, 72 hours while waiting on human-in-the-loop (HITL) input.

Can the agent run code that takes minutes?

Yes. Code execution inside the sandbox can run for a long time, and the SSE stream stays open while it does. There’s a per-command timeout (configurable); beyond it, the command is killed. For genuinely long-running work (hours), structure it as multiple turns: the agent kicks off the work, and you check status periodically.

What happens if my client disconnects mid-stream?

The Engine keeps running. Reconnect with GET /execute/replay?after=<timestamp> to catch up on missed events. Channel state survives for CHANNEL_STATE_TTL_ACTIVE (default 3 h). See Durability.
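A reconnect sketch, assuming the X-Engine-Key header from the reload-catalog example above also applies here; the timestamp value is illustrative.

```shell
ENGINE_URL="${ENGINE_URL:-http://localhost:8080}"
LAST_TS="2024-05-01T12:00:00Z"   # timestamp of the last event you processed (illustrative)

# Replay every event after that point to catch up.
REPLAY_URL="$ENGINE_URL/execute/replay?after=$LAST_TS"
echo "$REPLAY_URL"
# curl -N "$REPLAY_URL" -H "X-Engine-Key: $KEY"   # -N disables buffering, needed for SSE
```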

How do I make the agent NOT do something?

Three layers, from highest to lowest authority:
  1. Permission policy. The Engine intercepts dangerous operations and prompts the user. Configure with ALLOWED_ROOTS, sandbox feature flags, or requires_permission on tool definitions.
  2. System prompt. Add an explicit refusal: “You will not help with X.”
  3. Tool removal. If the agent doesn’t have a tool, it can’t use it. Curate the catalog.
The model’s trained-in refusals are real but not reliable. Don’t depend on them alone for anything important.
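A sketch of layer 1: ALLOWED_ROOTS and requires_permission are named in the list above, but the value formats shown here are assumptions — check the configuration reference for the real ones.

```shell
# Layer 1 (permission policy): confine filesystem access to one root.
# ALLOWED_ROOTS is named above; this path and value format are illustrative.
export ALLOWED_ROOTS="/srv/engine/workspace"

# A tool definition in the catalog can also be gated behind a prompt,
# e.g. (hypothetical JSON shape):
#   { "name": "delete_file", "requires_permission": true }
```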

How much does it cost to run the Engine?

The Engine itself is free. The cost is the LLM provider’s per-token charge plus your infrastructure (compute, storage, MCP server APIs). For order-of-magnitude estimates by task type, see Cost and latency.

Can the agent send emails?

Yes — connect a Gmail MCP server (or any other email provider with an MCP integration). After OAuth, the email actions register in the catalog and the agent can use them.

Does the Engine support voice?

Not natively today. To build voice, transcribe upstream (Whisper or similar) and pass the text to /execute. Synthesize the response upstream too. For Gemini deployments, the model can directly handle audio in the media field — see Multimodal.

Where do I report a bug?

For Engine code, open an issue on the engine repo. For docs, open one on engine-docs. For security reports, see SECURITY.md.

Where do I get help?

For now, GitHub issues and email (hello@september.wtf). A community forum is in the works.

See also

If your question isn’t here, the answer is probably one of: