The Engine is provider-agnostic. You pick which LLM provider runs your
agent at deployment time via LLM_PROVIDER and LLM_MODEL. This page
covers what’s supported, what we recommend, and where to find current
pricing.
Providers
Three providers are supported today:
| LLM_PROVIDER | Provider | Notes |
|---|---|---|
| anthropic | Anthropic | Default for production agentic workloads. Strong tool use, long thinking, prompt caching. |
| openai | OpenAI | Default for OpenAI-shaped APIs. Strong structured outputs, fast streaming. |
| gemini | Google Gemini | Long context (1M+ tokens), multimodal-first, cost-efficient. |
You can also point at OpenAI-compatible endpoints (any provider that
implements OpenAI’s chat completions API) by setting LLM_PROVIDER=openai
and adjusting the base URL — see the engine repo for configuration.
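As a sketch, that configuration might look like the following. LLM_BASE_URL is a hypothetical variable name here; check the engine repo for the one your version actually uses:

```
LLM_PROVIDER=openai
LLM_MODEL=my-hosted-model                      # whatever model ID the endpoint serves
LLM_BASE_URL=https://llm.example.internal/v1   # hypothetical name; see the engine repo
```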
Models we test against
The Engine is tested against the following models per Engine version:
Anthropic
- claude-opus-4-7 — flagship, for the hardest reasoning.
- claude-sonnet-4-7 — recommended default. Strong, fast, cost-aware.
- claude-sonnet-4-6 / claude-sonnet-4-5 — prior generations, still supported.
- claude-haiku-4-5-20251001 — small, fast, cheap. Good for planners, light tasks, classification.
OpenAI
- gpt-5.5 — flagship.
- gpt-5.4 — strong default.
- gpt-5.4-mini — small, cheap, fast.
Google Gemini
- gemini-2.5-pro — flagship, 2M context.
- gemini-2.5-flash — fast, cost-efficient.
The Engine doesn’t enforce the model identifiers — you can set
LLM_MODEL to any string the provider accepts. But we test against the
above; behavior on other models is best-effort.
Recommended setups
Production agentic deployment
LLM_PROVIDER=anthropic
LLM_MODEL=claude-sonnet-4-7
PLANNER_MODEL=claude-haiku-4-5-20251001
LIGHT_MODEL=claude-haiku-4-5-20251001
FALLBACK_MODEL=claude-sonnet-4-6
The Sonnet 4.7 default handles the main loop. The Haiku planner and
light model keep cost down on planning and compaction. Sonnet 4.6 as
fallback covers the case where 4.7 has a transient issue.
Cost-optimized
LLM_PROVIDER=anthropic
LLM_MODEL=claude-haiku-4-5-20251001
PLANNER_MODEL=claude-haiku-4-5-20251001
LIGHT_MODEL=claude-haiku-4-5-20251001
Haiku 4.5 across the board. Cheaper, faster, lower quality on hard
tasks. Good for high-volume, throughput-bound workloads.
Long-context
LLM_PROVIDER=gemini
LLM_MODEL=gemini-2.5-pro
PLANNER_MODEL=gemini-2.5-flash
LIGHT_MODEL=gemini-2.5-flash
Gemini 2.5 Pro’s 2M-token context shines for tasks that need to load
entire codebases, long PDFs, or huge transcripts. Flash for the
supporting agents.
Embeddings
Regardless of which chat provider you choose, embeddings always route
through OpenAI’s embedding model:
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIM=1536
You need an OpenAI API key (OPENAI_API_KEY) even on Anthropic or
Gemini deployments. The Engine doesn’t yet support embeddings from
other providers.
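The Engine makes these embedding calls itself, but it can be worth sanity-checking the key and model before deploying. A minimal sketch using the OpenAI Python SDK (openai >= 1.0), outside the Engine entirely:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="smoke test",
)

vector = resp.data[0].embedding
assert len(vector) == 1536  # should match EMBEDDING_DIM
```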
Pricing
Prices change, so we don’t try to keep numbers up-to-date here; the authoritative figures are on each provider’s pricing page.
Order-of-magnitude as of 2026:
| Model class | Input ($/M tokens) | Output ($/M tokens) |
|---|---|---|
| Flagship (Opus, GPT-5.5, Gemini Pro) | $15 | $75 |
| Mid (Sonnet, GPT-5.4, Gemini 2.5 Flash) | $3 | $15 |
| Small (Haiku, GPT-5.4-mini) | $0.80 | $4 |
| Embeddings | $0.02 | — |
Cache hits typically save 70–90% on the cached input portion.
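To make that concrete, assuming an 80% cache discount (within the range above): at the mid-tier input rate of $3/M tokens, a 200K-token prompt costs $0.60 uncached. If 150K of those tokens are cache hits, that portion drops from $0.45 to $0.09, and the call costs $0.24 total.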
Choosing per workload
The Engine doesn’t pick a model per call automatically. Use these
heuristics to pick at deployment time:
| Workload | Suggested |
|---|---|
| Coding agent, long sessions, complex tasks | Sonnet 4.7 main + Haiku 4.5 planner |
| Customer support, common-questions tier | Haiku 4.5 |
| Customer support, escalations | Sonnet 4.7 |
| Research / synthesis from long sources | Gemini 2.5 Pro |
| High-throughput classification | Haiku 4.5 / GPT-5.4-mini |
| Document drafting, long-form writing | Opus 4.7 / Sonnet 4.7 |
For workloads that span profiles (e.g., a single product with both
classification and drafting), run multiple Engine instances with
different model configurations and route at the application layer.
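A sketch of what that application-layer routing can look like. Everything here is hypothetical: the instance URLs, the /v1/run path, and the assumption that each Engine instance is reachable over HTTP. Adapt it to however your deployment actually exposes the Engine:

```python
import requests

# Hypothetical URLs: one Engine instance per model configuration.
ENGINES = {
    "classification": "http://engine-haiku.internal:8080",   # Haiku 4.5 across the board
    "drafting": "http://engine-sonnet.internal:8080",        # Sonnet 4.7 main loop
}

def route(workload: str, payload: dict) -> dict:
    """Send the request to the Engine instance whose model profile fits the workload."""
    base = ENGINES.get(workload, ENGINES["drafting"])  # default to the stronger profile
    resp = requests.post(f"{base}/v1/run", json=payload, timeout=120)  # hypothetical path
    resp.raise_for_status()
    return resp.json()
```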
Switching models
Models are chosen at deployment time. To switch:
- Update .env with the new LLM_MODEL.
- Restart the Engine.
- Run regression evals against the new configuration.
- Watch cost and latency for a few hundred turns.
Behavior shifts subtly across model versions. Same provider, same
prompt — different verbosity, different willingness to call tools. Plan
to retune system prompts when you change models.
See also