
Documentation Index

Fetch the complete documentation index at: https://internal.september.wtf/llms.txt

Use this file to discover all available pages before exploring further.

The Engine is provider-agnostic. You pick which LLM provider runs your agent at deployment time via LLM_PROVIDER and LLM_MODEL. This page covers what’s supported, what we recommend, and where to find current pricing.

Providers

Three providers are supported today:
  • anthropic — Anthropic. Default for production agentic workloads: strong tool use, long thinking, prompt caching.
  • openai — OpenAI. Default for OpenAI-shaped APIs: strong structured outputs, fast streaming.
  • gemini — Google Gemini. Long context (1M+ tokens), multimodal-first, cost-efficient.
You can also point at OpenAI-compatible endpoints (any provider that implements OpenAI’s chat completions API) by setting LLM_PROVIDER=openai and adjusting the base URL — see the engine repo for configuration.
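A minimal sketch of that setup. The base-URL variable name below (OPENAI_BASE_URL) and the endpoint are assumptions for illustration; the engine repo documents the exact setting.

```shell
# Sketch: routing the openai provider to an OpenAI-compatible endpoint.
# OPENAI_BASE_URL is an assumed variable name - check the engine repo.
LLM_PROVIDER=openai
LLM_MODEL=your-model-name
OPENAI_BASE_URL=https://your-endpoint.example.com/v1
OPENAI_API_KEY=sk-...
```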

Models we test against

Each Engine release is tested against the following models:

Anthropic

  • claude-opus-4-7 — flagship, for the hardest reasoning.
  • claude-sonnet-4-7 — recommended default. Strong, fast, cost-aware.
  • claude-sonnet-4-6 / claude-sonnet-4-5 — prior generations, still supported.
  • claude-haiku-4-5-20251001 — small, fast, cheap. Good for planners, light tasks, classification.

OpenAI

  • gpt-5.5 — flagship.
  • gpt-5.4 — strong default.
  • gpt-5.4-mini — small, cheap, fast.

Google Gemini

  • gemini-2.5-pro — flagship, 2M context.
  • gemini-2.5-flash — fast, cost-efficient.

The Engine doesn’t enforce the model identifiers — you can set LLM_MODEL to any string the provider accepts. But we test against the above; behavior on other models is best-effort.
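Because identifiers pass through verbatim, a deployment can pin whatever string its provider accepts. A sketch (the dated snapshot identifier below is hypothetical):

```shell
# Hypothetical dated snapshot identifier: the Engine forwards it
# unchanged, and the provider decides whether it is valid.
LLM_PROVIDER=anthropic
LLM_MODEL=claude-sonnet-4-7-20260401
```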

Production agentic deployment

LLM_PROVIDER=anthropic
LLM_MODEL=claude-sonnet-4-7
PLANNER_MODEL=claude-haiku-4-5-20251001
LIGHT_MODEL=claude-haiku-4-5-20251001
FALLBACK_MODEL=claude-sonnet-4-6
The Sonnet 4.7 default handles the main loop. The Haiku planner and light model keep cost down on planning and compaction. Sonnet 4.6 as fallback covers the case where 4.7 has a transient issue.

Cost-optimized

LLM_PROVIDER=anthropic
LLM_MODEL=claude-haiku-4-5-20251001
PLANNER_MODEL=claude-haiku-4-5-20251001
LIGHT_MODEL=claude-haiku-4-5-20251001
Haiku 4.5 across the board. Cheaper, faster, lower quality on hard tasks. Good for high-volume, throughput-bound workloads.

Long-context

LLM_PROVIDER=gemini
LLM_MODEL=gemini-2.5-pro
PLANNER_MODEL=gemini-2.5-flash
LIGHT_MODEL=gemini-2.5-flash
Gemini 2.5 Pro’s 2M-token context shines for tasks that need to load entire codebases, long PDFs, or huge transcripts. Flash for the supporting agents.

Embeddings

Regardless of which chat provider you choose, embeddings always route through OpenAI’s embedding model:
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIM=1536
You need an OpenAI API key (OPENAI_API_KEY) even on Anthropic or Gemini deployments. The Engine doesn’t yet support embeddings from other providers.

Pricing

Prices change, so we don’t keep numbers up to date here; check each provider’s pricing page for authoritative figures. Order-of-magnitude as of 2026:
  • Flagship (Opus, GPT-5.5, Gemini Pro): $15 input / $75 output per million tokens
  • Mid (Sonnet, GPT-5.4, Gemini 2.5 Flash): $3 input / $15 output
  • Small (Haiku, GPT-5.4-mini): $0.80 input / $4 output
  • Embeddings: $0.02 per million tokens (input only)
Cache hits typically save 70–90% on the cached input portion.
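As a worked example of what that saving looks like, assuming the mid-tier rates above and a 90% discount on the cached portion (both assumptions; check your provider's actual cache pricing):

```shell
# 200K input tokens, 150K of them cache hits, 5K output, mid-tier rates.
cost=$(awk 'BEGIN {
  input = 200000; cached = 150000; output = 5000
  in_rate  = 3 / 1000000    # $/token input
  out_rate = 15 / 1000000   # $/token output
  cache_discount = 0.9      # assumed 90% off cached input
  uncached_cost = (input - cached) * in_rate
  cached_cost   = cached * in_rate * (1 - cache_discount)
  output_cost   = output * out_rate
  printf "%.4f", uncached_cost + cached_cost + output_cost
}')
echo "$cost"   # 0.2700 - caching cut the input bill from $0.60 to $0.195
```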

Choosing per workload

The Engine doesn’t pick a model per call automatically. Use these heuristics to pick at deployment time:
  • Coding agent, long sessions, complex tasks: Sonnet 4.7 main + Haiku 4.5 planner
  • Customer support, common-questions tier: Haiku 4.5
  • Customer support, escalations: Sonnet 4.7
  • Research / synthesis from long sources: Gemini 2.5 Pro
  • High-throughput classification: Haiku 4.5 / GPT-5.4-mini
  • Document drafting, long-form writing: Opus 4.7 / Sonnet 4.7
For workloads that span profiles (e.g., a single product with both classification and drafting), run multiple Engine instances with different model configurations and route at the application layer.
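One way to express that split is one env file per instance. A sketch only — the file names are arbitrary and the routing layer itself is yours to build; nothing here is prescribed by the Engine:

```shell
# .env.classify - high-throughput classification instance
LLM_PROVIDER=anthropic
LLM_MODEL=claude-haiku-4-5-20251001

# .env.drafting - long-form writing instance
LLM_PROVIDER=anthropic
LLM_MODEL=claude-sonnet-4-7
```

The application layer then sends each request to whichever instance matches its workload profile.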

Switching models

Models are chosen at deployment time. To switch:
  1. Update .env with the new LLM_MODEL.
  2. Restart the Engine.
  3. Run regression evals against the new configuration.
  4. Watch cost and latency for a few hundred turns.
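Step 1 can be as small as an in-place edit. A sketch (the file path is hypothetical, GNU sed is assumed, and the restart/eval steps are deployment-specific, so they aren't shown):

```shell
# Hypothetical .env for illustration.
cat > /tmp/engine.env <<'EOF'
LLM_PROVIDER=anthropic
LLM_MODEL=claude-sonnet-4-6
EOF

# Step 1: point LLM_MODEL at the new model (GNU sed in-place edit).
sed -i 's/^LLM_MODEL=.*/LLM_MODEL=claude-sonnet-4-7/' /tmp/engine.env
grep '^LLM_MODEL=' /tmp/engine.env   # LLM_MODEL=claude-sonnet-4-7
```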
Behavior shifts subtly across model versions. Same provider, same prompt — different verbosity, different willingness to call tools. Plan to retune system prompts when you change models.

See also