The Engine’s model is selected with two environment variables, LLM_PROVIDER and LLM_MODEL. This page
covers what’s supported, what we recommend, and where to find current
pricing.
Providers
Three providers are supported today:

| LLM_PROVIDER | Provider | Notes |
|---|---|---|
| anthropic | Anthropic | Default for production agentic workloads. Strong tool use, long thinking, prompt caching. |
| openai | OpenAI | Default for OpenAI-shaped APIs. Strong structured outputs, fast streaming. |
| gemini | Google Gemini | Long context (1M+ tokens), multimodal-first, cost-efficient. |

OpenAI-compatible endpoints can be used by setting LLM_PROVIDER=openai
and adjusting the base URL; see the engine repo for configuration.
Models we test against
The Engine is tested against the following models per Engine version:

Anthropic
- claude-opus-4-7: flagship, for the hardest reasoning.
- claude-sonnet-4-7: recommended default. Strong, fast, cost-aware.
- claude-sonnet-4-6 / claude-sonnet-4-5: prior generations, still supported.
- claude-haiku-4-5-20251001: small, fast, cheap. Good for planners, light tasks, classification.

OpenAI
- gpt-5.5: flagship.
- gpt-5.4: strong default.
- gpt-5.4-mini: small, cheap, fast.

Google Gemini
- gemini-2.5-pro: flagship, 2M context.
- gemini-2.5-flash: fast, cost-efficient.

You can set LLM_MODEL to any string the provider accepts, but we only test against the
models above; behavior on other models is best-effort.
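
A pre-deployment check can make the supported/tested distinction explicit. The sketch below is illustrative only: the function and constant names are hypothetical and not part of the Engine, and treating a missing LLM_MODEL as an error is an assumption (the Engine may apply a default). The provider and model strings are copied from the lists on this page.

```python
# Values copied from the provider table and tested-model lists on this page.
# Names here are hypothetical helpers, not Engine APIs.
SUPPORTED_PROVIDERS = {"anthropic", "openai", "gemini"}

TESTED_MODELS = {
    "anthropic": {
        "claude-opus-4-7",
        "claude-sonnet-4-7",
        "claude-sonnet-4-6",
        "claude-sonnet-4-5",
        "claude-haiku-4-5-20251001",
    },
    "openai": {"gpt-5.5", "gpt-5.4", "gpt-5.4-mini"},
    "gemini": {"gemini-2.5-pro", "gemini-2.5-flash"},
}


def check_model_config(env):
    """Return (ok, message) for a mapping holding LLM_PROVIDER and LLM_MODEL."""
    provider = env.get("LLM_PROVIDER", "")
    model = env.get("LLM_MODEL", "")
    if provider not in SUPPORTED_PROVIDERS:
        return False, f"unsupported LLM_PROVIDER: {provider!r}"
    if not model:
        # Assumption: no default model is applied when LLM_MODEL is unset.
        return False, "LLM_MODEL is not set"
    if model not in TESTED_MODELS[provider]:
        # Accepted, but outside the tested set: behavior is best-effort.
        return True, f"{model} is untested with {provider}; best-effort"
    return True, f"{model} is in the tested set for {provider}"
```

Passing `os.environ` (or a parsed .env mapping) as `env` gives a quick warning before a deploy lands on an untested model.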
Recommended setups
Production agentic deployment
Cost-optimized
Long-context
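
The three setups named above can be sketched as presets that render the two variables for a .env file. This is a hedged illustration, not official configuration: the preset names and the `render_env` helper are hypothetical, and the model chosen for each preset is our reading of the provider notes and tested-model list on this page (in particular, Haiku for the cost-optimized case is one reasonable option among several small models).

```python
# Hypothetical presets: names and model choices are assumptions drawn from
# the tables on this page, not shipped Engine configuration.
PRESETS = {
    "production-agentic": ("anthropic", "claude-sonnet-4-7"),
    "cost-optimized": ("anthropic", "claude-haiku-4-5-20251001"),
    "long-context": ("gemini", "gemini-2.5-pro"),
}


def render_env(preset):
    """Return .env text selecting the provider and model for a preset."""
    provider, model = PRESETS[preset]
    return f"LLM_PROVIDER={provider}\nLLM_MODEL={model}\n"
```

Calling `render_env("long-context")` returns the two lines `LLM_PROVIDER=gemini` and `LLM_MODEL=gemini-2.5-pro`, ready to paste into .env.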
Embeddings
Regardless of which chat provider you choose, embeddings always route through OpenAI’s embedding model.
You therefore need an OpenAI API key (OPENAI_API_KEY) even on Anthropic or
Gemini deployments. The Engine doesn’t yet support embeddings from
other providers.
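
Because this requirement is easy to miss on non-OpenAI deployments, a startup check along these lines may help. It is a minimal sketch: the helper name is hypothetical, and only the OPENAI_API_KEY and LLM_PROVIDER variable names come from this page.

```python
def embedding_key_problem(env):
    """Return an error message if the embeddings key is missing, else None."""
    if env.get("OPENAI_API_KEY", "").strip():
        return None
    provider = env.get("LLM_PROVIDER", "unset")
    # The key is needed for embeddings regardless of the chat provider.
    return (
        "OPENAI_API_KEY is required for embeddings "
        f"even with LLM_PROVIDER={provider}"
    )
```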
Pricing
Prices change. We don’t try to keep exact numbers up to date here; the authoritative sources are:
- Anthropic: docs.claude.com/en/docs/about-claude/pricing
- OpenAI: openai.com/pricing
- Google Gemini: ai.google.dev/pricing

Ballpark figures by model class:

| Model class | Input ($/M tokens) | Output ($/M tokens) |
|---|---|---|
| Flagship (Opus, GPT-5.5, Gemini Pro) | $15 | $75 |
| Mid (Sonnet, GPT-5.4, Gemini 2.5 Flash) | $3 | $15 |
| Small (Haiku, GPT-5.4-mini) | $0.80 | $4 |
| Embeddings | $0.02 | — |
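
To turn the table above into a rough spend estimate, multiply token volumes by the per-million prices. The sketch below hard-codes the class-level figures from the table, so it is only as current as those numbers; the function and dictionary names are illustrative.

```python
# $/M-token figures copied from the table above: (input, output).
# Real prices change; check the provider pricing pages before budgeting.
PRICES_PER_M = {
    "flagship": (15.00, 75.00),
    "mid": (3.00, 15.00),
    "small": (0.80, 4.00),
}


def estimate_cost_usd(model_class, input_tokens, output_tokens):
    """Estimate spend in USD for a token volume at the class-level prices."""
    price_in, price_out = PRICES_PER_M[model_class]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000
```

For example, 2M input tokens and 500k output tokens on a mid-class model come to 2 × $3 + 0.5 × $15 = $13.50, which is what `estimate_cost_usd("mid", 2_000_000, 500_000)` returns.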
Choosing per workload
The Engine doesn’t pick a model per call automatically. Use these heuristics to pick at deployment time:

| Workload | Suggested |
|---|---|
| Coding agent, long sessions, complex tasks | Sonnet 4.7 main + Haiku 4.5 planner |
| Customer support, common-questions tier | Haiku 4.5 |
| Customer support, escalations | Sonnet 4.7 |
| Research / synthesis from long sources | Gemini 2.5 Pro |
| High-throughput classification | Haiku 4.5 / GPT-5.4-mini |
| Document drafting, long-form writing | Opus 4.7 / Sonnet 4.7 |
Switching models
Models are chosen at deployment time. To switch:
- Update .env with the new LLM_MODEL.
- Restart the Engine.
- Run regression evals against the new configuration.
- Watch cost and latency for a few hundred turns.
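
For the last step, one way to compare a new model against the old one is to summarize a few hundred turns from each. The sketch assumes you already collect a (cost in USD, latency in seconds) pair per turn; that record shape and the helper name are hypothetical, since this page doesn’t specify how the Engine reports these values.

```python
def summarize_turns(turns):
    """Summarize (cost_usd, latency_s) pairs: mean cost and p95 latency."""
    if not turns:
        raise ValueError("need at least one turn")
    costs = [cost for cost, _ in turns]
    latencies = sorted(latency for _, latency in turns)
    # Nearest-rank percentile: rank = ceil(0.95 * n), using integer math
    # to avoid floating-point rounding at the boundary.
    rank = -(-95 * len(latencies) // 100)
    return {
        "turns": len(turns),
        "mean_cost_usd": sum(costs) / len(costs),
        "p95_latency_s": latencies[rank - 1],
    }
```

Running it on the turns before and after the switch gives two small dictionaries to compare side by side.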
See also
- Cost and latency — managing spend.
- Environment variables — the variables that set the model.
- Regression — checking behavior across model changes.

