The Engine is provider-agnostic. You pick which LLM provider runs your
agent at deployment time via LLM_PROVIDER and LLM_MODEL. This page
covers what’s supported, what we recommend, and where to find current
pricing.
Providers
Three providers are supported today:
| LLM_PROVIDER | Provider | Notes |
|---|---|---|
| anthropic | Anthropic | Default for production agentic workloads. Strong tool use, long thinking, prompt caching. |
| openai | OpenAI | Default for OpenAI-shaped APIs. Strong structured outputs, fast streaming. |
| gemini | Google Gemini | Long context (1M+ tokens), multimodal-first, cost-efficient. |
You can also point at OpenAI-compatible endpoints (any provider that
implements OpenAI’s chat completions API) by setting LLM_PROVIDER=openai
and adjusting the base URL — see the engine repo for configuration.
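As a sketch, that configuration might look like the following. LLM_BASE_URL is a hypothetical variable name here; check the engine repo for the one your version actually uses:

```
LLM_PROVIDER=openai
LLM_MODEL=my-hosted-model                      # whatever model ID the endpoint serves
LLM_BASE_URL=https://llm.example.internal/v1   # hypothetical name; see the engine repo
```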
Models we test against
The Engine is tested against the following models per Engine version:
Anthropic
- claude-opus-4-7 — flagship, for the hardest reasoning.
- claude-sonnet-4-7 — recommended default. Strong, fast, cost-aware.
- claude-sonnet-4-6 / claude-sonnet-4-5 — prior generations, still supported.
- claude-haiku-4-5-20251001 — small, fast, cheap. Good for planners, light tasks, classification.
OpenAI
- gpt-5.5 — flagship.
- gpt-5.4 — strong default.
- gpt-5.4-mini — small, cheap, fast.
Google Gemini
- gemini-2.5-pro — flagship, 2M context.
- gemini-2.5-flash — fast, cost-efficient.
The Engine doesn’t enforce the model identifiers — you can set
LLM_MODEL to any string the provider accepts. But we test against the
above; behavior on other models is best-effort.
Recommended setups
Production agentic deployment
LLM_PROVIDER=anthropic
LLM_MODEL=claude-sonnet-4-7
PLANNER_MODEL=claude-haiku-4-5-20251001
LIGHT_MODEL=claude-haiku-4-5-20251001
FALLBACK_MODEL=claude-sonnet-4-6
The Sonnet 4.7 default handles the main loop. The Haiku planner and
light model keep cost down on planning and compaction. Sonnet 4.6 as
fallback covers the case where 4.7 has a transient issue.
Cost-optimized
LLM_PROVIDER=anthropic
LLM_MODEL=claude-haiku-4-5-20251001
PLANNER_MODEL=claude-haiku-4-5-20251001
LIGHT_MODEL=claude-haiku-4-5-20251001
Haiku 4.5 across the board. Cheaper, faster, lower quality on hard
tasks. Good for high-volume, throughput-bound workloads.
Long-context
LLM_PROVIDER=gemini
LLM_MODEL=gemini-2.5-pro
PLANNER_MODEL=gemini-2.5-flash
LIGHT_MODEL=gemini-2.5-flash
Gemini 2.5 Pro’s 2M-token context shines for tasks that need to load
entire codebases, long PDFs, or huge transcripts. Flash for the
supporting agents.
Embeddings
Regardless of which chat provider you choose, embeddings always route
through OpenAI’s embedding model:
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIM=1536
You need an OpenAI API key (OPENAI_API_KEY) even on Anthropic or
Gemini deployments. The Engine doesn’t yet support embeddings from
other providers.
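The Engine makes these embedding calls itself, but it can be worth sanity-checking the key and model before deploying. A minimal sketch using the OpenAI Python SDK (openai >= 1.0), outside the Engine entirely:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="smoke test",
)

vector = resp.data[0].embedding
assert len(vector) == 1536  # should match EMBEDDING_DIM
```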
Pricing
Prices change, so we don’t try to keep numbers up-to-date here; the authoritative figures are on each provider’s pricing page.
Order-of-magnitude as of 2026:
| Model class | Input ($/M tokens) | Output ($/M tokens) |
|---|---|---|
| Flagship (Opus, GPT-5.5, Gemini Pro) | $15 | $75 |
| Mid (Sonnet, GPT-5.4, Gemini 2.5 Flash) | $3 | $15 |
| Small (Haiku, GPT-5.4-mini) | $0.80 | $4 |
| Embeddings | $0.02 | — |
Cache hits typically save 70–90% on the cached input portion.
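To make that concrete, assuming an 80% cache discount (within the range above): at the mid-tier input rate of $3/M tokens, a 200K-token prompt costs $0.60 uncached. If 150K of those tokens are cache hits, that portion drops from $0.45 to $0.09, and the call costs $0.24 total.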
Choosing per workload
The Engine doesn’t pick a model per call automatically. Use these
heuristics to pick at deployment time:
| Workload | Suggested |
|---|---|
| Coding agent, long sessions, complex tasks | Sonnet 4.7 main + Haiku 4.5 planner |
| Customer support, common-questions tier | Haiku 4.5 |
| Customer support, escalations | Sonnet 4.7 |
| Research / synthesis from long sources | Gemini 2.5 Pro |
| High-throughput classification | Haiku 4.5 / GPT-5.4-mini |
| Document drafting, long-form writing | Opus 4.7 / Sonnet 4.7 |
For workloads that span profiles (e.g., a single product with both
classification and drafting), run multiple Engine instances with
different model configurations and route at the application layer.
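A sketch of what that application-layer routing can look like. Everything here is hypothetical: the instance URLs, the /v1/run path, and the assumption that each Engine instance is reachable over HTTP. Adapt it to however your deployment actually exposes the Engine:

```python
import requests

# Hypothetical URLs: one Engine instance per model configuration.
ENGINES = {
    "classification": "http://engine-haiku.internal:8080",   # Haiku 4.5 across the board
    "drafting": "http://engine-sonnet.internal:8080",        # Sonnet 4.7 main loop
}

def route(workload: str, payload: dict) -> dict:
    """Send the request to the Engine instance whose model profile fits the workload."""
    base = ENGINES.get(workload, ENGINES["drafting"])  # default to the stronger profile
    resp = requests.post(f"{base}/v1/run", json=payload, timeout=120)  # hypothetical path
    resp.raise_for_status()
    return resp.json()
```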
Switching models
Models are chosen at deployment time. To switch:
- Update .env with the new LLM_MODEL.
- Restart the Engine.
- Run regression evals against the new configuration.
- Watch cost and latency for a few hundred turns.
Behavior shifts subtly across model versions. Same provider, same
prompt — different verbosity, different willingness to call tools. Plan
to retune system prompts when you change models.
See also