

A complex task often has more than one kind of work — planning, coding, testing, writing. You can do it all with one agent and a long system prompt, but it’s usually better to split into specialized agents that hand off to each other. This page covers the patterns that work.

Why split

A single agent has to be everything: a planner, a coder, a writer, a reviewer. The system prompt grows; the tool catalog grows; the model gets distracted between modes. Performance degrades. A multi-agent setup gives each agent a focused identity and a focused toolset. You also get cleaner traces — when something goes wrong, you can tell which agent it went wrong in.

The shapes that work

Planner + executor

The most common pattern. A planner agent breaks the task into steps; an executor agent runs each step. The Engine has built-in support for this pattern via the planner sub-agent and PLANNER_MODEL.
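
In outline, the handoff looks like the sketch below. The plan and execute_step functions are illustrative stubs, not Engine APIs; in a real setup each would be its own sub-agent call with its own prompt, tools, and (for the Engine) the model chosen by PLANNER_MODEL.

# Illustrative sketch of the planner + executor split.
# The "agents" are plain functions so the example runs standalone;
# in practice each would be a separate LLM call with its own prompt and tools.

def plan(goal: str) -> list[str]:
    # planner agent: break the goal into ordered steps (stubbed for illustration)
    return [f"analyze: {goal}", f"implement: {goal}", f"verify: {goal}"]

def execute_step(step: str) -> str:
    # executor agent: carry out one step (stubbed for illustration)
    return f"done: {step}"

def run_task(goal: str) -> list[str]:
    return [execute_step(step) for step in plan(goal)]

print(run_task("add OAuth login"))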

Specialist team

A coordinator agent dispatches to specialists. Each specialist has its own system prompt and tools. The coordinator holds the conversation with the user; specialists do the focused work.
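
A minimal sketch of that dispatch, assuming nothing beyond a table of roles; the specialist definitions and the call_agent helper are hypothetical stand-ins for real sub-agent invocations, not Engine built-ins.

# Illustrative coordinator: pick a specialist by role and hand the task over.
SPECIALISTS = {
    "coder":  {"system_prompt": "You write and edit code.",           "tools": ["read_file", "write_file"]},
    "tester": {"system_prompt": "You run tests and report failures.", "tools": ["run_tests"]},
    "writer": {"system_prompt": "You draft documentation.",           "tools": ["write_file"]},
}

def call_agent(spec: dict, task: str) -> str:
    # stubbed: a real implementation would start a sub-agent with this prompt and toolset
    return f"[{spec['system_prompt']}] handled: {task}"

def coordinate(role: str, task: str) -> str:
    return call_agent(SPECIALISTS[role], task)

print(coordinate("coder", "implement auth"))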

Reviewer-of-reviewer

For quality-critical output, one agent produces the work and a reviewer agent critiques it. Optionally, a third agent arbitrates disagreements. Useful for legal review, content moderation, and production-quality code review.
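
Sketched as a loop, with a cap on revision rounds so the pair cannot argue forever. The produce, critique, and arbitrate functions are hypothetical stand-ins, not Engine functions.

# Illustrative produce -> critique -> (optional) arbitrate loop, capped at max_rounds.

def produce(task: str, feedback: str = "") -> str:
    return f"draft for {task}" + (f" (revised: {feedback})" if feedback else "")

def critique(draft: str):
    # return None to approve, or a note describing what to fix
    return None if "revised" in draft else "tighten the error handling"

def arbitrate(draft: str, feedback: str) -> str:
    # stub: a third agent would settle the remaining disagreement; here we accept the latest draft
    return draft

def review_loop(task: str, max_rounds: int = 3) -> str:
    draft, feedback = produce(task), None
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:
            return draft                     # reviewer approves
        draft = produce(task, feedback)      # producer revises
    return arbitrate(draft, feedback)

print(review_loop("refund policy page"))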

Pair-of-pairs (planner + critic, executor + tester)

Each functional role has a critic. The planner is critiqued by a plan-critic; the executor is critiqued by a test-runner. The system as a whole is more reliable than any single agent.

How the Engine implements this

The Engine ships with one main agent loop. Sub-agents run as nested calls. From the outside, you see one /execute call and one stream of events; internally, the main loop’s tool calls invoke sub-agents:
event: tool_call
data: {"tool": "delegate_to_coder", "input": {"task": "implement auth"}}

[sub-agent runs internally — its events surface as nested]

event: tool_result
data: {"output": "Done. PR opened: #1234"}
The sub-agent gets its own system prompt, tools, and (optionally) its own model.
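
From a client’s point of view, observing the delegation is just a matter of watching tool_call and tool_result events on the normal stream. The sketch below assumes a local Engine URL and a guessed request body; only the /execute path and the event names shown above come from this page.

import json
import requests  # pip install requests

# Sketch of a client watching sub-agent delegation on the /execute stream.
# The host, port, and request body shape are assumptions for illustration.
resp = requests.post(
    "http://localhost:8000/execute",
    json={"task_id": "demo-task", "message": "implement auth"},
    stream=True,
)

event = None
for raw in resp.iter_lines(decode_unicode=True):
    if not raw:
        continue
    if raw.startswith("event:"):
        event = raw.split(":", 1)[1].strip()
    elif raw.startswith("data:"):
        data = json.loads(raw.split(":", 1)[1].strip())
        if event == "tool_call" and data.get("tool", "").startswith("delegate_"):
            print("sub-agent invoked with:", data["input"])
        elif event == "tool_result":
            print("sub-agent result:", data.get("output"))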

Memory across agents

In one task, all agents share memory:
  • The same brain.
  • The same conversation history.
  • The same working memory.
This is usually what you want — the planner sees what the coder did because the coder’s tool calls are in shared context. If you want isolation (e.g. a critic agent that should evaluate independently), use a fresh task_id for the critic and pass the work to be reviewed as the user message.
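
A hedged sketch of that isolation trick: give the critic a brand-new task_id and hand it the work as the user message, so it inherits none of the original task’s context. The URL and request body shape below are assumptions; the fresh task_id and the work-as-user-message step come from the text above.

import uuid
import requests  # pip install requests

ENGINE = "http://localhost:8000"   # assumed host/port

# The main task runs as usual; planner, coder, etc. share its memory.
main_task_id = "feature-auth-123"

# The critic gets a fresh task_id, so it cannot see the main task's context.
# The work to review is passed in as the user message.
work_to_review = "diff or document produced by the main task"
critic_task_id = f"critic-{uuid.uuid4()}"

requests.post(
    f"{ENGINE}/execute",
    json={
        "task_id": critic_task_id,
        "message": f"Review the following work independently:\n\n{work_to_review}",
    },
)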

Costs and limits

Multi-agent setups multiply token use. Each sub-agent call has its own context — its own system prompt, its own tools, its own assembled history. The benefit (focused performance) usually outweighs the cost, but the math matters at scale. Ways to keep cost down:
  • Use cheaper models for planners and critics. Set PLANNER_MODEL and similar to a smaller model.
  • Cap depth. Don’t allow infinite delegation. Set a maximum depth in the parent’s system prompt.
  • Cache aggressively. Sub-agents with stable system prompts hit cache as well as the main agent does.

Anti-patterns

Too many specialists

If your team has 8 specialists, they spend more time coordinating than doing the work. A general-purpose agent with good tools beats a council. Stop at 2–4 specialists for any one task.

Specialists that don’t specialize

If the planner and the executor have nearly the same system prompt and nearly the same tools, you don’t have two agents — you have one agent called twice. Either differentiate or merge.

Talking among themselves

If specialists end up in a loop debating each other, you’ve built a committee. Pick a tiebreaker (the coordinator decides, the user decides, a fixed rule decides). Don’t let the agents debate endlessly in search of consensus.

Hidden state between agents

If specialist A relies on side effects from specialist B that aren’t in shared context, your system is fragile. Make all hand-offs explicit through the shared memory or through the parent’s tool calls.

Patterns by use case

Coding agent

Cheaper models do the throughput-heavy work (reading code, running tests); the strong model does design and review.
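
One way to express that split is a small role-to-model table the coordinator consults when delegating; the role names and model identifiers below are placeholders, not Engine settings.

# Illustrative role-to-model assignment for a coding agent.
# "cheap-model" / "strong-model" are placeholders for whatever you deploy.
CODING_ROLES = {
    "reader":   "cheap-model",    # throughput-heavy: scanning files, summarizing code
    "tester":   "cheap-model",    # running tests, reporting failures
    "designer": "strong-model",   # architecture and interface decisions
    "reviewer": "strong-model",   # final review before merge
}

def model_for(role: str) -> str:
    return CODING_ROLES[role]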

Research agent

Gemini’s long context shines for the searcher role; Sonnet’s reasoning shines for synthesis; Haiku’s speed shines for fact-checking individual claims.

Customer support agent

The cheapest model handles the bulk of requests; a strong model handles the hard cases.

See also