What we’re building
A coding agent that:- Receives a bug description from the user.
- Explores the codebase to find the relevant files.
- Reads the failing test.
- Writes a fix.
- Runs the tests to verify.
- Reports back with the diff and the test result.
bash, grep, read_file, write_file,
and find tools.
Step 1 — Configure the agent
Add an agent definition to the catalog atcatalog/agents/coder/agent.json:
catalog/agents/coder/system-prompt.md:
Step 2 — Set up the workspace
The coder needs a workspace to operate on. Mount your repo at a path insideALLOWED_ROOTS:
/data/workspace in your
docker-compose:
Step 3 — Send the first message
Step 4 — Handle a permission prompt
Some operations require explicit approval. Suppose the agent decides the cleanest fix involves deleting an outdated file:Step 5 — Run multiple turns
Use the sametask_id to ask follow-ups. The agent has the full
context of the previous turn:
Step 6 — Add evals
Encode this session as an eval case:Improvements worth making
A real coder agent would have:Memory of past fixes
When the user asks similar questions later, retrieved knowledge facts remind the agent: “Last time something like this came up, the issue was intokens.py:42.”
This happens automatically — the Learning Centre processes the
trajectory of this session into knowledge. By the second similar bug,
the agent is faster.
A test-runner sub-agent
For codebases with slow test suites, delegate test execution to a specialized sub-agent that knows how to run only the affected tests. The main coder doesn’t need to knowpytest -k flags; it asks the
test-runner to “run tests related to the auth middleware.”
See Multi-agent patterns.
MCP integrations
Connect GitHub via MCP and the coder can:- Read PR comments.
- Open a PR with the diff.
- Check CI status.
github.create_pr and friends become available to the
agent.
Pitfalls
- The agent commits without permission. Add explicit refusal:
“Don’t run
git commitorgit pushunless the user asks.” - The agent fixes a different bug than the one described. Tighten the system prompt to emphasize “understand the user’s actual problem, not what you assume.”
- The agent loops on a flaky test. Cap turn count in the system prompt: “After 5 attempts at making the test pass, stop and ask the user.”
- The agent reformats the entire file when fixing one line. “Make the smallest change that fixes the bug” handles this.
See also
- Build a research agent — same pattern, different tools.
- Multi-agent patterns — for the test-runner sub-agent.
- Permissions — gating destructive operations.
- Eval harness — running the cases above.

