This guide builds a complete coding agent on the Engine, from zero to a
working session that fixes a real bug. By the end, you’ll understand
the moving pieces and have copy-pasteable code for each.
## What we’re building
A coding agent that:
- Receives a bug description from the user.
- Explores the codebase to find the relevant files.
- Reads the failing test.
- Writes a fix.
- Runs the tests to verify.
- Reports back with the diff and the test result.
The agent uses the platform's `bash`, `grep`, `read_file`, `write_file`,
and `find` tools.
## Step 1 — Define the agent

Add an agent definition to the catalog at `catalog/agents/coder/agent.json`:
```json
{
  "name": "coder",
  "description": "A senior software engineer who fixes bugs and implements small features in a codebase.",
  "model": "claude-sonnet-4-7",
  "tools": ["bash", "grep", "find", "read_file", "write_file"],
  "system_prompt_path": "system-prompt.md"
}
```
And the system prompt at `catalog/agents/coder/system-prompt.md`:

```markdown
You are a senior software engineer fixing bugs and implementing small
features in this codebase. You work the way a careful engineer works.

## How to operate

1. Start by understanding the problem. Read the bug report carefully.
   If something is ambiguous, ask the user before guessing.
2. Explore before acting. Use `find` and `grep` to locate the relevant
   files. Read the actual code, not what you assume the code does.
3. If there's a failing test, run it first to see how it fails. The
   error message is your starting point.
4. Make the smallest change that fixes the bug. Don't refactor adjacent
   code unless the user asked.
5. Run the tests after every change. If they pass, you're done. If they
   don't, read the new failure and iterate.
6. When done, summarize what changed in 2-3 sentences. Cite file paths
   and line numbers.

## Boundaries

- You may modify files inside the workspace.
- You may not modify the user's home directory or system files.
- You may not commit or push to git unless the user explicitly asks.
- You may not delete files without confirming via `hitl_request`.

## Voice

Be concise. Use plain language. Skip preambles ("Sure, I can help...").
Don't apologize unless something actually went wrong.
```
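Before reloading, it can help to sanity-check the definition file locally. A minimal sketch (the required keys are inferred from the example above, not the Engine's authoritative schema, and `validate_agent_definition` is a hypothetical helper):

```python
import json

# Keys inferred from the example agent.json above (assumption, not the
# Engine's authoritative schema).
REQUIRED_KEYS = {"name", "description", "model", "tools", "system_prompt_path"}

def validate_agent_definition(raw: str) -> dict:
    """Parse an agent.json string and check the expected fields are present."""
    config = json.loads(raw)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"agent.json missing keys: {sorted(missing)}")
    if not isinstance(config["tools"], list):
        raise TypeError("'tools' must be a list of tool names")
    return config

example = """{
  "name": "coder",
  "description": "Fixes bugs in a codebase.",
  "model": "claude-sonnet-4-7",
  "tools": ["bash", "grep", "find", "read_file", "write_file"],
  "system_prompt_path": "system-prompt.md"
}"""
config = validate_agent_definition(example)
```

A check like this catches a typo'd key before you burn a reload cycle on it.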
Reload the catalog:

```bash
curl -X POST "$ENGINE_URL/admin/reload-catalog" -H "X-Engine-Key: $KEY"
```
## Step 2 — Set up the workspace
The coder needs a workspace to operate on. Mount your repo at a path
inside `ALLOWED_ROOTS`:

```bash
ALLOWED_ROOTS=/data/brain,/data/workspace
```
Then bind-mount the project repo at `/data/workspace` in your Docker
Compose file:

```yaml
services:
  engine:
    volumes:
      - engine_data:/data/brain
      - ./my-project:/data/workspace
    environment:
      - ALLOWED_ROOTS=/data/brain,/data/workspace
```
Restart the Engine.
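The Engine enforces `ALLOWED_ROOTS` on the server side, but the rule is worth understanding: a path is usable only if it resolves inside one of the listed roots, so `..` traversal out of a root is rejected. A sketch of that containment check (my own illustration, not the Engine's code):

```python
from pathlib import Path

def is_allowed(path: str, allowed_roots: str) -> bool:
    """Return True if `path` resolves inside one of the comma-separated roots."""
    resolved = Path(path).resolve()
    for root in allowed_roots.split(","):
        root_path = Path(root).resolve()
        if resolved == root_path or root_path in resolved.parents:
            return True
    return False

roots = "/data/brain,/data/workspace"
assert is_allowed("/data/workspace/src/auth/tokens.py", roots)
assert not is_allowed("/etc/passwd", roots)
assert not is_allowed("/data/workspace/../secrets.txt", roots)  # traversal resolves outside
```

The practical consequence: anything you want the coder to touch must live under a mounted root, and anything outside stays invisible to it.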
## Step 3 — Send the first message
```bash
export ENGINE_URL=http://localhost:8000
export ENGINE_KEY=dev-engine-key

curl -N -X POST "$ENGINE_URL/execute" \
  -H "Content-Type: application/json" \
  -H "X-Engine-Key: $ENGINE_KEY" \
  -d '{
    "message": "There is a bug in the authentication middleware: tokens that expire exactly at the current second are accepted, when they should be rejected. Find the issue, fix it, and run the tests.",
    "task_id": "fix-auth-timing-001"
  }'
```
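The `-N` flag keeps curl from buffering the stream. If you're consuming it programmatically rather than watching it in a terminal, the `event:`/`data:` framing is straightforward to parse. A sketch (it assumes the framing shown in this guide; a production client would read the response incrementally rather than from a complete string):

```python
import json

def parse_sse(stream: str):
    """Split an SSE body into (event, data) pairs, JSON-decoding each payload."""
    events, event_name, data_lines = [], None, []
    for line in stream.splitlines():
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event_name is not None:
            # A blank line terminates one event; decode and reset.
            events.append((event_name, json.loads("\n".join(data_lines))))
            event_name, data_lines = None, []
    if event_name is not None and data_lines:  # flush a trailing event
        events.append((event_name, json.loads("\n".join(data_lines))))
    return events

sample = (
    "event: text_delta\n"
    "data: {\"text\":\"I'll look at the authentication middleware.\"}\n"
    "\n"
    "event: tool_call\n"
    "data: {\"tool\":\"find\",\"input\":{\"path\":\".\",\"name\":\"*auth*\"}}\n"
)
events = parse_sse(sample)
```

With that in hand, the transcript below is just a list of `(event, payload)` pairs your client can route on.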
The stream comes back. You’ll see something like:
```
event: thread_lifecycle
data: {"phase":"started", ...}

event: text_delta
data: {"text":"I'll look at the authentication middleware."}

event: tool_call
data: {"tool":"find","input":{"path":".","name":"*auth*"}}

event: tool_result
data: {"output":"./src/auth/middleware.py\n./src/auth/tokens.py\n./tests/test_auth.py\n"}

event: tool_call
data: {"tool":"grep","input":{"pattern":"expir","path":"src/auth/"}}

event: tool_result
data: {"output":"src/auth/tokens.py:42: if expiry < now:\n"}

event: tool_call
data: {"tool":"read_file","input":{"path":"src/auth/tokens.py"}}

event: tool_result
data: {"output":"...full file..."}

event: text_delta
data: {"text":"Found it. Line 42 uses `<` instead of `<=`. Tokens that expire at exactly the current second pass the check."}

event: tool_call
data: {"tool":"bash","input":{"command":"cd /data/workspace && pytest tests/test_auth.py -k expir -x"}}

event: tool_result
data: {"output":"FAILED tests/test_auth.py::test_token_expires_at_exact_second\n"}

event: tool_call
data: {"tool":"write_file","input":{"path":"src/auth/tokens.py","content":"...with `<=` ..."}}

event: tool_result
data: {"output":"wrote 1.2 KB"}

event: tool_call
data: {"tool":"bash","input":{"command":"cd /data/workspace && pytest tests/test_auth.py -k expir -x"}}

event: tool_result
data: {"output":"PASSED tests/test_auth.py::test_token_expires_at_exact_second\n"}

event: thread_lifecycle
data: {"phase":"completed", ...}
```
That’s a complete session. The agent found the bug, fixed it, and
verified the fix.
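The bug in this walkthrough is a classic boundary error, worth seeing in isolation. A hypothetical reconstruction of the check the agent found and fixed (not the actual `tokens.py`):

```python
def is_token_valid_buggy(expiry: int, now: int) -> bool:
    # The check on line 42 before the fix: only strictly-past expiries are
    # rejected, so a token expiring exactly at `now` slips through.
    return not (expiry < now)

def is_token_valid_fixed(expiry: int, now: int) -> bool:
    # After the fix: a token whose expiry equals the current second is
    # rejected too.
    return not (expiry <= now)

now = 1_700_000_000
assert is_token_valid_buggy(now, now) is True     # the bug: boundary token accepted
assert is_token_valid_fixed(now, now) is False    # fixed: boundary token rejected
assert is_token_valid_fixed(now + 60, now) is True  # future tokens still accepted
```

A single boundary case like `expiry == now` is exactly what the failing test in the transcript pins down, which is why running the test first pays off.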
## Step 4 — Handle a permission prompt
Some operations require explicit approval. Suppose the agent decides
the cleanest fix involves deleting an outdated file:
```
event: hitl_request
data: {
  "request_id": "hitl-...",
  "kind": "permission",
  "question": "Permit this operation?",
  "context": {
    "operation": "bash",
    "command": "rm src/auth/legacy_validator.py",
    "rationale": "Removes a legacy validator that was made obsolete by the fix."
  },
  "options": ["yes", "no"]
}
```
Your client surfaces this to the user. The user answers:
```bash
curl -X POST "$ENGINE_URL/hitl/respond" \
  -H "Content-Type: application/json" \
  -H "X-Engine-Key: $ENGINE_KEY" \
  -d '{
    "task_id": "fix-auth-timing-001",
    "answer": "yes"
  }'
```
The stream resumes. The agent runs the deletion and continues.
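Wiring this into a client means pausing on `hitl_request`, collecting an answer, and POSTing it back. A sketch (`ask_user` and `respond` stand in for your UI and your HTTP call to `/hitl/respond`; both are hypothetical stand-ins):

```python
def handle_event(event_name: str, payload: dict, task_id: str, respond, ask_user):
    """On hitl_request, collect an answer from the user and send it back.

    `respond` posts a JSON body to /hitl/respond; `ask_user` is however
    your UI collects input. Other events are ignored here (just rendered).
    """
    if event_name != "hitl_request":
        return None
    answer = ask_user(payload["question"], payload.get("options", []))
    if payload.get("options") and answer not in payload["options"]:
        raise ValueError(f"answer must be one of {payload['options']}")
    respond({"task_id": task_id, "answer": answer})
    return answer

sent = []
answer = handle_event(
    "hitl_request",
    {"question": "Permit this operation?", "options": ["yes", "no"]},
    task_id="fix-auth-timing-001",
    respond=sent.append,
    ask_user=lambda question, options: "yes",  # auto-approve for the demo
)
```

Note that the response body carries your session's `task_id`, not the `request_id` from the event, matching the curl example above.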
## Step 5 — Run multiple turns
Use the same `task_id` to ask follow-ups. The agent has the full
context of the previous turn:
```bash
curl -N -X POST "$ENGINE_URL/execute" \
  -H "Content-Type: application/json" \
  -H "X-Engine-Key: $ENGINE_KEY" \
  -d '{
    "message": "Add a regression test for this bug.",
    "task_id": "fix-auth-timing-001"
  }'
```
The agent recalls the bug, picks an appropriate test file, writes a
test, and runs it.
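A thin client wrapper makes it hard to forget the shared `task_id`. A sketch (hypothetical client code; `send` stands in for an HTTP POST to `/execute`):

```python
import uuid

class Session:
    """Pins one task_id across turns so follow-ups share context."""

    def __init__(self, send, task_id=None):
        self.send = send  # callable that POSTs a JSON body to /execute
        self.task_id = task_id or f"task-{uuid.uuid4().hex[:8]}"

    def ask(self, message: str):
        return self.send({"message": message, "task_id": self.task_id})

log = []
session = Session(send=lambda body: log.append(body) or body)
session.ask("Fix the expiry bug in the auth middleware.")
session.ask("Add a regression test for this bug.")
assert log[0]["task_id"] == log[1]["task_id"]  # same thread, full context
```

Start a fresh `Session` when you want the agent to begin without prior context.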
## Step 6 — Add evals
Encode this session as an eval case:
```json
{
  "id": "coder-eval-001",
  "category": "coding-agent",
  "description": "Agent should fix off-by-one in token expiry check.",
  "input": {
    "message": "There is a bug in the authentication middleware: tokens that expire exactly at the current second are accepted, when they should be rejected. Find the issue, fix it, and run the tests."
  },
  "expected": {
    "must_call_tools": ["find", "grep", "read_file", "write_file", "bash"],
    "output_must_contain": ["<=", "tests now pass"],
    "max_turns": 10,
    "max_tokens": 8000
  },
  "tags": ["off-by-one", "regression-template"]
}
```
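A harness for a case like this just compares the finished session against the `expected` block. A sketch of what that comparison might look like (the field semantics are read off the example above; your eval runner may check more, or differently):

```python
def check_transcript(expected, tool_calls, final_text, turns, tokens):
    """Compare one finished session against an eval case's `expected` block."""
    failures = []
    for tool in expected["must_call_tools"]:
        if tool not in tool_calls:
            failures.append(f"never called {tool}")
    for needle in expected["output_must_contain"]:
        if needle not in final_text:
            failures.append(f"final output missing {needle!r}")
    if turns > expected["max_turns"]:
        failures.append(f"took {turns} turns (max {expected['max_turns']})")
    if tokens > expected["max_tokens"]:
        failures.append(f"used {tokens} tokens (max {expected['max_tokens']})")
    return failures

expected = {
    "must_call_tools": ["find", "grep", "read_file", "write_file", "bash"],
    "output_must_contain": ["<=", "tests now pass"],
    "max_turns": 10,
    "max_tokens": 8000,
}
failures = check_transcript(
    expected,
    tool_calls=["find", "grep", "read_file", "bash", "write_file", "bash"],
    final_text="Fixed. Changed `<` to `<=`. The auth tests now pass.",
    turns=6,
    tokens=5400,
)
assert failures == []  # the session in this guide would pass
```

The turn and token caps matter as much as the content checks: a pass that takes 40 turns is a regression even if the diff is right.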
Run regression evals against this case every time you change the
coder’s prompt. See Regression.
## Improvements worth making
A real coder agent would have:
### Memory of past fixes
When the user asks similar questions later, retrieved knowledge facts
remind the agent: “Last time something like this came up, the issue was
in tokens.py:42.”
This happens automatically — the Learning Centre processes the
trajectory of this session into knowledge. By the second similar bug,
the agent is faster.
### A test-runner sub-agent
For codebases with slow test suites, delegate test execution to a
specialized sub-agent that knows how to run only the affected tests.
The main coder doesn’t need to know `pytest -k` flags; it asks the
test-runner to “run tests related to the auth middleware.”
See Multi-agent patterns.
### MCP integrations
Connect GitHub via MCP and the coder can:
- Read PR comments.
- Open a PR with the diff.
- Check CI status.
```bash
curl -X POST "$ENGINE_URL/assets/connect" \
  -H "Content-Type: application/json" \
  -H "X-Engine-Key: $ENGINE_KEY" \
  -d '{"server_name": "github"}'
```
After OAuth, `github.create_pr` and friends become available to the
agent.
## Pitfalls
- The agent commits without permission. Add an explicit refusal: “Don’t
  run `git commit` or `git push` unless the user asks.”
- The agent fixes a different bug than the one described. Tighten
the system prompt to emphasize “understand the user’s actual problem,
not what you assume.”
- The agent loops on a flaky test. Cap turn count in the system
prompt: “After 5 attempts at making the test pass, stop and ask the
user.”
- The agent reformats the entire file when fixing one line. “Make
the smallest change that fixes the bug” handles this.
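The attempt cap from the third pitfall can also be enforced mechanically, in the client or a wrapper, rather than relying on the prompt alone. A sketch (hypothetical harness; the simulated suite starts passing after three fix attempts):

```python
def run_until_pass(run_tests, apply_fix, max_attempts=5):
    """Retry loop with a hard cap: after max_attempts, stop and escalate."""
    for attempt in range(1, max_attempts + 1):
        if run_tests():
            return f"passed on attempt {attempt}"
        apply_fix(attempt)
    return "gave up: ask the user"

# Simulated flaky-ish suite that passes once three fixes have landed.
state = {"remaining_bugs": 3}
result = run_until_pass(
    run_tests=lambda: state["remaining_bugs"] <= 0,
    apply_fix=lambda attempt: state.update(remaining_bugs=state["remaining_bugs"] - 1),
)
assert result == "passed on attempt 4"
```

A hard cap in code is more reliable than a prompt instruction, since the model never gets the chance to talk itself into attempt six.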
## See also