This guide builds a complete coding agent on the Engine, from zero to a working session that fixes a real bug. By the end, you’ll understand the moving pieces and have copy-pasteable code for each.

What we’re building

A coding agent that:
  • Receives a bug description from the user.
  • Explores the codebase to find the relevant files.
  • Reads the failing test.
  • Writes a fix.
  • Runs the tests to verify.
  • Reports back with the diff and the test result.
The agent uses the platform's bash, grep, find, read_file, and write_file tools.

Step 1 — Configure the agent

Add an agent definition to the catalog at catalog/agents/coder/agent.json:
{
  "name": "coder",
  "description": "A senior software engineer who fixes bugs and implements small features in a codebase.",
  "model": "claude-sonnet-4-7",
  "tools": ["bash", "grep", "find", "read_file", "write_file"],
  "system_prompt_path": "system-prompt.md"
}
And the system prompt at catalog/agents/coder/system-prompt.md:
You are a senior software engineer fixing bugs and implementing small
features in this codebase. You work the way a careful engineer works.

## How to operate

1. Start by understanding the problem. Read the bug report carefully.
   If something is ambiguous, ask the user before guessing.
2. Explore before acting. Use `find` and `grep` to locate the relevant
   files. Read the actual code, not what you assume the code does.
3. If there's a failing test, run it first to see how it fails. The
   error message is your starting point.
4. Make the smallest change that fixes the bug. Don't refactor adjacent
   code unless the user asked.
5. Run the tests after every change. If they pass, you're done. If they
   don't, read the new failure and iterate.
6. When done, summarize what changed in 2-3 sentences. Cite file paths
   and line numbers.

## Boundaries

- You may modify files inside the workspace.
- You may not modify the user's home directory or system files.
- You may not commit or push to git unless the user explicitly asks.
- You may not delete files without confirming via hitl_request.

## Voice

Be concise. Use plain language. Skip preambles ("Sure, I can help...").
Don't apologize unless something actually went wrong.
Reload the catalog:
curl -X POST "$ENGINE_URL/admin/reload-catalog" -H "X-Engine-Key: $KEY"

Step 2 — Set up the workspace

The coder needs a workspace to operate on. Mount your repo at a path inside ALLOWED_ROOTS:
ALLOWED_ROOTS=/data/brain,/data/workspace
Then bind-mount the project repo at /data/workspace in your docker-compose file:
services:
  engine:
    volumes:
      - engine_data:/data/brain
      - ./my-project:/data/workspace
    environment:
      - ALLOWED_ROOTS=/data/brain,/data/workspace
Restart the Engine.
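The Engine refuses paths outside ALLOWED_ROOTS, so it is worth checking your mount before sending the first message. A rough client-side sketch of the same containment rule (the Engine's own check is authoritative; this just mirrors the idea, assuming POSIX paths):

```python
import posixpath

def path_is_allowed(path: str, allowed_roots: str) -> bool:
    """Return True if `path` falls under one of the comma-separated roots."""
    target = posixpath.normpath(path)
    for root in allowed_roots.split(","):
        root = posixpath.normpath(root.strip())
        # Allowed if it is the root itself or strictly nested below it;
        # comparing against root + "/" avoids prefix tricks like /data/workspace2.
        if target == root or target.startswith(root + "/"):
            return True
    return False
```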

Step 3 — Send the first message

export ENGINE_URL=http://localhost:8000
export ENGINE_KEY=dev-engine-key

curl -N -X POST "$ENGINE_URL/execute" \
  -H "Content-Type: application/json" \
  -H "X-Engine-Key: $ENGINE_KEY" \
  -d '{
    "message": "There is a bug in the authentication middleware: tokens that expire exactly at the current second are accepted, when they should be rejected. Find the issue, fix it, and run the tests.",
    "task_id": "fix-auth-timing-001"
  }'
The stream comes back. You’ll see something like:
event: thread_lifecycle
data: {"phase":"started", ...}

event: text_delta
data: {"text":"I'll look at the authentication middleware."}

event: tool_call
data: {"tool":"find","input":{"path":".","name":"*auth*"}}

event: tool_result
data: {"output":"./src/auth/middleware.py\n./src/auth/tokens.py\n./tests/test_auth.py\n"}

event: tool_call
data: {"tool":"grep","input":{"pattern":"expir","path":"src/auth/"}}

event: tool_result
data: {"output":"src/auth/tokens.py:42:    if expiry < now:\n"}

event: tool_call
data: {"tool":"read_file","input":{"path":"src/auth/tokens.py"}}

event: tool_result
data: {"output":"...full file..."}

event: text_delta
data: {"text":"Found it. Line 42 uses `<` instead of `<=`. Tokens that
        expire at exactly the current second pass the check."}

event: tool_call
data: {"tool":"bash","input":{"command":"cd /data/workspace && pytest tests/test_auth.py -k expir -x"}}

event: tool_result
data: {"output":"FAILED tests/test_auth.py::test_token_expires_at_exact_second\n"}

event: tool_call
data: {"tool":"write_file","input":{"path":"src/auth/tokens.py","content":"...with `<=` ..."}}

event: tool_result
data: {"output":"wrote 1.2 KB"}

event: tool_call
data: {"tool":"bash","input":{"command":"cd /data/workspace && pytest tests/test_auth.py -k expir -x"}}

event: tool_result
data: {"output":"PASSED tests/test_auth.py::test_token_expires_at_exact_second\n"}

event: text_delta
data: {"text":"Fixed. Changed line 42 of `src/auth/tokens.py` from
        `expiry < now` to `expiry <= now`. The auth tests now pass."}

event: thread_lifecycle
data: {"phase":"completed", ...}
That’s a complete session: the agent found the bug, fixed it, and verified the fix.
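On the client side, this stream is ordinary server-sent events: an `event:` line naming the type, a `data:` line carrying a JSON payload, and a blank line terminating each event. A minimal parser sketch (it assumes one `data:` line per event, which holds for the payloads in this guide):

```python
import json

def parse_sse(lines):
    """Yield (event_type, payload) pairs from an iterable of SSE lines."""
    event = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event is not None:
            # Decode the JSON payload and pair it with the pending event type.
            yield event, json.loads(line[len("data:"):].strip())
            event = None
```

Feed it the response body line by line and dispatch on the event type: render `text_delta`, log `tool_call`/`tool_result`, and close on `thread_lifecycle` with phase `completed`.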

Step 4 — Handle a permission prompt

Some operations require explicit approval. Suppose the agent decides the cleanest fix involves deleting an outdated file:
event: hitl_request
data: {
  "request_id": "hitl-...",
  "kind": "permission",
  "question": "Permit this operation?",
  "context": {
    "operation": "bash",
    "command": "rm src/auth/legacy_validator.py",
    "rationale": "Removes a legacy validator that was made obsolete by the fix."
  },
  "options": ["yes", "no"]
}
Your client surfaces this to the user. The user answers:
curl -X POST "$ENGINE_URL/hitl/respond" \
  -H "Content-Type: application/json" \
  -H "X-Engine-Key: $ENGINE_KEY" \
  -d '{
    "task_id": "fix-auth-timing-001",
    "answer": "yes"
  }'
The stream resumes. The agent runs the deletion and continues.
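In a client, answering comes down to one POST. A sketch using only the standard library (the endpoint and payload mirror the curl above; error handling is minimal):

```python
import json
import urllib.request

def build_hitl_response(task_id: str, answer: str) -> dict:
    """Payload for POST /hitl/respond, matching the curl example above."""
    return {"task_id": task_id, "answer": answer}

def send_hitl_response(engine_url: str, engine_key: str,
                       task_id: str, answer: str) -> None:
    req = urllib.request.Request(
        f"{engine_url}/hitl/respond",
        data=json.dumps(build_hitl_response(task_id, answer)).encode(),
        headers={"Content-Type": "application/json", "X-Engine-Key": engine_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # raises on HTTP errors
        resp.read()
```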

Step 5 — Run multiple turns

Use the same task_id to ask follow-ups. The agent has the full context of the previous turn:
curl -N -X POST "$ENGINE_URL/execute" \
  -H "Content-Type: application/json" \
  -H "X-Engine-Key: $ENGINE_KEY" \
  -d '{
    "message": "Add a regression test for this bug.",
    "task_id": "fix-auth-timing-001"
  }'
The agent recalls the bug, picks an appropriate test file, writes a test, and runs it.

Step 6 — Add evals

Encode this session as an eval case:
{
  "id": "coder-eval-001",
  "category": "coding-agent",
  "description": "Agent should fix off-by-one in token expiry check.",
  "input": {
    "message": "There is a bug in the authentication middleware: tokens that expire exactly at the current second are accepted, when they should be rejected. Find the issue, fix it, and run the tests."
  },
  "expected": {
    "must_call_tools": ["find", "grep", "read_file", "write_file", "bash"],
    "output_must_contain": ["<=", "tests now pass"],
    "max_turns": 10,
    "max_tokens": 8000
  },
  "tags": ["off-by-one", "regression-template"]
}
Run regression evals against this case every time you change the coder’s prompt. See Regression.
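The `expected` block can be checked mechanically against a recorded trajectory. A sketch of such a checker, assuming the trajectory is a list of events shaped like those streamed in Step 3 (the Engine's own eval runner is authoritative; this just shows the shape of the checks):

```python
def check_eval(trajectory: list, final_output: str, expected: dict) -> list:
    """Return a list of failure strings; an empty list means the case passed."""
    failures = []
    called = {e["tool"] for e in trajectory if e.get("type") == "tool_call"}
    for tool in expected.get("must_call_tools", []):
        if tool not in called:
            failures.append(f"tool never called: {tool}")
    for needle in expected.get("output_must_contain", []):
        if needle not in final_output:
            failures.append(f"output missing: {needle!r}")
    # Treat each tool call as one turn; the real runner may count differently.
    turns = sum(1 for e in trajectory if e.get("type") == "tool_call")
    if "max_turns" in expected and turns > expected["max_turns"]:
        failures.append(f"used {turns} turns, limit {expected['max_turns']}")
    return failures
```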

Improvements worth making

A real coder agent would have:

Memory of past fixes

When the user asks similar questions later, retrieved knowledge facts remind the agent: “Last time something like this came up, the issue was in tokens.py:42.” This happens automatically — the Learning Centre processes the trajectory of this session into knowledge. By the second similar bug, the agent is faster.

A test-runner sub-agent

For codebases with slow test suites, delegate test execution to a specialized sub-agent that knows how to run only the affected tests. The main coder doesn’t need to know pytest -k flags; it asks the test-runner to “run tests related to the auth middleware.” See Multi-agent patterns.

MCP integrations

Connect GitHub via MCP and the coder can:
  • Read PR comments.
  • Open a PR with the diff.
  • Check CI status.
curl -X POST "$ENGINE_URL/assets/connect" \
  -H "X-Engine-Key: $ENGINE_KEY" \
  -d '{"server_name": "github"}'
After OAuth, github.create_pr and friends become available to the agent.

Pitfalls

  • The agent commits without permission. Add explicit refusal: “Don’t run git commit or git push unless the user asks.”
  • The agent fixes a different bug than the one described. Tighten the system prompt to emphasize “understand the user’s actual problem, not what you assume.”
  • The agent loops on a flaky test. Cap turn count in the system prompt: “After 5 attempts at making the test pass, stop and ask the user.”
  • The agent reformats the entire file when fixing one line. “Make the smallest change that fixes the bug” handles this.
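The prompt-level attempt cap can be backed by a hard limit in your client harness. A sketch, assuming your harness can count how many times the agent re-runs the same test command:

```python
def run_with_attempt_cap(run_test, max_attempts: int = 5):
    """Call `run_test` until it returns True or the cap is hit.

    Returns (passed, attempts). A real harness would surface the cap
    to the user as a question rather than silently stopping.
    """
    for attempt in range(1, max_attempts + 1):
        if run_test():
            return True, attempt
    return False, max_attempts
```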

See also