Building a good agent on the Engine is mostly prompt engineering. The loop is the same; the model is whatever you’ve configured; the memory fills itself. The lever you pull most often is the system prompt — the instructions that shape how the agent thinks, what it tries first, and what it refuses. This page covers what works.

What goes in the system prompt

A production agent’s system prompt typically has six parts:
  1. Identity. Who is this agent? What’s its name, its scope, its tone?
  2. Goals. What is the agent here to do? In one paragraph.
  3. Tools. Which tools the agent has and how to think about each one. Don’t list schemas — that’s the tool catalog’s job. Instead, give heuristics: “Prefer grep over read_file when looking for a pattern across many files.”
  4. Voice and format. How does the agent talk? Long-form? Bulleted? Second person?
  5. Refusal policy. What does the agent decline to do? Phrase the line clearly so the model can hold it.
  6. Operating rules. Workflow expectations — “Always read before you write,” “Confirm before destructive operations,” “Test changes before declaring them done.”
A good system prompt is between 200 and 1500 words. Shorter loses signal; longer dilutes it.
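If you assemble the prompt in code rather than by hand, the six parts map naturally onto a template. The sketch below is illustrative only — the agent, the wording, and the assembly style are assumptions, not Engine APIs — but it shows the useful property: each part stays a distinct, separately editable block.

```python
# Illustrative only: one way to keep the six parts as separate, editable
# blocks and join them into a single system prompt. Nothing here is an
# Engine API; the agent and wording are made up for the example.
SYSTEM_PROMPT = "\n\n".join([
    # 1. Identity
    "You are a senior infrastructure engineer reviewing pull requests for "
    "correctness, safety, and operational implications.",
    # 2. Goals
    "Your job is to catch real problems before merge and say clearly when a "
    "PR is fine as-is.",
    # 3. Tools
    "Prefer grep over read_file when looking for a pattern across many files. "
    "Read the relevant code before commenting on it.",
    # 4. Voice and format
    "Reply in one or two sentences per finding, bulleted when there are "
    "several findings.",
    # 5. Refusal policy
    "Decline to approve changes that embed credentials or disable tests; "
    "explain why and stop.",
    # 6. Operating rules
    "Always read before you write. Confirm before destructive operations. "
    "Test changes before declaring them done.",
])
```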

Patterns

Lead with identity

The first paragraph sets the agent’s stance. The model takes the cue and frames every decision through it.
You are a senior infrastructure engineer reviewing pull requests for
correctness, safety, and operational implications. You read code
carefully before commenting. You don't fish for things to say — if a
PR is fine, you say so.
This works better than “You are an AI assistant that reviews pull requests.” The first version commits to a perspective; the second leaves the model to invent one.

Prescribe what to do, not what not to do

Bad:  Don't be verbose.
Good: Reply in one or two sentences unless asked for more.
Bad:  Don't make up facts.
Good: When you don't know something, say so. Use search before guessing.
The model reliably follows positive instructions. Negative instructions (“don’t X”) are weaker — sometimes the model still produces X but with a different framing.

Anchor with examples

For tasks where format matters, show, don’t tell:
When summarizing a meeting, format the output as:

  ## Decisions
  - <decision 1>
  - <decision 2>

  ## Open questions
  - <question 1>

  ## Action items
  - [ ] <person>: <action> by <date>

If a section has no entries, omit it.
A few-shot example beats a paragraph of description.

Distinguish required from optional

If a workflow has a fixed first step, say so explicitly:
Always run `git status` before any other operation. The output tells you
the state of the working tree, which informs every subsequent decision.
Without explicit ordering, the model picks based on what looks helpful, which is right most of the time but wrong at exactly the moments that matter.

Tell the model how to ask

The agent will need to ask the user questions sometimes (human-in-the-loop, HITL). Tell it when:
Ask the user to confirm before:
- Any operation that deletes data.
- Any operation that affects systems outside the user's local environment.
- Any operation that costs more than $5 of inference.

Ask by emitting an hitl_request. Don't proceed until you have an answer.
Without explicit guidance, the model either over-asks (every step) or under-asks (proceeds with a destructive command on its own initiative).
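The exact shape of an hitl_request depends on how your Engine is set up; the payload below is a hypothetical sketch, not the Engine's actual schema. It only illustrates the kind of question and context the agent should surface before a destructive step.

```python
# Hypothetical hitl_request payload -- field names are illustrative, not the
# Engine's actual schema. The important behavior is that the agent emits the
# request and waits for the answer instead of proceeding.
hitl_request = {
    "type": "hitl_request",
    "question": "This will drop the table `staging_events`. Proceed?",
    "options": ["yes", "no"],
    "context": {"operation": "DROP TABLE staging_events"},
}
```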

Tool descriptions are part of the prompt

The model decides which tool to call by reading the tool description, not the tool implementation. A tool with a one-line description (“search files”) gets used randomly. A tool with a precise description (“Search for a regex pattern across files in the workspace. Returns matching lines with file path and line number. Use this when you need to find where a symbol is defined or used.”) gets used correctly. See Tool use for the full pattern.
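To make the difference concrete, here are the two descriptions side by side as tool definitions. The surrounding schema is the common function-calling shape, used here only for context; adjust field names to whatever your tool catalog actually expects.

```python
# Same tool, two descriptions. Only the description changes between the two;
# the schema around it is shown for context and may differ in your catalog.
vague_tool = {
    "name": "grep",
    "description": "search files",  # the model will call this more or less at random
    "parameters": {
        "type": "object",
        "properties": {"pattern": {"type": "string"}},
        "required": ["pattern"],
    },
}

precise_tool = {
    "name": "grep",
    "description": (
        "Search for a regex pattern across files in the workspace. Returns "
        "matching lines with file path and line number. Use this when you need "
        "to find where a symbol is defined or used."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "pattern": {"type": "string", "description": "Regex pattern to search for."},
        },
        "required": ["pattern"],
    },
}
```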

Prompt versions

Treat agent prompts like code:
  • Version them. Store in the catalog with a version field. Bump on every meaningful change.
  • Test them. Run regression evals against a golden dataset every time you change a prompt.
  • Pin them. Production runs against a pinned version, not “latest” (see the sketch after this list).
  • Migrate them. When a prompt change shifts behavior, document the migration the same way you’d document an API change.
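What that looks like in practice depends on your catalog; the sketch below assumes nothing beyond an explicit version field and a pin. Field names are illustrative.

```python
# Sketch of a versioned prompt entry and a production pin. Field names are
# assumptions; the point is the explicit version and a pin that never says
# "latest".
PROMPT_CATALOG = {
    "pr-reviewer": {
        "version": "2.3.0",
        "system_prompt": "You are a senior infrastructure engineer ...",
        "changelog": "2.3.0: tightened the refusal policy around credentials.",
    },
}

PRODUCTION_PIN = {"agent": "pr-reviewer", "prompt_version": "2.3.0"}  # pinned, not "latest"
```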

Across model versions

A prompt that works perfectly on Claude Sonnet 4.5 may behave differently on 4.6. The structure is usually portable; the calibration isn’t:
  • What carries over: identity framing, examples, refusal policy.
  • What shifts: verbosity, willingness to call tools without asking, default response length.
When you upgrade models, expect to retune length and call-frequency hints. Don’t expect to rewrite from scratch. Across providers (Anthropic → OpenAI → Gemini) the differences are larger. Each provider has its own preferred structure for system prompts. If you support multiple providers, maintain a prompt per provider rather than a single universal one.
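One way to keep per-provider prompts manageable is to select them explicitly at configuration time rather than sharing a single file. The file layout and function below are assumptions, not an Engine convention.

```python
from pathlib import Path

# One prompt file per provider; selection is explicit, with no silent fallback
# to another provider's prompt. Paths and provider names are illustrative.
PROMPT_DIR = Path("prompts")
SUPPORTED_PROVIDERS = {"anthropic", "openai", "gemini"}

def system_prompt_for(provider: str) -> str:
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"No prompt maintained for provider: {provider}")
    return (PROMPT_DIR / f"reviewer.{provider}.md").read_text()
```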

What not to put in the system prompt

  • Per-call data. Things specific to one request belong in the user message, not the system prompt. Caching depends on system prompts being stable (see the sketch after this list).
  • The user’s name and details. Those live in the soul. The system prompt is for “what the agent is,” not “who it’s serving.”
  • Long static content. Reference docs, large code samples, FAQ entries. Put them in tool output (retrieval) instead, or in the soul if they’re persistent. The system prompt is for instructions, not knowledge.
  • Self-undermining hedges. “You are a helpful assistant. However, you are not always right and the user should verify.” This kind of hedge teaches the model to back off. State the boundaries you want; don’t apologize.
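The first point is easiest to see in the message array itself. The sketch below uses the generic role/content message shape (adapt it to your provider's SDK): the system prompt is byte-for-byte identical on every call, and everything request-specific rides in the user message.

```python
SYSTEM_PROMPT = "You are a senior infrastructure engineer ..."  # stable across calls
diff_text = "diff --git a/app.py b/app.py\n..."                 # per-call data

messages = [
    # Identical on every call, so prompt caching can reuse it.
    {"role": "system", "content": SYSTEM_PROMPT},
    # Anything specific to this request goes here, not in the system prompt.
    {"role": "user", "content": f"Review this diff:\n\n{diff_text}"},
]
```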

Patterns that fail

Wishful thinking

You are an AI agent that always produces the correct answer.
The model can’t promise this, and saying so doesn’t help. Instead:
When you're confident, answer directly. When you're not, say what
you're uncertain about and what would resolve it.

Conflicting instructions

Be concise. Provide thorough explanations.
Pick one. The model will resolve the conflict by averaging, which gives you medium-length explanations that are neither concise nor thorough.

Instructions that contradict the tools

Never modify files.
[but the agent has write_file in its tool list]
The model trusts its tool list more than its prompt. Either remove the tool or remove the instruction.
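One hedge against this kind of drift is to derive the tool list and the instruction from the same switch, so the two can never disagree. The names below are hypothetical; the technique is what matters.

```python
# Hypothetical guard that keeps the prompt and the tool list consistent:
# both are derived from the same read_only flag, so a "never modify files"
# instruction is never paired with write tools.
ALL_TOOLS = ["read_file", "grep", "write_file", "delete_file"]
WRITE_TOOLS = {"write_file", "delete_file"}

def agent_config(read_only: bool) -> dict:
    tools = [t for t in ALL_TOOLS if not (read_only and t in WRITE_TOOLS)]
    rule = "Never modify files." if read_only else "Confirm before destructive operations."
    return {"tools": tools, "operating_rule": rule}
```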

See also

  • Context strategy — how the prompt fits into the larger context the model sees.
  • Tool use — prompt-engineering for tool descriptions specifically.
  • Evaluation — how to know whether your prompt change helped or hurt.