When the agent needs to run code (execute a script, test a hypothesis, inspect a file, do math the model can’t reliably do in its head), it calls the
bash platform tool, which runs the command inside a hardened
sandbox. This page covers what runs where, what the sandbox guarantees,
and how to use code execution safely from the application side.
What “code execution” means in the Engine
Three different things often get called “code execution”:
- `bash` tool. A sandboxed shell. The agent runs commands and gets stdout, stderr, and the exit code back. Most “code execution” in agentic systems is this.
- Skill scripts. Skills can include short scripts (Python, shell) that run in the sandbox when the skill is invoked. These are pre-defined; the agent invokes them by name.
- Provider code interpreter. Some providers offer a hosted sandboxed code interpreter as a tool. The Engine doesn’t currently expose this; use the `bash` tool instead, which runs locally with stronger isolation.
What the sandbox guarantees
Every command run via `bash` executes inside three nested isolation layers:
- `bubblewrap`: namespaces. Separate filesystem (only allowed paths visible), separate PID space (can’t see other processes), separate network namespace (controlled), separate IPC.
- `seccomp`: syscall filter. Dangerous syscalls (`mount`, `pivot_root`, most module operations) are blocked outright.
- `landlock`: kernel-level filesystem ACLs. Even if a process somehow bypasses bwrap, landlock denies the write.
In addition to these layers, the permission system flags destructive commands (`rm -rf` outside known paths, `shred`, `chown`, etc.) and surfaces them as HITL prompts. The user, not the model, decides whether to allow them.
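The exact rules the permission system applies aren’t spelled out on this page. As a rough illustration only, the sketch below classifies a single simple command the way the text describes: `shred` and `chown` always need approval, and `rm` with both recursive and force flags needs approval when any target falls outside a known path. `KNOWN_PATHS`, `ALWAYS_GATED`, and `needs_approval` are hypothetical names. The sketch does not parse pipelines, `;`-separated commands, or globs, so treat it as a model of the idea, not a security control.

```python
import posixpath
import shlex
from pathlib import PurePosixPath

# Hypothetical deployment setting: directories where `rm -rf` is allowed without asking.
KNOWN_PATHS = [PurePosixPath("/data/work/scratch")]

# Hypothetical: programs that always need approval, whatever their arguments.
ALWAYS_GATED = {"shred", "chown"}


def _inside_known_path(target: str) -> bool:
    # Normalize first so "scratch/../../etc" can't sneak out of a known path.
    path = PurePosixPath(posixpath.normpath(target))
    for root in KNOWN_PATHS:
        try:
            path.relative_to(root)
            return True
        except ValueError:
            continue
    return False


def needs_approval(command: str) -> bool:
    """Return True if a single simple command should become a HITL prompt."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unparseable (e.g. unbalanced quotes): ask the user
    if not tokens:
        return False
    program = PurePosixPath(tokens[0]).name  # "/bin/chown" -> "chown"
    if program in ALWAYS_GATED:
        return True
    if program != "rm":
        return False
    args = tokens[1:]
    opts = [a for a in args if a.startswith("-")]
    targets = [a for a in args if not a.startswith("-")]
    short = "".join(o[1:] for o in opts if not o.startswith("--"))  # "-rf" -> "rf"
    recursive = "r" in short or "R" in short or "--recursive" in opts
    force = "f" in short or "--force" in opts
    if not (recursive and force):
        return False
    # Relative targets never match an absolute known path, so they are gated too.
    return any(not _inside_known_path(t) for t in targets)
```

With these settings, `needs_approval("rm -rf /data/work/old")` is `True`, matching the prompt described for that command later on this page.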
Allowed roots
The sandbox sees only paths declared in `ALLOWED_ROOTS` (comma-separated
absolute paths). Everything else is invisible to the running process.
For a typical deployment, ALLOWED_ROOTS covers:
- The brain database directory (so memory tools work).
- A workspace directory where the agent stages files.
- Any explicit paths the user has granted access to.
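Inside the sandbox, the restriction comes from which paths get mounted, not from a lookup. A small model of the rule can still help when validating configuration on the application side. This sketch assumes entries are separated by commas with optional surrounding spaces; rejecting relative entries and the names `parse_allowed_roots` and `is_visible` are choices made for the example, not Engine behavior. It collapses `..` segments but does not resolve symlinks.

```python
from __future__ import annotations

import posixpath
from pathlib import PurePosixPath


def parse_allowed_roots(value: str) -> list[PurePosixPath]:
    """Split a comma-separated ALLOWED_ROOTS value into normalized absolute paths."""
    roots = []
    for raw in value.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank entries such as a trailing comma
        if not raw.startswith("/"):
            # The variable is documented as absolute paths; reject anything else.
            raise ValueError(f"ALLOWED_ROOTS entry is not absolute: {raw!r}")
        roots.append(PurePosixPath(posixpath.normpath(raw)))
    return roots


def is_visible(path: str, roots: list[PurePosixPath]) -> bool:
    """True if `path` is one of the roots or lies beneath one of them."""
    # normpath collapses "..", so "/data/work/../secret" is checked as "/data/secret".
    # Symlinks are not followed; the real sandbox enforces visibility via mounts.
    candidate = PurePosixPath(posixpath.normpath(path))
    return any(candidate == root or root in candidate.parents for root in roots)
```

Typical use is `is_visible(path, parse_allowed_roots(os.environ["ALLOWED_ROOTS"]))`. Note that `/data/workspace` is not under `/data/work`: the check compares whole path components, not string prefixes.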
What’s available inside
The sandbox image is a minimal Linux filesystem with a small set of tools pre-installed:
- `bash`, `sh`
- `coreutils` (`ls`, `cat`, `cp`, `mv`, `rm`, etc.)
- `grep`, `sed`, `awk`
- `git`
- `curl`, `wget` (subject to network policy)
- `python3` with stdlib (no third-party packages by default)
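Because the image is deliberately small, it can be worth checking what is actually on `PATH` before relying on a tool, for example after customizing the image. This stdlib-only sketch runs under the sandbox’s `python3` without third-party packages; `EXPECTED_TOOLS` simply mirrors the list above (with a few `coreutils` members standing in for the package), and `probe_tools` is a hypothetical helper, not part of the Engine.

```python
import shutil

# Tools listed above as pre-installed (a few coreutils members stand in for the package).
EXPECTED_TOOLS = (
    "bash", "sh", "ls", "cat", "cp", "mv", "rm",
    "grep", "sed", "awk", "git", "curl", "wget", "python3",
)


def probe_tools(names=EXPECTED_TOOLS) -> dict:
    """Map each tool name to whether it resolves to an executable on PATH."""
    return {name: shutil.which(name) is not None for name in names}


if __name__ == "__main__":
    found = probe_tools()
    missing = [name for name, ok in found.items() if not ok]
    print("missing:", ", ".join(missing) if missing else "none")
```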
A typical code-execution turn
If the command had been `rm -rf /data/work/old`, step 3 would emit a
permission prompt (`hitl_request`) and pause until the user answered.
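From the application side, what matters in a turn like this is the shape of each call’s result. This local sketch runs a command with `bash -c` and collects stdout, stderr, and the exit code, with a timeout standing in for the sandbox’s per-command limit. It has none of the sandbox’s isolation or permission checks, and `CommandResult` and `run_command` are hypothetical names, not Engine APIs.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass


@dataclass
class CommandResult:
    stdout: str
    stderr: str
    exit_code: int | None  # None if the command was killed by the timeout
    timed_out: bool


def run_command(command: str, timeout_s: float = 30.0) -> CommandResult:
    """Run one command with bash and return what a tool call reports back."""
    try:
        proc = subprocess.run(
            ["bash", "-c", command],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising; report the timeout.
        return CommandResult(stdout="", stderr="", exit_code=None, timed_out=True)
    return CommandResult(proc.stdout, proc.stderr, proc.returncode, timed_out=False)
```

Keeping `timed_out` separate from `exit_code` lets the caller tell “the command failed” apart from “the command never finished”.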
Working with files across calls
Files written inside the sandbox during a turn persist for the duration of the workspace. Subsequent tool calls in the same task see them. After the task ends, the sandbox is reset. Patterns:
- Stage in workspace. Write intermediate files to the sandbox workspace; read them back with `read_file` later in the turn.
- Persist via `write_file`. For results that should survive past the turn, write to a path inside `ALLOWED_ROOTS` that’s mounted from durable storage. The brain volume is the obvious choice.
- Don’t rely on `/tmp` across turns. It may be cleared between turns. Use the workspace.
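The stage-then-persist pattern looks roughly like this in plain Python. The workspace and durable directories are parameters because their real locations depend on the deployment (the durable one would sit inside `ALLOWED_ROOTS`, for example on the brain volume); `stage` and `persist` are hypothetical helpers, not the Engine’s `read_file`/`write_file` tools. Writing to a temporary file and renaming it with `os.replace` is a common way to avoid half-written results, not something this page prescribes.

```python
import os
import tempfile
from pathlib import Path


def stage(workspace: Path, name: str, data: str) -> Path:
    """Write an intermediate file into the workspace (gone once the task ends)."""
    path = workspace / name
    path.write_text(data)
    return path


def persist(durable_dir: Path, name: str, data: str) -> Path:
    """Write a final result into durable storage without leaving partial files."""
    durable_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then atomically rename it
    # over the target so readers never see a half-written result.
    fd, tmp_path = tempfile.mkstemp(dir=durable_dir, prefix=f".{name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(data)
        os.replace(tmp_path, durable_dir / name)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return durable_dir / name
```

For example, `stage(Path("/data/work"), "rows.csv", raw)` during the turn, then `persist(Path("/data/brain/results"), "summary.txt", summary)` for the output that must outlive it; both paths are illustrative.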
Network access
By default, the sandbox has restricted network access. `curl` and
`wget` work for outbound HTTPS; inbound is blocked. For deployments
that need a stricter network policy, add iptables rules via
`scripts/firewall.sh` (which the engine repo ships).
For network policy ideas:
- Allowlist. Only specific outbound domains. Useful when the agent should only touch the user’s known infrastructure.
- Block all. Disable network entirely for purely-local computation agents.
- Default open. What ships in development. Suitable when the user is the trust boundary.
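An allowlist is ultimately enforced at the network layer (for example with iptables rules added via `scripts/firewall.sh`), and iptables rules generally match IP addresses rather than domain names. If you also want an application-side pre-flight check before the agent’s `curl` runs, the decision itself is small. `ALLOWED_DOMAINS` and `host_allowed` are hypothetical; requiring HTTPS mirrors the outbound-HTTPS default described above.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the user's known infrastructure.
ALLOWED_DOMAINS = frozenset({"example.com", "registry.example.org"})


def host_allowed(url: str, allowed: frozenset = ALLOWED_DOMAINS) -> bool:
    """True if `url` is HTTPS and its host is an allowlisted domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").rstrip(".")  # .hostname is already lowercased
    # Match whole labels: "evilexample.com" must not pass as "example.com".
    return any(host == domain or host.endswith("." + domain) for domain in allowed)
```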
Subprocess environment
When the sandbox runs a command, environment variables are scrubbed. The level is controlled by `SUBPROCESS_SCRUB_MODE`:
| Mode | Behavior |
|---|---|
| `off` | Pass the Engine’s environment through. Don’t use in production. |
| `default` | Strip credentials and known-sensitive vars. Reasonable for dev. |
| `strict` | Pass only an allowlist of safe vars. Recommended for production. |
In `strict` mode, the agent’s `bash` sees only `PATH`, `HOME`, `USER`,
`LANG`, and a handful of others. It does NOT see your LLM API key, your
Engine API key, or anything else the Engine reads from its own
environment.
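The three modes can be modeled as a filter over the parent environment. This is only a sketch: the Engine’s real strict allowlist includes “a handful of others” beyond `PATH`, `HOME`, `USER`, and `LANG`, and the name pattern used for `default` here is a guess at what “known-sensitive” means, not the Engine’s actual list. `scrub_env` is a hypothetical name.

```python
from __future__ import annotations

import re

# Hypothetical lists: the Engine's real allowlist has "a handful of others",
# and its definition of "known-sensitive" is not spelled out here.
STRICT_ALLOWLIST = frozenset({"PATH", "HOME", "USER", "LANG"})
SENSITIVE_NAME = re.compile(r"KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL", re.IGNORECASE)


def scrub_env(env: dict[str, str], mode: str) -> dict[str, str]:
    """Return the environment a subprocess would receive under `mode`."""
    if mode == "off":
        return dict(env)  # pass everything through (never in production)
    if mode == "default":
        # Drop anything whose name looks like a credential.
        return {k: v for k, v in env.items() if not SENSITIVE_NAME.search(k)}
    if mode == "strict":
        # Keep only explicitly allowlisted names.
        return {k: v for k, v in env.items() if k in STRICT_ALLOWLIST}
    raise ValueError(f"unknown SUBPROCESS_SCRUB_MODE: {mode!r}")
```

The result would be passed as `env=` to `subprocess.run`. A denylist like the `default` branch can only catch names it anticipates, which is why an allowlist is the safer production choice.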
Patterns
Inspect-then-act
The agent reads files, runs `git status`, checks logs, then proposes a
change. Most useful turns follow this shape. Don’t try to limit the
agent to “act only”: it can’t act intelligently without reading first.
Test-driven changes
The agent writes a test, runs it, sees it fail, writes the implementation, runs the test again, and sees it pass. This works beautifully when the codebase has fast tests and the agent can run them.
Long-running commands
Commands that take more than a few seconds (builds, large greps) are fine. The sandbox enforces a per-command timeout (configurable). Streaming stdout from a long command lets the agent decide whether to keep waiting or give up.
Background processes
The sandbox includes a background watchdog that kills orphaned processes when the parent task ends. Don’t rely on a daemon spawned by one tool call to be alive in the next.
Pitfalls
- `rm -rf` on a glob that resolves wider than expected. The permission system catches the obvious cases, but write defensively in scripts the agent runs. Use `--dry-run` for destructive operations whenever the tool supports it.
- Heredocs that include shell variables. `cat <<EOF` will expand variables; use `cat <<'EOF'` to disable expansion.
- Tools that spawn editors. `git commit` without `-m`, `crontab -e`, etc. open an editor; the sandbox doesn’t have one connected. Always pass non-interactive flags.
- Trusting `exit_code: 0` to mean “did the right thing.” Some commands return 0 even on partial failure. Read the output too.
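The heredoc pitfall is easy to see by running the same body twice, once with an unquoted delimiter and once with a quoted one. This sketch drives `bash` from Python only so the difference can be checked automatically; the two script strings are the interesting part, and `run_bash` is a hypothetical helper.

```python
import subprocess

# Same heredoc body, once with an unquoted delimiter and once with a quoted one.
UNQUOTED = "NAME=world\ncat <<EOF\nhello $NAME\nEOF\n"
QUOTED = "NAME=world\ncat <<'EOF'\nhello $NAME\nEOF\n"


def run_bash(script: str) -> str:
    """Run a script with bash and return its stdout."""
    return subprocess.run(
        ["bash", "-c", script], capture_output=True, text=True, check=True
    ).stdout


if __name__ == "__main__":
    print(run_bash(UNQUOTED), end="")  # hello world   ($NAME expanded)
    print(run_bash(QUOTED), end="")    # hello $NAME   (body passed through literally)
```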
See also
- Sandbox in components: the implementation.
- Permissions: what gets gated.
- Environment variables: `ALLOWED_ROOTS`, `SUBPROCESS_SCRUB_MODE`, sandbox feature flags.

