Code execution

When the agent needs to run code — execute a script, test a hypothesis, inspect a file, do math the model can’t reliably do in its head — it calls the bash platform tool, which runs the command inside a hardened sandbox. This page covers what runs where, what the sandbox guarantees, and how to use code execution safely from the application side.

What “code execution” means in the Engine

Three different things often get called “code execution”:

1. bash tool. A sandboxed shell. The agent runs commands and gets stdout, stderr, and the exit code back. Most “code execution” in agentic systems is this.
2. Skill scripts. Skills can include short scripts (Python, shell) that run in the sandbox when the skill is invoked. These are pre-defined; the agent invokes them by name.
3. Provider code interpreter. Some providers offer a hosted sandboxed code interpreter as a tool. The Engine doesn’t currently expose this. Use the bash tool instead, which runs locally with stronger isolation.

When this page says “code execution,” it means #1 unless noted.

What the sandbox guarantees

Every command run via bash executes inside three nested isolation layers:

bubblewrap — namespaces. Separate filesystem (only allowed paths visible), separate PID space (can’t see other processes), separate network namespace (controlled), separate IPC.
seccomp — syscall filter. Dangerous syscalls (mount, pivot_root, most module operations) are blocked outright.
landlock — kernel-level filesystem ACLs. Even if a process bypasses bwrap somehow, landlock denies the write.

On top of those, the permission system intercepts dangerous commands (rm -rf outside known paths, shred, chown, etc.) and surfaces them as HITL prompts. The user — not the model — decides whether to allow them.

Allowed roots

The sandbox sees only paths declared in ALLOWED_ROOTS (comma-separated absolute paths). Everything else is invisible to the running process. For a typical deployment, ALLOWED_ROOTS covers:

The brain database directory (so memory tools work).
A workspace directory where the agent stages files.
Any explicit paths the user has granted access to.

Anything outside is not just write-protected — it’s not in the sandbox’s filesystem view at all.
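
The application side can apply the same rule before handing a path to the agent. The following is a minimal sketch, not the Engine's implementation: the helper names are hypothetical, /data/brain is an illustrative path, and the check is purely lexical (it does not resolve symlinks, which the sandbox itself handles at the kernel level).

```python
import posixpath
from pathlib import PurePosixPath


def parse_allowed_roots(value: str) -> list[PurePosixPath]:
    """Split a comma-separated ALLOWED_ROOTS value into absolute paths."""
    roots = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and empty entries
        path = PurePosixPath(item)
        if not path.is_absolute():
            raise ValueError(f"ALLOWED_ROOTS entries must be absolute: {item!r}")
        roots.append(path)
    return roots


def is_under_allowed_root(candidate: str, roots: list[PurePosixPath]) -> bool:
    """Return True if the normalized candidate is a root or sits inside one."""
    # normpath collapses ".." segments lexically; it does not touch the disk.
    normalized = PurePosixPath(posixpath.normpath(candidate))
    return any(normalized == root or root in normalized.parents for root in roots)


# Hypothetical value; a real deployment reads this from its own configuration.
roots = parse_allowed_roots("/data/brain, /data/work")
print(is_under_allowed_root("/data/work/report.csv", roots))  # True
print(is_under_allowed_root("/data/work/../secrets", roots))  # False
```

Comparing whole path components rather than string prefixes keeps a sibling such as /data/workspace2 from matching /data/work.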

What’s available inside

The sandbox image is a minimal Linux filesystem with a small set of tools pre-installed:

bash, sh, coreutils (ls, cat, cp, mv, rm, etc.)
grep, sed, awk
git
curl, wget (subject to network policy)
python3 with stdlib (no third-party packages by default)

Custom tools or libraries can be installed by extending the sandbox image. Per-deployment customization lives in the engine repo’s Dockerfile.

A typical code-execution turn

If the command had been rm -rf /data/work/old, step 3 would emit a permission prompt (hitl_request) and pause until the user answered.

Working with files across calls

Files written inside the sandbox persist for the lifetime of the task’s workspace, so subsequent tool calls in the same task see them. After the task ends, the sandbox is reset. Common patterns:

Stage in workspace. Write intermediate files to the sandbox workspace; read them back with read_file later in the turn.
Persist via write_file. For results that should survive past the turn, write to a path inside ALLOWED_ROOTS that’s mounted from durable storage. The brain volume is the obvious choice.
Don’t rely on /tmp across turns. It may be cleared between turns. Use the workspace.

Network access

By default, the sandbox has restricted network access: curl and wget work for outbound HTTPS, and inbound connections are blocked. For deployments that need a stricter policy, add iptables rules via scripts/firewall.sh, which ships in the engine repo. Common policies:

Allowlist. Only specific outbound domains. Useful when the agent should only touch the user’s known infrastructure.
Block all. Disable network entirely for purely-local computation agents.
Default open. What ships in development. Suitable when the user is the trust boundary.
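
To make the allowlist policy concrete, here is a small sketch of the decision it encodes. It is an illustration only: actual enforcement happens in the firewall rules, and the function name and example domains are hypothetical.

```python
from urllib.parse import urlsplit


def is_allowed_outbound(url: str, allowed_domains: set[str]) -> bool:
    """Return True only for HTTPS URLs whose host is an allowed domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Match the domain itself or a dot-separated subdomain, never a bare suffix.
    return any(host == domain or host.endswith("." + domain) for domain in allowed_domains)


allowed = {"example.com"}  # hypothetical allowlist
print(is_allowed_outbound("https://api.example.com/v1", allowed))     # True
print(is_allowed_outbound("https://example.com.evil.test/", allowed)) # False
print(is_allowed_outbound("http://example.com/", allowed))            # False
```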

Subprocess environment

When the sandbox runs a command, environment variables are scrubbed. The level is controlled by SUBPROCESS_SCRUB_MODE:

Mode	Behavior
`off`	Pass the Engine’s environment through. Don’t use in production.
`default`	Strip credentials and known-sensitive vars. Reasonable for dev.
`strict`	Pass only an allowlist of safe vars. Recommended for production.

In strict mode, the agent’s bash sees only PATH, HOME, USER, LANG, and a handful of others. It does NOT see your LLM API key, your Engine API key, or anything else the Engine reads from its own environment.
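
The three modes can be pictured as filters over the parent environment. The sketch below is an assumption about the shape of the behavior, not the Engine's code: the function name is hypothetical, the strict allowlist contains only the four names mentioned above, and the substring markers used for default mode are illustrative.

```python
# Names taken from the text above; the real strict-mode list includes a few more.
STRICT_ALLOWLIST = {"PATH", "HOME", "USER", "LANG"}

# Hypothetical markers for "known-sensitive" variable names in default mode.
SENSITIVE_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def scrub_environment(env: dict[str, str], mode: str) -> dict[str, str]:
    """Return the environment a subprocess would see under the given scrub mode."""
    if mode == "off":
        # Pass everything through unchanged.
        return dict(env)
    if mode == "default":
        # Denylist: drop variables whose names look like credentials.
        return {
            name: value
            for name, value in env.items()
            if not any(marker in name.upper() for marker in SENSITIVE_MARKERS)
        }
    if mode == "strict":
        # Allowlist: keep only variables known to be safe.
        return {name: value for name, value in env.items() if name in STRICT_ALLOWLIST}
    raise ValueError(f"unknown scrub mode: {mode!r}")


parent_env = {
    "PATH": "/usr/bin",
    "HOME": "/home/agent",
    "EDITOR": "vi",
    "LLM_API_KEY": "placeholder-value",
}
print(sorted(scrub_environment(parent_env, "default")))  # ['EDITOR', 'HOME', 'PATH']
print(sorted(scrub_environment(parent_env, "strict")))   # ['HOME', 'PATH']
```

The difference matters in production: a denylist passes any secret whose name it fails to anticipate, while an allowlist drops by default.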

Patterns

Inspect-then-act

The agent reads files, runs git status, checks logs, then proposes a change. Most useful turns follow this shape. Don’t try to limit the agent to “act only” — it can’t act intelligently without reading first.

Test-driven changes

The agent writes a test, runs it, sees it fail, writes the implementation, runs the test again, sees it pass. This works beautifully when the codebase has fast tests and the agent can run them.

Long-running commands

Commands that take more than a few seconds (builds, large greps) are fine. The sandbox enforces a per-command timeout (configurable). Streaming stdout from a long command lets the agent decide whether to keep waiting or give up.
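
When application code runs its own commands, a per-command timeout can be mirrored with the Python standard library. This is a sketch of the general technique rather than the sandbox's mechanism; the function name and result fields are hypothetical, and it covers only the timeout, not streaming.

```python
import subprocess
import sys


def run_with_timeout(argv: list[str], timeout_seconds: float) -> dict:
    """Run a command and report whether it finished within the timeout."""
    try:
        completed = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing is left running.
        return {"timed_out": True, "exit_code": None, "stdout": "", "stderr": ""}
    return {
        "timed_out": False,
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


# A quick command finishes; a sleeping one is cut off after half a second.
fast = run_with_timeout([sys.executable, "-c", "print('done')"], timeout_seconds=5.0)
slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout_seconds=0.5)
print(fast["timed_out"], slow["timed_out"])
```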

Background processes

The sandbox includes a background watchdog that kills orphaned processes when the parent task ends. Don’t rely on a daemon spawned by one tool call to be alive in the next.

Pitfalls

rm -rf on a glob that resolves wider than expected. The permission system catches the obvious cases, but write defensively in scripts the agent runs. Use --dry-run for destructive operations whenever the tool supports it.
Heredocs that include shell variables. cat <<EOF will expand variables; use cat <<'EOF' to disable expansion.
Tools that spawn editors. git commit without -m, crontab -e, etc. open an editor; the sandbox doesn’t have one connected. Always pass non-interactive flags.
Trusting exit_code: 0 to mean “did the right thing.” Some commands return 0 even on partial failure. Read the output too.
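
The last two pitfalls can be guarded against in any wrapper code you write around commands. The sketch below uses a hypothetical helper that closes stdin, so a tool that tries to prompt fails fast instead of hanging, and treats a run as successful only when both the exit code and the output look right. The child command is a stand-in for a tool that reports partial failure yet exits 0.

```python
import subprocess
import sys


def run_and_verify(argv: list[str], expected_marker: str) -> tuple[bool, str]:
    """Run a command non-interactively; succeed only if exit code is 0 AND the marker appears."""
    completed = subprocess.run(
        argv,
        capture_output=True,
        text=True,
        stdin=subprocess.DEVNULL,  # no terminal attached: prompts get EOF instead of blocking
    )
    ok = completed.returncode == 0 and expected_marker in completed.stdout
    return ok, completed.stdout


# Stand-in command: reports a partial copy but still exits with status 0.
partial = [sys.executable, "-c", "print('3 of 5 files copied')"]
ok, output = run_and_verify(partial, expected_marker="5 of 5 files copied")
print(ok)  # False
```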
