Every product registered with bap-engine carries a JSON policy. The
orchestrator consults the policy on every admission and provisioning
call, returning POLICY_DENIED, QUOTA_EXCEEDED, or RATE_LIMITED
when limits are crossed.
This page covers what the policy controls and how the orchestrator
enforces it.
Where policy lives
In Postgres, on the products table, in the policy JSONB column. The
default value is {}, which means no limits. A typical production policy
sets both max_engines and rate_limit_rpm, described below.
Policy changes go through the admin endpoints and require X-Admin-Key.
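As a minimal sketch of what such a policy might look like, the snippet below builds one as a Python dict and serializes it the way it could be stored in the JSONB column. Only the field names max_engines and rate_limit_rpm come from this page; the numeric values and the helper limit_for are hypothetical.

```python
import json
from typing import Optional

# Hypothetical values for illustration; only the field names come from this page.
EXAMPLE_POLICY = {"max_engines": 500, "rate_limit_rpm": 600}


def limit_for(policy: dict, key: str) -> Optional[int]:
    """Return the configured limit, or None when the policy leaves it unset."""
    return policy.get(key)


# JSON text as it could be written to the policy JSONB column.
policy_json = json.dumps(EXAMPLE_POLICY, sort_keys=True)

# The default policy {} sets nothing, so every limit reads as "no limit".
default_limit = limit_for({}, "max_engines")
```

Treating a missing key as "no limit" keeps the {} default and a partially filled policy on the same code path.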
What the policy controls
max_engines
Maximum concurrent engines for this product. Counted at provision
time across all states (running, sleeping, stopped, failed).
A destroyed engine doesn’t count (its row is gone).
When at the limit, POST /engines/provision and the auto-provision
path of POST /admit return QUOTA_EXCEEDED with HTTP status 429.
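The counting rule can be sketched as follows. This is an illustration under assumptions, not the orchestrator's code: the function name can_provision and the list-of-states input are hypothetical, and a missing max_engines is treated as unlimited, in line with the {} default.

```python
from typing import List

# States named on this page. A destroyed engine has no row, so it never appears here.
COUNTED_STATES = {"running", "sleeping", "stopped", "failed"}


def can_provision(policy: dict, engine_states: List[str]) -> bool:
    """Return True if one more engine may be provisioned for this product."""
    limit = policy.get("max_engines")
    if limit is None:
        return True  # no max_engines configured: no cap
    count = sum(1 for state in engine_states if state in COUNTED_STATES)
    # Being at the limit already denies, so the comparison is strict.
    return count < limit


# Two engines exist (one running, one failed); a cap of 2 is already reached.
at_limit = can_provision({"max_engines": 2}, ["running", "failed"])
```

Note that failed and stopped engines still occupy a slot in this sketch, which matches the "across all states" wording above.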
rate_limit_rpm
Maximum admission requests per minute for this product, across all
users. Each POST /engines/{user_id}/admit increments a counter
for the product; after rate_limit_rpm calls in a 60-second window,
further admits fail with RATE_LIMITED and HTTP status 429.
The counter is held in memory per orchestrator process, so it doesn’t
persist across restarts. In multi-process orchestrator deployments,
each process has its own counter, so the effective limit is process count × rate_limit_rpm.
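A per-process counter of this kind might look like the sketch below. It assumes a fixed 60-second window and that denied admits do not increment the counter; this page does not say whether the real window is fixed or sliding. The names FixedWindowCounter and effective_rpm are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowCounter:
    """In-memory admit counter keyed by product (assumed fixed 60 s window)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # product_id -> (window index, admits counted in that window)
        self._windows: Dict[str, Tuple[int, int]] = {}

    def allow(self, product_id: str, rate_limit_rpm: int) -> bool:
        window = int(self._clock() // 60)
        seen_window, count = self._windows.get(product_id, (window, 0))
        if seen_window != window:
            count = 0  # a new window started, so the count resets
        if count >= rate_limit_rpm:
            return False  # over the limit: the caller would answer RATE_LIMITED
        self._windows[product_id] = (window, count + 1)
        return True


def effective_rpm(process_count: int, rate_limit_rpm: int) -> int:
    """Upper bound on admits per minute when each process counts separately."""
    return process_count * rate_limit_rpm
```

Because the state lives in a plain dict, a restart empties it, and four processes with rate_limit_rpm set to 100 can together admit up to 400 requests per minute.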
What’s not in the policy
The current policy doesn’t gate:
- Cost per turn. Tracked at the engine layer (LLM provider spending) and aggregated upstream.
- Engine version. The orchestrator runs all engines on the configured ORCH_ENGINE_IMAGE. Per-product version pins are not supported today.
- Geographic restrictions. No region routing.
PolicyEngine.
How enforcement works
PolicyEngine (orchestrator/policy.py) provides two methods, each returning
a PolicyDecision of the form {allowed: bool, reason: str | None}.
Discovery and the provision endpoint call these before any state
change. If the policy denies, the orchestrator returns the appropriate
429 or 403 immediately, and no state changes.
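The check-before-change pattern can be sketched like this. Only the PolicyDecision shape comes from this page; the enforce helper, the STATUS_FOR_REASON table, the 200 success status, and the pairing of POLICY_DENIED with 403 are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class PolicyDecision:
    allowed: bool
    reason: Optional[str] = None


# Assumed mapping: the quota and rate errors use 429; 403 for POLICY_DENIED is inferred.
STATUS_FOR_REASON = {"QUOTA_EXCEEDED": 429, "RATE_LIMITED": 429, "POLICY_DENIED": 403}


def enforce(decision: PolicyDecision,
            do_state_change: Callable[[], None]) -> Tuple[int, Optional[str]]:
    """Run the state change only if the policy allowed it."""
    if not decision.allowed:
        # Denied: return immediately, before anything is mutated.
        return STATUS_FOR_REASON.get(decision.reason, 403), decision.reason
    do_state_change()
    return 200, None


provisioned = []
denied = enforce(PolicyDecision(False, "QUOTA_EXCEEDED"), lambda: provisioned.append("e1"))
allowed = enforce(PolicyDecision(True), lambda: provisioned.append("e2"))
```

After these two calls, provisioned holds only "e2": the denied request never reached the state-changing callback.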
Per-user limits
Today the policy operates at the product level. There’s no built-in per-user quota (e.g. “this specific user gets at most 10 turns/minute”). If you need per-user limits, enforce them in the engine’s gateway config (/config/gateway.yaml) or upstream in your product.
Updating policy in flight
PUT /products/{product_id}/policy updates the row in Postgres
immediately. The next request consults the new value. There’s no
caching — every admit reads the policy fresh.
The change does not retroactively destroy engines exceeding the
new max_engines. If you lower the cap from 1000 to 500 with 700
engines running, the existing 700 stay; new provisions return
QUOTA_EXCEEDED until the count drops below 500.
To shed engines, destroy them explicitly via
DELETE /engines/{user_id}.
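To make the arithmetic concrete, the hypothetical helper below computes how many engines would have to be destroyed before a new provision passes again, given that the count has to drop below the cap rather than merely reach it.

```python
def engines_to_shed(current_count: int, max_engines: int) -> int:
    """Engines to destroy so that current_count falls strictly below max_engines."""
    return max(0, current_count - max_engines + 1)


# The example above: cap lowered to 500 while 700 engines exist.
to_destroy = engines_to_shed(700, 500)  # 201, leaving 499 engines
```

Destroying only 200 would leave the product exactly at the cap, where provisions are still refused.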
What you can build on top
The policy primitives are deliberately simple. For richer behaviors, the orchestrator’s audit log and the engines table give you the data
you need:
- Idle reaping. A periodic job that destroys engines whose last_health_at is older than a chosen threshold.
- Cost-based denials. Track external LLM cost upstream; deny admit when a product or user is over budget.
- Tiered policies. Read product → tier → policy in your product layer; pass the right max_engines/rate_limit_rpm at registration.
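As one example of building on these primitives, the selection step of an idle reaper might be sketched as below. It assumes engines rows are available as dicts with user_id and last_health_at keys; last_health_at is named on this page, while the row shape and the idle_user_ids helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def idle_user_ids(engines: List[dict], threshold: timedelta, now: datetime) -> List[str]:
    """Return user_ids whose engine last reported health before now - threshold."""
    cutoff = now - threshold
    return [row["user_id"] for row in engines if row["last_health_at"] < cutoff]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"user_id": "alice", "last_health_at": now - timedelta(hours=2)},
    {"user_id": "bob", "last_health_at": now - timedelta(minutes=5)},
]
stale = idle_user_ids(rows, threshold=timedelta(hours=1), now=now)
```

Each returned id would then be passed to DELETE /engines/{user_id} by the periodic job.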
See also
- Architecture: where policy fits.
- API reference: admin endpoints: registering products, updating policy.
- Errors: QUOTA_EXCEEDED, RATE_LIMITED, POLICY_DENIED.

