

Every product registered with bap-engine carries a JSON policy. The orchestrator consults the policy on every admission and provisioning call, returning POLICY_DENIED, QUOTA_EXCEEDED, or RATE_LIMITED when limits are crossed. This page covers what the policy controls and how the orchestrator enforces it.

Where policy lives

In Postgres, on the products table:
SELECT slug, policy FROM products;
Each product has a policy JSONB column. Default value is {}, which means no limits. A typical production policy:
{
  "max_engines": 1000,
  "rate_limit_rpm": 600
}
You set the policy at registration:
POST /products/register
{
  "slug": "my-product",
  "display_name": "My product",
  "policy": { "max_engines": 1000, "rate_limit_rpm": 600 }
}
Or update later:
PUT /products/{product_id}/policy
{
  "policy": { "max_engines": 5000, "rate_limit_rpm": 2000 }
}
Both endpoints require X-Admin-Key.
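
As a concrete sketch, the registration call can be issued from Python's standard library. The base URL and admin key below are placeholders, not values from this deployment:

```python
import json
import urllib.request

ORCH_URL = "http://localhost:8000"  # placeholder: your orchestrator base URL
ADMIN_KEY = "change-me"             # placeholder: your X-Admin-Key value

payload = {
    "slug": "my-product",
    "display_name": "My product",
    "policy": {"max_engines": 1000, "rate_limit_rpm": 600},
}
req = urllib.request.Request(
    f"{ORCH_URL}/products/register",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json", "X-Admin-Key": ADMIN_KEY},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```

The PUT /products/{product_id}/policy update is the same pattern with the smaller policy-only payload.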

What the policy controls

max_engines

Maximum concurrent engines for this product. Counted at provision time across all states (running, sleeping, stopped, failed). A destroyed engine doesn't count (its row is gone). When the product is at the limit, POST /engines/provision and the auto-provision path of POST /engines/{user_id}/admit return:
{
  "error": {
    "code": "QUOTA_EXCEEDED",
    "message": "Product at max_engines limit (1000)."
  }
}
Status: 429.

rate_limit_rpm

Maximum admission requests per minute for this product, across all users. Each POST /engines/{user_id}/admit increments a counter for the product; after rate_limit_rpm calls in a 60-second window, further admits fail with:
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded. Retry in N seconds."
  }
}
Status: 429. The counter is in-memory per orchestrator process — it doesn’t persist across restarts. For multi-process orchestrator deployments, each process has its own counter, and the effective rate is process count × rate_limit_rpm.
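
On the client side, the two 429 variants are distinguishable by error code rather than message. A minimal handler sketch (the function and action names are illustrative, and the 403 mapping for POLICY_DENIED is an assumption based on the enforcement section below):

```python
def classify_admission_error(status, body):
    """Map an orchestrator error envelope to a client-side action.

    `status` is the HTTP status code; `body` is the decoded JSON error
    envelope shown above. Action names are illustrative.
    """
    code = body.get("error", {}).get("code")
    if status == 429 and code == "QUOTA_EXCEEDED":
        return "free_capacity"        # destroy idle engines, then retry
    if status == 429 and code == "RATE_LIMITED":
        return "retry_after_backoff"  # wait out the window, then retry
    if status == 403 and code == "POLICY_DENIED":
        return "give_up"              # retrying will not help
    return "raise"
```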

What’s not in the policy

The current policy doesn’t gate:
  • Cost per turn. Tracked at the engine layer (LLM provider spending) and aggregated upstream.
  • Engine version. The orchestrator runs all engines on the configured ORCH_ENGINE_IMAGE. Per-product version pins are not supported today.
  • Geographic restrictions. No region routing.
If you need any of these, enforce upstream of the orchestrator (in your product layer) or extend the policy schema and the PolicyEngine.

How enforcement works

PolicyEngine (orchestrator/policy.py) provides two methods:
def check_provision(product, count_existing_engines) -> PolicyDecision: ...
def check_admit(product) -> PolicyDecision: ...
PolicyDecision is {allowed: bool, reason: str | None}. Admission and provisioning call these methods before any state change: if the policy denies, the orchestrator returns the appropriate 429/403 immediately, and nothing is mutated.

Per-user limits

Today the policy operates at the product level. There’s no built-in per-user quota (e.g. “this specific user gets at most 10 turns/minute”). If you need per-user limits, enforce in the engine’s gateway config (/config/gateway.yaml) or upstream in your product.

Updating policy in flight

PUT /products/{product_id}/policy updates the row in Postgres immediately. The next request consults the new value. There’s no caching — every admit reads the policy fresh. The change does not retroactively destroy engines exceeding the new max_engines. If you lower the cap from 1000 to 500 with 700 engines running, the existing 700 stay; new provisions return QUOTA_EXCEEDED until the count drops below 500. To shed engines, destroy them explicitly via DELETE /engines/{user_id}.

What you can build on top

The policy primitives are deliberately simple. For richer behaviors, the orchestrator’s audit log + the engines table give you the data you need:
  • Idle reaping. Periodic job that destroys engines whose last_health_at is older than threshold X.
  • Cost-based denials. Track external LLM cost upstream; deny admit when a product or user is over budget.
  • Tiered policies. Read product → tier → policy in your product layer; pass the right max_engines/rate_limit_rpm at registration.
These live outside the orchestrator today.
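
For example, the idle-reaping job reduces to a pure selection step over rows read from the engines table (the (user_id, last_health_at) row shape is assumed here; destruction then goes through DELETE /engines/{user_id}):

```python
import datetime


def find_idle_engines(rows, threshold, now=None):
    """Select engines whose last health report is older than `threshold`.

    `rows` is an iterable of (user_id, last_health_at) tuples as read from
    the engines table; returns the user_ids whose engines should be destroyed.
    Engines that have never reported health (last_health_at is None) are skipped.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return [
        user_id
        for user_id, last_health_at in rows
        if last_health_at is not None and now - last_health_at > threshold
    ]
```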

See also