Audit log

Every transition the orchestrator drives — provisioning, starting, stopping, restarting, rotating keys, destroying — writes a row to the audit_log table. This is the durable record of what happened to the fleet. It’s the first place to look during an incident and the data behind every fleet metric.

Schema

CREATE TABLE audit_log (
  id BIGSERIAL PRIMARY KEY,
  timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
  product_id UUID,
  user_id TEXT,
  engine_id UUID,
  action TEXT NOT NULL,
  actor TEXT NOT NULL DEFAULT 'system',
  metadata JSONB NOT NULL DEFAULT '{}',
  duration_ms INTEGER
);

CREATE INDEX idx_audit_log_timestamp ON audit_log (timestamp);
CREATE INDEX idx_audit_log_action ON audit_log (action);

Column	Purpose
`timestamp`	When the action started.
`product_id`	The product whose engine the action targeted (nullable for system-level actions).
`user_id`	The user the engine belongs to (nullable when not yet allocated).
`engine_id`	The engine the action targeted (nullable for product-level actions).
`action`	The action name (see below).
`actor`	Who triggered it: `system`, the product slug, or `admin`.
`metadata`	Action-specific JSON details.
`duration_ms`	How long the action took (when measurable).
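
As a rough sketch of how a row with these columns might be assembled, the Python context manager below captures the start time for `timestamp`, measures `duration_ms`, and applies the same defaults as the schema. The names `audited` and `sink` are hypothetical and not part of the orchestrator; `sink` stands in for whatever actually performs the INSERT.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def audited(sink, action, *, actor="system", product_id=None,
            user_id=None, engine_id=None):
    """Build one audit_log-shaped row around a block of work (hypothetical helper)."""
    row = {
        "timestamp": datetime.now(timezone.utc),  # when the action started
        "product_id": product_id,
        "user_id": user_id,
        "engine_id": engine_id,
        "action": action,
        "actor": actor,        # schema default is 'system'
        "metadata": {},        # schema default is '{}'
        "duration_ms": None,   # filled in once the block finishes
    }
    started = time.monotonic()
    try:
        # The caller may add action-specific details to row["metadata"].
        yield row
    finally:
        # Runs whether the block succeeded or raised.
        row["duration_ms"] = int((time.monotonic() - started) * 1000)
        sink(row)  # assumption: sink performs the actual INSERT


rows = []
with audited(rows.append, "provision", user_id="cust-12345") as row:
    row["metadata"]["port"] = 9042
```

Using a monotonic clock for the duration keeps `duration_ms` non-negative even if the wall clock is adjusted while the action runs.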

Actions

Action	When	Triggered by
`provision`	New engine created	`POST /engines/provision` or auto-provision path
`start`	Engine started	`POST /engines/{user_id}/start`
`stop`	Engine stopped	`POST /engines/{user_id}/stop`
`wake`	Sleeping engine woken	`POST /admit` with `auto_wake=true`
`destroy`	Engine torn down	`DELETE /engines/{user_id}`
`rotate_key`	API key rotated	`POST /engines/{user_id}/rotate-key`
`health_check`	Probe result (sampled, not every tick)	health loop
`health_failed`	Engine marked `failed`	health loop
`auto_restart_start`	Auto-restart attempted	health loop
`auto_restart_success`	Auto-restart succeeded	health loop
`auto_restart_failed`	Auto-restart attempt failed	health loop
`auto_restart_gave_up`	Stopped retrying after max attempts	health loop
`policy_register`	Product registered	`POST /products/register`
`policy_update`	Product policy changed	`PUT /products/{product_id}/policy`
`admit_denied`	Admission denied by policy	`POST /admit`
`provision_denied`	Provision denied by policy	`POST /engines/provision`

The list grows. New actions are additive — old log entries stay valid.

Common queries

Recent activity for one engine

SELECT timestamp, action, actor, metadata, duration_ms
FROM audit_log
WHERE engine_id = '5c2f...'
ORDER BY timestamp DESC
LIMIT 50;

The first place to look when a user reports something weird with their engine.

Provisions in the last hour

SELECT COUNT(*)
FROM audit_log
WHERE action = 'provision'
  AND timestamp > now() - interval '1 hour';

Same query the /metrics endpoint runs. Useful for capacity planning.

Auto-restart hot spots

SELECT engine_id, COUNT(*) AS restarts
FROM audit_log
WHERE action = 'auto_restart_start'  -- one row per attempt; LIKE 'auto_restart%' would also count the outcome rows
  AND timestamp > now() - interval '24 hours'
GROUP BY engine_id
ORDER BY restarts DESC
LIMIT 10;

Engines restarting frequently are usually misconfigured (wrong key, bad image) or genuinely buggy.

Activity by actor

SELECT actor, action, COUNT(*) AS n
FROM audit_log
WHERE timestamp > now() - interval '7 days'
GROUP BY actor, action
ORDER BY n DESC;

Useful for “who’s been active?” — products vs system actions vs admins.

A specific user’s lifetime

SELECT timestamp, action, metadata, duration_ms
FROM audit_log
WHERE user_id = 'cust-12345'
ORDER BY timestamp;

Reconstructs everything that ever happened to that user’s engine. Useful for support investigations.

Metadata conventions

The `metadata` JSONB column has a different shape for each action:

`provision`

{
  "engine_version": "september-engine:2.3.0",
  "port": 9042,
  "boot_duration_ms": 4350
}

`health_failed`

{
  "consecutive_failures": 3,
  "last_error": "timeout after 10s",
  "last_status_code": null
}

`auto_restart_failed`

{
  "attempt": 4,
  "next_retry_in_s": 80,
  "error": "container exited with code 137"
}

`admit_denied`

{
  "reason": "rate_limit_exceeded",
  "current_rpm": 612,
  "limit": 600
}

The metadata fields are stable per action — additive changes only.
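
Because fields are only ever added, a consumer can read the keys it knows and ignore the rest. A minimal sketch of such a reader follows; the function name `describe` and its output wording are illustrative, not part of the orchestrator.

```python
import json


def describe(action, metadata):
    """Summarise a row's metadata without assuming every field is present."""
    if isinstance(metadata, str):  # some drivers return JSONB as text
        metadata = json.loads(metadata)
    if action == "health_failed":
        return (f"{metadata.get('consecutive_failures', '?')} consecutive failures, "
                f"last error: {metadata.get('last_error', 'unknown')}")
    if action == "admit_denied":
        return (f"denied ({metadata.get('reason', 'unknown')}): "
                f"{metadata.get('current_rpm', '?')} rpm against a limit of "
                f"{metadata.get('limit', '?')}")
    # Unknown or newly added actions fall back to a generic dump.
    return json.dumps(metadata, sort_keys=True)


print(describe("admit_denied",
               {"reason": "rate_limit_exceeded", "current_rpm": 612, "limit": 600}))
# prints: denied (rate_limit_exceeded): 612 rpm against a limit of 600
```

Reading with `.get()` and a fallback branch means a new field or a new action never breaks the consumer; it just shows up in less detail until the reader is updated.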

Retention

audit_log grows without bound; without cleanup, it eventually fills the database. A reasonable retention policy:

DELETE FROM audit_log
WHERE timestamp < now() - interval '90 days';

Run it weekly, and vacuum the table periodically so the space freed by the deletes can be reused. For longer compliance needs, export to a long-term store (S3, BigQuery) before deletion. The orchestrator doesn’t ship an exporter today; a small script that streams audit_log to your stack handles it.
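
One possible shape for the export step is sketched below: it writes rows older than the retention window as JSON Lines to any file-like object. The function name `export_expired` is hypothetical, and fetching the rows from Postgres and uploading the result are deliberately left out, since they depend on your driver and storage.

```python
import io
import json
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # same window as the DELETE above


def export_expired(rows, out, now=None):
    """Write rows older than the retention window to `out` as JSON Lines.

    `rows` is any iterable of dicts keyed by audit_log column names.
    Returns the number of rows written.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION
    written = 0
    for row in rows:
        if row["timestamp"] < cutoff:
            record = dict(row, timestamp=row["timestamp"].isoformat())
            # default=str covers values such as UUID objects.
            out.write(json.dumps(record, sort_keys=True, default=str) + "\n")
            written += 1
    return written


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
sample = [
    {"timestamp": now - timedelta(days=120), "action": "provision"},
    {"timestamp": now - timedelta(days=5), "action": "stop"},
]
buffer = io.StringIO()
count = export_expired(sample, buffer, now=now)
```

Exporting and deleting against the same cutoff timestamp avoids dropping rows that crossed the 90-day boundary between the two steps.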

What’s not in the audit log

Engine /execute calls. The orchestrator doesn’t see them; the product calls the engine directly. Per-call telemetry lives in the engine’s observability_events.
Postgres-internal events like vacuum, autovacuum, and so on.
Orchestrator startup and shutdown. Logged to stdout instead.

If you want a unified timeline across orchestrator + engine, combine audit_log (orchestrator) with observability_events (engine) on engine_id.
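
A minimal sketch of that combination, done client-side: fetch both result sets for one engine_id ordered by timestamp, then interleave them. The function name `unified_timeline` is hypothetical, and the `timestamp` and `event` keys on observability_events rows are assumptions about the engine's schema.

```python
import heapq
from datetime import datetime, timezone


def unified_timeline(audit_rows, engine_events):
    """Interleave two timestamp-sorted row lists into one timeline.

    Both inputs are assumed to belong to one engine_id and to be sorted
    by `timestamp` already (for example via ORDER BY timestamp).
    """
    tagged_audit = (("orchestrator", row) for row in audit_rows)
    tagged_engine = (("engine", row) for row in engine_events)
    return list(heapq.merge(tagged_audit, tagged_engine,
                            key=lambda item: item[1]["timestamp"]))


def at(minute):
    return datetime(2025, 1, 1, 12, minute, tzinfo=timezone.utc)


timeline = unified_timeline(
    [{"timestamp": at(0), "action": "start"},
     {"timestamp": at(9), "action": "stop"}],
    [{"timestamp": at(3), "event": "execute"}],  # hypothetical engine row
)
for source, row in timeline:
    print(row["timestamp"].isoformat(), source)
```

`heapq.merge` streams both inputs without re-sorting, which suits two result sets that the database has already ordered.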
