Every transition the orchestrator drives — provisioning, starting, stopping, restarting, rotating keys, destroying — writes a row to the audit_log table. This is the durable record of what happened to the fleet. It’s the first place to look during an incident and the data behind every fleet metric.

Schema

CREATE TABLE audit_log (
  id BIGSERIAL PRIMARY KEY,
  timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
  product_id UUID,
  user_id TEXT,
  engine_id UUID,
  action TEXT NOT NULL,
  actor TEXT NOT NULL DEFAULT 'system',
  metadata JSONB NOT NULL DEFAULT '{}',
  duration_ms INTEGER
);

CREATE INDEX idx_audit_log_timestamp ON audit_log (timestamp);
CREATE INDEX idx_audit_log_action ON audit_log (action);

| Column | Purpose |
| --- | --- |
| timestamp | When the action started. |
| product_id | The product whose engine the action targeted (nullable for system-level actions). |
| user_id | The user the engine belongs to (nullable when not yet allocated). |
| engine_id | The engine the action targeted (nullable for product-level actions). |
| action | The action name (see below). |
| actor | Who triggered it: system, the product slug, or admin. |
| metadata | Action-specific JSON details. |
| duration_ms | How long the action took (when measurable). |
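
As a rough illustration of how a caller might fill these columns, the sketch below times an action and builds one row per transition. The `audited` helper and the in-memory `rows` list are hypothetical stand-ins, not the orchestrator's actual code; a real writer would INSERT into audit_log instead of appending to a list.

```python
import json
import time
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def audited(sink, action, *, actor="system", product_id=None,
            user_id=None, engine_id=None):
    """Hypothetical helper: time a block and append one audit row to `sink`."""
    metadata = {}                              # caller adds action-specific details
    started_at = datetime.now(timezone.utc)    # timestamp = when the action started
    t0 = time.monotonic()
    try:
        yield metadata
    finally:
        # Record the row whether the action succeeded or raised.
        sink.append({
            "timestamp": started_at.isoformat(),
            "product_id": product_id,
            "user_id": user_id,
            "engine_id": engine_id,
            "action": action,
            "actor": actor,
            "metadata": json.dumps(metadata),
            "duration_ms": int((time.monotonic() - t0) * 1000),
        })


rows = []  # stand-in for the audit_log table
with audited(rows, "provision", actor="admin", user_id="cust-12345") as meta:
    meta["port"] = 9042
```

Capturing the timestamp before the work begins matches the column's meaning (when the action started), while duration_ms comes from a monotonic clock so it is unaffected by wall-clock adjustments.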

Actions

| Action | When | Triggered by |
| --- | --- | --- |
| provision | New engine created | POST /engines/provision or auto-provision path |
| start | Engine started | POST /engines/{user_id}/start |
| stop | Engine stopped | POST /engines/{user_id}/stop |
| wake | Sleeping engine woken | POST /admit with auto_wake=true |
| destroy | Engine torn down | DELETE /engines/{user_id} |
| rotate_key | API key rotated | POST /engines/{user_id}/rotate-key |
| health_check | Probe result (sampled, not every tick) | health loop |
| health_failed | Engine marked failed | health loop |
| auto_restart_start | Auto-restart attempted | health loop |
| auto_restart_success | Auto-restart succeeded | health loop |
| auto_restart_failed | Auto-restart attempt failed | health loop |
| auto_restart_gave_up | Stopped retrying after max attempts | health loop |
| policy_register | Product registered | POST /products/register |
| policy_update | Product policy changed | PUT /products/{product_id}/policy |
| admit_denied | Admission denied by policy | POST /admit |
| provision_denied | Provision denied by policy | POST /engines/provision |

The list grows over time. New actions are additive, so existing log entries stay valid.
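
Because new action names can appear at any time, code that consumes the log should tolerate names it has not seen before. The sketch below is one way to do that; the three groups are an assumption loosely based on the Health, Lifecycle, and Policy pages, and `categorize` is a hypothetical helper rather than part of the orchestrator.

```python
from collections import Counter

# Assumed grouping; the orchestrator itself does not define these categories.
LIFECYCLE = {"provision", "start", "stop", "wake", "destroy", "rotate_key"}


def categorize(action: str) -> str:
    """Map an action name to a coarse group; unknown names fall through to 'other'."""
    if action in LIFECYCLE:
        return "lifecycle"
    if action.startswith(("health_", "auto_restart_")):
        return "health"
    if action.startswith("policy_") or action.endswith("_denied"):
        return "policy"
    return "other"


# "snapshot" is a made-up future action, used only to show the fallback.
counts = Counter(categorize(a) for a in
                 ["provision", "auto_restart_gave_up", "admit_denied", "snapshot"])
```

Falling back to a catch-all group keeps dashboards and reports working when a new action ships, instead of raising on an unrecognized name.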

Common queries

Recent activity for one engine

SELECT timestamp, action, actor, metadata, duration_ms
FROM audit_log
WHERE engine_id = '5c2f...'
ORDER BY timestamp DESC
LIMIT 50;

This is the first place to look when a user reports something weird with their engine.

Provisions in the last hour

SELECT COUNT(*)
FROM audit_log
WHERE action = 'provision'
  AND timestamp > now() - interval '1 hour';

This is the same query the /metrics endpoint runs. It is useful for capacity planning.

Auto-restart hot spots

SELECT engine_id, COUNT(*) AS restarts
FROM audit_log
WHERE action = 'auto_restart_start'  -- one row per attempt; outcome rows are not counted
  AND timestamp > now() - interval '24 hours'
GROUP BY engine_id
ORDER BY restarts DESC
LIMIT 10;

Engines restarting frequently are usually misconfigured (wrong key, bad image) or genuinely buggy.

Activity by actor

SELECT actor, action, COUNT(*) AS n
FROM audit_log
WHERE timestamp > now() - interval '7 days'
GROUP BY actor, action
ORDER BY n DESC;

Useful for answering “who’s been active?”: products, the system, or admins.

A specific user’s lifetime

SELECT timestamp, action, metadata, duration_ms
FROM audit_log
WHERE user_id = 'cust-12345'
ORDER BY timestamp;

This reconstructs everything recorded for that user’s engine, as far back as your retention window reaches. It is useful for support investigations.

Metadata conventions

The metadata JSONB column has a different shape for each action:

provision

{
  "engine_version": "september-engine:2.3.0",
  "port": 9042,
  "boot_duration_ms": 4350
}

health_failed

{
  "consecutive_failures": 3,
  "last_error": "timeout after 10s",
  "last_status_code": null
}

auto_restart_failed

{
  "attempt": 4,
  "next_retry_in_s": 80,
  "error": "container exited with code 137"
}

admit_denied

{
  "reason": "rate_limit_exceeded",
  "current_rpm": 612,
  "limit": 600
}

The metadata fields are stable per action: changes are additive only.
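
Since fields are only ever added, a reader can stay compatible by ignoring keys it does not know and by supplying defaults for keys that older rows may lack. A minimal sketch for health_failed rows follows; `describe_health_failure` is a hypothetical helper, and `probe_region` is an invented extra field used only to show that unknown keys are harmless.

```python
import json


def describe_health_failure(metadata_json: str) -> str:
    """Summarize a health_failed metadata blob, tolerating missing or extra keys."""
    meta = json.loads(metadata_json)
    failures = meta.get("consecutive_failures", 0)
    error = meta.get("last_error") or "unknown error"
    status = meta.get("last_status_code")
    status_part = f"HTTP {status}" if status is not None else "no HTTP status"
    return f"{failures} consecutive failures; {error} ({status_part})"


# The documented shape, plus one invented field that the reader simply ignores.
summary = describe_health_failure(json.dumps({
    "consecutive_failures": 3,
    "last_error": "timeout after 10s",
    "last_status_code": None,
    "probe_region": "hypothetical",
}))
```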

Retention

audit_log grows without bound. Without cleanup, it eventually fills the database. A reasonable retention policy:

DELETE FROM audit_log
WHERE timestamp < now() - interval '90 days';

Run this weekly and vacuum periodically. For longer compliance needs, export to a long-term store (S3, BigQuery) before deletion. The orchestrator doesn’t ship an exporter today; a small script that streams audit_log to your stack handles it.
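
The export-before-delete step could look roughly like the sketch below. It is an illustration under stated assumptions, not a shipped tool: `export_then_delete` is a hypothetical function, an in-memory SQLite table with a reduced column set stands in for Postgres so the example runs without a server, and timestamps are stored as ISO 8601 text. Against Postgres you would use a Postgres driver, its own placeholder style, and native TIMESTAMPTZ comparisons.

```python
import io
import json
import sqlite3
from datetime import datetime, timedelta, timezone

COLUMNS = ("id", "timestamp", "action", "actor", "metadata")


def export_then_delete(conn, cutoff_iso, out, batch_size=500):
    """Write rows older than `cutoff_iso` to `out` as JSON lines, then delete them."""
    total = 0
    while True:
        rows = conn.execute(
            "SELECT id, timestamp, action, actor, metadata FROM audit_log "
            "WHERE timestamp < ? ORDER BY id LIMIT ?",
            (cutoff_iso, batch_size),
        ).fetchall()
        if not rows:
            return total
        for row in rows:
            out.write(json.dumps(dict(zip(COLUMNS, row))) + "\n")
        # Delete only the rows that were just exported.
        conn.executemany("DELETE FROM audit_log WHERE id = ?", [(r[0],) for r in rows])
        conn.commit()
        total += len(rows)


# Demo on an in-memory SQLite table (stand-in for the real audit_log).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_log (id INTEGER PRIMARY KEY, timestamp TEXT, "
             "action TEXT, actor TEXT, metadata TEXT)")
now = datetime.now(timezone.utc)
for age_days, action in [(200, "provision"), (120, "stop"), (1, "start")]:
    conn.execute(
        "INSERT INTO audit_log (timestamp, action, actor, metadata) "
        "VALUES (?, ?, 'system', '{}')",
        ((now - timedelta(days=age_days)).isoformat(), action),
    )
cutoff = (now - timedelta(days=90)).isoformat()
archive = io.StringIO()  # stand-in for a file or object-store upload
deleted = export_then_delete(conn, cutoff, archive, batch_size=1)
```

Working in small batches keeps each transaction short, and deleting by id guarantees that only rows already written to the archive are removed.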

What’s not in the audit log

  • Engine /execute calls. The orchestrator doesn’t see them; the product calls the engine directly. Per-call telemetry lives in the engine’s observability_events.
  • Postgres-internal events like vacuum, autovacuum, and so on.
  • Orchestrator startup and shutdown. Logged to stdout instead.

If you want a unified timeline across the orchestrator and the engine, combine audit_log (orchestrator) with observability_events (engine) on engine_id.
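
One way to build that combined view in application code is to fetch both row sets sorted by time and merge them. The sketch below assumes that observability_events rows carry a timestamp and an engine_id; the `event` field and the `unified_timeline` helper are hypothetical, since the engine's schema is not described here.

```python
import heapq
from datetime import datetime, timedelta, timezone


def unified_timeline(audit_rows, engine_events, engine_id):
    """Merge two timestamp-sorted row lists for one engine into a single timeline."""
    orchestrator = ({"source": "orchestrator", **r}
                    for r in audit_rows if r["engine_id"] == engine_id)
    engine = ({"source": "engine", **r}
              for r in engine_events if r["engine_id"] == engine_id)
    # heapq.merge assumes each input is already sorted by the key.
    return list(heapq.merge(orchestrator, engine, key=lambda r: r["timestamp"]))


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
audit_rows = [
    {"timestamp": t0, "engine_id": "e1", "action": "provision"},
    {"timestamp": t0 + timedelta(seconds=10), "engine_id": "e1", "action": "stop"},
]
engine_events = [  # field names are assumed, not the real observability_events schema
    {"timestamp": t0 + timedelta(seconds=5), "engine_id": "e1", "event": "execute"},
    {"timestamp": t0 + timedelta(seconds=6), "engine_id": "e2", "event": "execute"},
]
timeline = unified_timeline(audit_rows, engine_events, "e1")
```

Tagging each row with its source keeps the two record types distinguishable after the merge, which matters because their remaining fields differ.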

See also

  • Health — what triggers auto_restart_* actions.
  • Lifecycle — what triggers provision/stop/etc.
  • Policy — what triggers admit_denied/provision_denied.
  • API reference — every endpoint that produces audit rows.