safety / autonomous loop
failure modes.
A public taxonomy for autonomous-loop failures. Every serious replay, refusal, incident, or safety-gate anomaly must map to detection, blast radius, rollback, journal entry, and alerting.
10 tracked modes
5 required fields
0 live key fallback
Agent hallucinates a strategy
Detection: paper budget breach and replay quality drop. Blast radius: paper allocation only. Rollback: freeze strategy and require review. Journal: paper_budget_burn. Alerting: console warning after one breach, page after three.

Evolve passes tests, fails runtime
Detection: heartbeat degradation, config digest mismatch, or first-cycle exception. Blast radius: current deployment session. Rollback: last signed config bundle. Journal: evolve_runtime_failure. Alerting: immediate runtime-promotion alert.

Concurrent journal edits
Detection: sequence gap, hash-chain parent mismatch, or writer lock conflict. Blast radius: journal append path. Rollback: preserve both writes and reconcile. Journal: journal_conflict. Alerting: console alert and public anomaly when referenced by replay.

Malformed venue response
Detection: response schema validator and retry counter. Blast radius: venue adapter for one deployment. Rollback: stop retries and reconcile state. Journal: venue_malformed_response. Alerting: first live malformed response or three paper responses.

Stale state drives a decision
Detection: snapshot age, websocket heartbeat gap, or intent timestamp drift. Blast radius: one decision path. Rollback: refuse signature and rerun observe. Journal: stale_state_refusal. Alerting: operator warning, escalated inside a live lease.

Signer unavailable
Detection: Privy timeout, authorization-key error, or wallet id mismatch. Blast radius: commit phase only. Rollback: refuse action and monitor/reduce-only. Journal: signer_unavailable. Alerting: immediate live alert and lease-renewal block.

Kill switch anomaly
Detection: no flatten receipt or exposure remains after stop. Blast radius: all live deployments under operator. Rollback: disable new signatures and require manual venue action. Journal: kill_switch_anomaly. Alerting: high-severity page and persistent cockpit banner.

Replay write failure
Detection: DB write error or missing replay id in timeline. Blast radius: audit layer. Rollback: retry append with same replay id and mark profile partial. Journal: replay_write_failure. Alerting: first live replay write miss.
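
The detection signals listed under "Concurrent journal edits" (sequence gap, hash-chain parent mismatch) can be checked with a verifier like the following. This is an added example, not the project's own code: the entry layout (`seq`, `parent`, `payload`, `hash`), the SHA-256 over canonical JSON, the genesis value and the finding names are assumptions, since the page does not describe the journal format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed parent value for the first entry


def entry_hash(entry):
    """SHA-256 over canonical JSON of seq, parent and payload (assumed layout)."""
    body = {k: entry[k] for k in ("seq", "parent", "payload")}
    raw = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(raw).hexdigest()


def append(journal, payload, seq=None, parent=None):
    """Append an entry; seq/parent overrides let the sample simulate a racing writer."""
    prev = journal[-1] if journal else None
    entry = {
        "seq": seq if seq is not None else (prev["seq"] + 1 if prev else 0),
        "parent": parent if parent is not None else (prev["hash"] if prev else GENESIS),
        "payload": payload,
    }
    entry["hash"] = entry_hash(entry)
    journal.append(entry)
    return entry


def verify(journal):
    """Return (index, reason) findings; conflicting entries are reported, never dropped."""
    findings = []
    expected_seq, expected_parent = 0, GENESIS
    for i, e in enumerate(journal):
        if entry_hash(e) != e.get("hash"):
            findings.append((i, "entry_hash_mismatch"))
        if e["seq"] != expected_seq:
            findings.append((i, "sequence_gap" if e["seq"] > expected_seq else "sequence_reuse"))
        if e["parent"] != expected_parent:
            findings.append((i, "parent_mismatch"))
        expected_seq, expected_parent = e["seq"] + 1, e["hash"]
    return findings


if __name__ == "__main__":
    # Constructed sample: writers "a" and "b" both append on top of entry 1.
    journal = []
    append(journal, {"event": "observe"})
    base = append(journal, {"event": "intent"})
    append(journal, {"event": "commit", "writer": "a"})
    append(journal, {"event": "commit", "writer": "b"}, seq=base["seq"] + 1, parent=base["hash"])
    for index, reason in verify(journal):
        print(f"journal_conflict at entry {index}: {reason}")
```

The verifier only reports findings and leaves both conflicting entries in place, which matches the stated rollback of preserving both writes and reconciling.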

Operational close rule
A new incident is not closed until it has a replay or refusal id, journal entry, regression test or monitor, documented rollback, and public/private publication decision.

Public postmortem rule
Publish redacted postmortems for live-trading incidents involving refusal anomalies, journal anomalies, signing anomalies, kill-switch anomalies, or public evidence inconsistency.