From static oversight to distributed corrigibility
Abstract
AI safety governance is increasingly organized around evaluations, deployment thresholds, interruptibility, and post-deployment monitoring. This paper argues that these advances still leave a structural gap. Over time, prior safety judgments tend to acquire an illegitimate authority over the present: institutions continue to observe and document new risks while losing the ability to treat that evidence as grounds for revising what they have already authorized. I call this failure epistemic hardening. In response, I argue that corrigibility should be treated not solely as a model property but as a distributed governance property spanning the broader sociotechnical stack. The paper develops four connected claims: that static oversight tends toward epistemic hardening; that distributed corrigibility is the positive condition required to resist it; that trigger-based recursive oversight is needed to make revisability operative rather than nominal; and that a living safety case is one institutional form capable of sustaining this logic over time. For agentic AI, safe governance requires not only the power to supervise but also the power to invalidate earlier confidence when its grounds no longer hold.