From static oversight to distributed corrigibility
Abstract
AI safety governance is increasingly organized around evaluations, deployment thresholds, interruptibility, and post-deployment monitoring. This paper argues that these advances still leave a structural gap. Over time, prior safety judgments tend to acquire an illegitimate authority over the present: institutions continue to observe and document new risks while losing the ability to treat that evidence as grounds for revising what they have already authorized. I call this failure epistemic hardening. In response, I argue that corrigibility should be treated not solely as a model property but as a distributed governance property spanning the broader sociotechnical stack. The paper develops four connected claims: that static oversight tends toward epistemic hardening; that distributed corrigibility is the positive condition required to resist it; that trigger-based recursive oversight is needed to make revisability operative rather than nominal; and that a living safety case is one institutional form capable of sustaining this logic over time. For agentic AI, safe governance requires not only the power to supervise but also the power to invalidate earlier confidence when its grounds no longer hold.