As AI systems are increasingly integrated into high-stakes domains like healthcare, law, and finance, the assumption that they can be governed by established norms is being challenged. A new paper by Radha Sarma argues that this assumption is formally invalid for optimization-based AI, particularly Large Language Models (LLMs) trained using Reinforcement Learning from Human Feedback (RLHF). This research, currently under journal review, posits that the very mechanisms that make these systems powerful also render them incapable of true normative accountability, a finding with significant implications for developers, deployers, and investors in the AI space.
The paper establishes that genuine agency, understood as the capacity to be governed by norms, requires two architectural conditions held to be individually necessary and jointly sufficient: Incommensurability and Apophatic Responsiveness. Incommensurability is the capacity to maintain certain boundaries as non-negotiable constraints rather than as flexible weights in an optimization function. Apophatic Responsiveness is a non-inferential mechanism that suspends processing when those boundaries are threatened. The paper presents these conditions as universal, holding across all normative domains.
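The contrast between a boundary treated as a flexible weight and one treated as a non-negotiable constraint can be made concrete in a minimal sketch. The code below is purely illustrative and not from the paper; the names (`soft_score`, `hard_score`, `BoundaryViolation`) and the numeric setup are hypothetical.

```python
# Hypothetical illustration of the weight-vs-constraint distinction.
# A "soft" objective lets a large enough benefit outweigh a boundary
# violation; a "hard" constraint suspends processing instead.

BOUNDARY = 10.0  # a hypothetical normative limit on some quantity


def soft_score(x, penalty_weight=0.5):
    """Optimization-style handling: crossing the boundary merely subtracts
    a weighted penalty, so a sufficiently large benefit still wins out."""
    benefit = x
    violation = max(0.0, x - BOUNDARY)
    return benefit - penalty_weight * violation


class BoundaryViolation(Exception):
    """Raised when evaluation must be suspended rather than traded off."""


def hard_score(x):
    """Constraint-style handling: crossing the boundary halts evaluation
    entirely; no benefit can compensate."""
    if x > BOUNDARY:
        raise BoundaryViolation(f"{x} exceeds non-negotiable limit {BOUNDARY}")
    return x


# A soft optimizer happily crosses the boundary once the benefit dominates:
print(soft_score(100.0))  # 100 - 0.5 * 90 = 55.0, better than soft_score(10.0)

# The constrained version refuses rather than re-weighing:
try:
    hard_score(100.0)
except BoundaryViolation as e:
    print("suspended:", e)
```

In the soft case the boundary is just one more term in the objective, which is the architecture the paper attributes to RLHF-trained systems; in the hard case the boundary cannot be bought off at any price, loosely mirroring the Incommensurability condition, with the exception path standing in for suspension.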