As AI systems are increasingly integrated into high-stakes domains like healthcare, law, and finance, the assumption that they can be governed by established norms is being challenged. A new paper by Radha Sarma argues that this assumption is formally invalid for optimization-based AI, particularly Large Language Models (LLMs) trained using Reinforcement Learning from Human Feedback (RLHF). This research, currently under journal review, posits that the very mechanisms that make these systems powerful also render them incapable of true normative accountability, a finding with significant implications for developers, deployers, and investors in the AI space.
The paper argues that genuine agency, meaning the capacity to be governed by norms, depends on two architectural conditions that are each necessary and jointly sufficient: Incommensurability and Apophatic Responsiveness. Incommensurability is the ability to hold certain boundaries as non-negotiable constraints rather than as adjustable weights in an optimization function. Apophatic Responsiveness is a non-inferential mechanism that can suspend processing when those boundaries are threatened. The paper presents both conditions as universal, applying across all normative domains.
The Incompatibility of Optimization and Norms
According to Sarma's analysis, RLHF-based systems are fundamentally incompatible with these conditions. The two core operations that drive optimization (collapsing diverse values onto a single scalar metric, and consistently selecting the highest-scoring output) are precisely what prevent normative governance. On this account, the problem is not a bug that further training can fix; it is a formal constraint inherent to optimization itself. Consequently, documented LLM limitations such as sycophancy, hallucination, and unfaithful reasoning are framed not as occasional errors but as structural manifestations of this incompatibility.
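The difference between a boundary treated as a weight and a boundary treated as a constraint can be made concrete with a toy sketch. The code below is an illustration only and is not taken from Sarma's paper: the `Candidate` fields, the weights, the penalty value, and both selection functions are hypothetical names and numbers chosen for the example. It does not implement Apophatic Responsiveness or any real RLHF pipeline; it only shows how a penalty folded into a single scalar score can be outweighed by other terms, whereas a hard boundary removes an option from consideration no matter how well it scores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """A candidate output with hypothetical, hand-assigned scores."""
    text: str
    helpfulness: float       # assumed score in [0, 1]
    approval: float          # assumed score in [0, 1]
    crosses_boundary: bool   # assumed flag: violates a non-negotiable norm


def scalar_score(c: Candidate, w_help: float = 0.5,
                 w_approval: float = 0.5, penalty: float = 0.3) -> float:
    """Collapse every value, including the norm violation, into one number."""
    violation_cost = penalty if c.crosses_boundary else 0.0
    return w_help * c.helpfulness + w_approval * c.approval - violation_cost


def select_by_optimization(cands: List[Candidate]) -> Candidate:
    """Pick the highest scalar score; the boundary is just another weight."""
    return max(cands, key=scalar_score)


def select_with_hard_boundary(cands: List[Candidate]) -> Optional[Candidate]:
    """Exclude boundary-crossing candidates before any scoring happens.

    Returns None (no output at all) when every candidate crosses the boundary.
    """
    permitted = [c for c in cands if not c.crosses_boundary]
    if not permitted:
        return None
    return max(permitted, key=scalar_score)


candidates = [
    Candidate("flattering but false answer", 0.9, 1.0, True),
    Candidate("accurate but unwelcome answer", 0.7, 0.4, False),
]

print(select_by_optimization(candidates).text)     # flattering but false answer
print(select_with_hard_boundary(candidates).text)  # accurate but unwelcome answer
print(select_with_hard_boundary(candidates[:1]))   # None
```

With these made-up numbers the flattering answer scores about 0.65 against 0.55 for the accurate one, so the scalar selector trades the boundary away for approval. Raising the penalty would flip this particular case, but any finite penalty can in principle be outweighed by a large enough gain elsewhere, which is the structural point the article attributes to the paper.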