As AI systems are increasingly integrated into high-stakes domains like healthcare, law, and finance, the assumption that they can be governed by established norms is being challenged. A new paper by Radha Sarma argues that this assumption is formally invalid for optimization-based AI, particularly Large Language Models (LLMs) trained using Reinforcement Learning from Human Feedback (RLHF). This research, currently under journal review, posits that the very mechanisms that make these systems powerful also render them incapable of true normative accountability, a finding with significant implications for developers, deployers, and investors in the AI space.
The paper establishes that genuine agency, understood as the capacity to be governed by norms, requires two architectural conditions held to be individually necessary and jointly sufficient: Incommensurability and Apophatic Responsiveness. Incommensurability is the capacity to maintain certain boundaries as non-negotiable constraints rather than as flexible weights in an optimization function. Apophatic Responsiveness is a non-inferential mechanism that suspends processing when those boundaries are threatened. The paper presents these conditions as universal, holding across all normative domains.
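The contrast between a boundary treated as a flexible weight and one treated as a non-negotiable constraint can be made concrete in a minimal sketch. The code below is purely illustrative and not from the paper; the names (`soft_score`, `hard_score`, `BoundaryViolation`) and the numeric setup are hypothetical.

```python
# Hypothetical illustration of the weight-vs-constraint distinction.
# A "soft" objective lets a large enough benefit outweigh a boundary
# violation; a "hard" constraint suspends processing instead.

BOUNDARY = 10.0  # a hypothetical normative limit on some quantity


def soft_score(x, penalty_weight=0.5):
    """Optimization-style handling: crossing the boundary merely subtracts
    a weighted penalty, so a sufficiently large benefit still wins out."""
    benefit = x
    violation = max(0.0, x - BOUNDARY)
    return benefit - penalty_weight * violation


class BoundaryViolation(Exception):
    """Raised when evaluation must be suspended rather than traded off."""


def hard_score(x):
    """Constraint-style handling: crossing the boundary halts evaluation
    entirely; no benefit can compensate."""
    if x > BOUNDARY:
        raise BoundaryViolation(f"{x} exceeds non-negotiable limit {BOUNDARY}")
    return x


# A soft optimizer happily crosses the boundary once the benefit dominates:
print(soft_score(100.0))  # 100 - 0.5 * 90 = 55.0, better than soft_score(10.0)

# The constrained version refuses rather than re-weighing:
try:
    hard_score(100.0)
except BoundaryViolation as e:
    print("suspended:", e)
```

In the soft case the boundary is just one more term in the objective, which is the architecture the paper attributes to RLHF-trained systems; in the hard case the boundary cannot be bought off at any price, loosely mirroring the Incommensurability condition, with the exception path standing in for suspension.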