Anthropic has unveiled version 3.0 of its Responsible Scaling Policy (RSP), a significant overhaul of its framework for mitigating catastrophic AI risks. The update, announced two and a half years after the original policy, reflects both the successes and the limitations of its prior approach as AI capabilities have rapidly advanced.
The original RSP, introduced in September 2023, aimed to address future AI risks through "if-then" commitments tied to "AI Safety Levels" (ASLs). For example, if a model exceeded certain biological science capabilities, stricter safeguards would be implemented. While early ASLs were detailed, later levels were intentionally left vague, awaiting a clearer picture of advanced AI capabilities.
Assessing Past Successes and Challenges
Anthropic's previous RSP successfully incentivized internal safeguard development, leading to sophisticated input and output classifiers for ASL-3 compliance in May 2025. The policy also spurred similar frameworks from competitors such as OpenAI and Google DeepMind, and informed early AI policy globally, including California's SB 53 and the EU AI Act's Codes of Practice.