Anthropic has unveiled version 3.0 of its Responsible Scaling Policy (RSP), a significant overhaul of its framework for mitigating catastrophic AI risks. The update, announced two and a half years after the original policy, reflects both the successes and the limitations of the company's prior approach as AI capabilities have rapidly advanced.
The original RSP, introduced in September 2023, aimed to address future AI risks through "if-then" commitments tied to "AI Safety Levels" (ASLs). For example, if a model exceeded certain biological science capabilities, stricter safeguards would be implemented. While the early ASLs were specified in detail, later levels were intentionally left vague until the capabilities of more advanced AI became clearer.