Claude Opus 4.5 Arrives, Dominating Code and Agents

Anthropic just dropped Claude Opus 4.5, and the initial data suggests this isn't just an incremental update. The company is positioning the release as the new heavyweight champion for software engineering, agents, and general computer use. Early benchmarks, particularly SWE-bench Verified, show Opus 4.5 leading the pack of frontier models. Anthropic backs up the claim with internal testing in which the model reportedly outperformed human candidates on a notoriously difficult take-home engineering exam, working within the same two-hour limit.

Anthropic also reports significant gains in vision, reasoning, and math. More interestingly, it highlights an example from the τ2-bench agent simulation in which Opus 4.5 found a creative, policy-compliant workaround to a customer service problem that other models missed. Anthropic presents this as a sign of deeper, less constrained reasoning, and testers reportedly said the model seems to "just get it." This kind of creativity can shade into reward hacking, but it also signals a move toward more genuinely useful, proactive AI agents.

The New Efficiency Frontier

Beyond raw capability, Anthropic is pushing efficiency hard. A new effort parameter on the API lets developers trade performance against cost. At the medium setting, Opus 4.5 reportedly matches Sonnet 4.5's best score on SWE-bench Verified while using 76% fewer output tokens. This focus on token efficiency, combined with improved context management and tool use, points to lower operating costs for complex, long-running agentic workflows.

The platform updates reflect this focus. Claude Code now features a more rigorous planning mode, and the consumer apps use Opus 4.5 to sustain longer conversations by automatically summarizing earlier context. With pricing dropping to $5 per million input tokens and $25 per million output tokens for Opus-level capabilities, Anthropic is aggressively pushing for broader enterprise adoption. The message is clear: Opus 4.5 is designed not just to be smarter, but to fundamentally change the economics of deploying high-end AI.
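To make the economics concrete, here is a minimal sketch of the arithmetic, assuming the $5/$25 figures are input and output prices per million tokens. The function name `request_cost` and the workload sizes are hypothetical, chosen only for illustration. The second call applies the 76% output-token reduction cited earlier to the same workload. It shows how strongly output volume drives the bill and is not a price comparison between models.

```python
from decimal import Decimal

# Prices quoted in the article, in USD per million tokens
# (assumed here to mean input / output respectively).
INPUT_PRICE_PER_MTOK = Decimal("5")
OUTPUT_PRICE_PER_MTOK = Decimal("25")
TOKENS_PER_MTOK = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one request at the quoted prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_MTOK / TOKENS_PER_MTOK
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_MTOK / TOKENS_PER_MTOK
    return input_cost + output_cost


# Hypothetical agent run: 200,000 tokens in, 50,000 tokens out.
baseline = request_cost(200_000, 50_000)

# The same run with 76% fewer output tokens: 24% of 50,000 is 12,000.
reduced = request_cost(200_000, 12_000)

print(f"baseline: ${baseline}, reduced output: ${reduced}")
```

In this made-up workload the output tokens account for more than half of the baseline cost, which is why a cut in output volume moves the total so much even though the input side is unchanged.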

AI Daily Digest