Anthropic just dropped Claude Opus 4.5, and the initial data suggests this isn't just an incremental update. The company is positioning this release as the new heavyweight champion for software engineering, agents, and general computer use. Early benchmark results, particularly on SWE-bench Verified, show Opus 4.5 leading the pack of frontier models. Anthropic backs up that claim with internal testing: on a notoriously difficult take-home engineering exam, the model reportedly scored higher than any human candidate ever had within the same two-hour limit.
Opus 4.5 also shows significant gains in vision, reasoning, and math. More interestingly, Anthropic highlights an example from τ2-bench, an agentic benchmark that simulates customer service. Playing an airline agent, the model was expected to refuse a change to a basic economy ticket, since the simulated airline's policy doesn't allow modifications to that fare class. Instead, Opus 4.5 found a creative, policy-compliant workaround: upgrade the cabin first, then modify the flight. It's the kind of deeper, less constrained reasoning behind testers' remark that the model "just gets it." This creativity edges toward what could count as reward hacking in other settings, but it signals a move toward more genuinely useful, proactive AI agents.
