Claude Opus 4.5 has achieved a remarkable feat, scoring higher than any human candidate ever on Anthropic's notoriously difficult take-home exam for prospective performance engineering hires. This result, which tests both technical ability and judgment under time pressure, signals a shift in what AI models are capable of and raises pressing questions about the future of engineering professions. Matthew Berman, in his analysis of Anthropic's latest release, unpacks the implications of this new frontier model: its benchmark performance, its new features, and its strategic pricing.
The competitive landscape of large language models is intensely dynamic, with new frontier models emerging rapidly. Opus 4.5, Anthropic's newest offering, positions itself as a leader across several key metrics, particularly in areas vital for enterprise and developer applications. Its standout result is in software engineering: 80.9% accuracy on the SWE-bench Verified benchmark, surpassing its predecessor Sonnet 4.5 (77.2%) as well as rivals such as OpenAI's GPT-5.1-Codex-Max (77.9%) and Google's Gemini 3 Pro (76.2%). This lead extends to agentic coding and tool use, suggesting a model well suited to autonomous task execution.
