Anthropic's latest release, Claude Opus 4.1, is an incremental update over Opus 4 that strengthens the model's capabilities in agentic tasks and real-world coding. As Matthew Berman highlighted in his recent video, the release underscores Anthropic's commitment to iterative improvement in areas critical for advanced AI applications. Berman also pointed to Anthropic's statement that it plans to "release substantially larger improvements to our models in the coming weeks," signaling an aggressive development roadmap.
Opus 4.1 demonstrates notable gains on software engineering benchmarks. On SWE-bench Verified, its accuracy improved to 74.5%, up from Opus 4's 72.5% and Claude 3.7 Sonnet's 62.3%. Similarly, in agentic terminal coding, Opus 4.1 reached 43.3% on Terminal-Bench, a solid increase over Opus 4's 39.2%. These figures solidify Claude's position as a leading model for coding, particularly in scenarios requiring autonomous problem-solving and execution within development environments.
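To make the benchmark figures concrete: a score like "74.5% on SWE-bench Verified" is a pass rate, the fraction of task instances the model resolves (its patch makes the held-out tests pass). The sketch below illustrates that arithmetic with made-up per-instance results; it is not Anthropic's actual evaluation harness.

```python
def resolved_rate(results: list[bool]) -> float:
    """Fraction of benchmark instances resolved, as a percentage."""
    if not results:
        return 0.0
    return 100.0 * sum(results) / len(results)

# Illustrative data only -- pass/fail lists shaped to match the
# reported resolve rates, not actual SWE-bench run outputs.
opus_4_1 = [True] * 745 + [False] * 255   # models a 74.5% resolve rate
opus_4 = [True] * 725 + [False] * 275     # models a 72.5% resolve rate

print(f"Opus 4.1: {resolved_rate(opus_4_1):.1f}%")
print(f"Opus 4:   {resolved_rate(opus_4):.1f}%")
```

A two-point gain on this metric means roughly 20 more resolved instances per thousand tasks, which is why even incremental releases can matter for agentic coding workloads.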