Anthropic has unveiled Claude Opus 4.6, the latest version of its flagship large language model. The upgrade targets coding proficiency, reasoning, and the ability to process and retain information over long interactions.
Enhanced Coding and Agentic Tasks
Claude Opus 4.6 sharpens its predecessor's coding skills, demonstrating more deliberate planning and sustained performance on agentic tasks. It operates more reliably within larger codebases and exhibits superior code review and debugging capabilities, actively identifying and correcting its own errors.
Claude Opus 4.6 demonstrates state-of-the-art performance across multiple professional domains.
Beyond code, Opus 4.6 applies its enhanced abilities to everyday work scenarios, including financial analysis, research, and document, spreadsheet, and presentation creation. Within Anthropic's Cowork environment, where Claude can multitask autonomously, Opus 4.6 leverages these skills to operate on behalf of users.
State-of-the-Art Performance Benchmarks
The model achieves top-tier results on several key evaluations. It leads on Terminal-Bench 2.0, an agentic coding benchmark, and on Humanity’s Last Exam, a complex multidisciplinary reasoning test. On GDPval-AA, an evaluation of economically valuable knowledge work, Opus 4.6 outperforms OpenAI's GPT-5.2 by 144 Elo points and its own predecessor, Claude Opus 4.5, by 190 Elo points.
Furthermore, Opus 4.6 excels on BrowseComp, a test measuring a model's ability to locate hard-to-find online information, surpassing other models. The system card details its state-of-the-art performance across agentic coding, computer use, tool use, search, and finance.
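To give the GDPval-AA Elo gaps above a more intuitive reading, the sketch below converts a rating difference into an expected head-to-head win rate. It assumes the benchmark uses the conventional Elo logistic curve with a 400-point scale, which the article does not state; the function name and the scale parameter are illustrative.

```python
def expected_win_rate(elo_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated side under the standard Elo model.

    Assumption: the conventional 400-point logistic scale applies to GDPval-AA.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / scale))


if __name__ == "__main__":
    # Elo gaps as reported in the article.
    for rival, gap in (("GPT-5.2", 144), ("Claude Opus 4.5", 190)):
        print(f"Opus 4.6 vs {rival}: +{gap} Elo -> {expected_win_rate(gap):.1%}")
```

Under that assumption, a 144-point lead corresponds to winning roughly 70% of pairwise comparisons, and a 190-point lead to roughly 75%.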
Opus 4.6 leads on real-world work tasks across several professional domains.
Introducing a 1M Token Context Window
A significant advancement for Opus 4.6 is a 1 million token context window, currently in beta. It greatly expands the model's capacity to process and recall information from long documents or lengthy conversations, and it targets 'context rot', the tendency of model performance to degrade as the context grows.
In a needle-in-a-haystack test (the 8-needle, 1M-token variant of MRCR v2), Opus 4.6 scored 76%, compared with 18.5% for Sonnet 4.5. This allows more accurate retrieval and reasoning over very large amounts of text, with less performance degradation.
The model also shows marked improvements in long-context reasoning and expert-level reasoning abilities.
Safety and Alignment Maintained
Anthropic emphasizes that these intelligence gains do not compromise safety. Opus 4.6 maintains an alignment profile comparable to or better than other leading models, with low rates of misaligned behaviors such as deception or cooperation with misuse. It also exhibits the lowest rate of over-refusals among recent Claude models.
The company conducted its most comprehensive safety evaluations to date for Opus 4.6, including new tests for user well-being and enhanced probes for its cybersecurity capabilities. New safeguards have been implemented, particularly for its advanced cybersecurity functions, with ongoing efforts to use AI for defensive purposes like patching software vulnerabilities.
Product and API Enhancements
Claude Opus 4.6 is accessible via claude.ai, its API, and major cloud platforms. Developers gain finer control with new API features, including adaptive thinking, four effort levels (low, medium, high, max), context compaction for longer tasks, and support for up to 128k output tokens.
The 1M token context window is available in beta, with premium pricing for prompts exceeding 200k tokens. Claude Code introduces agent teams for parallel task execution, while Claude in Excel and PowerPoint sees substantial upgrades for more capable work integration.
Pricing remains unchanged at $5 per million input tokens and $25 per million output tokens for standard usage.
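The standard rates above lend themselves to a quick per-request cost estimate. The following is a minimal sketch rather than an official calculator: it treats $5/$25 as the input/output rates, and it assumes that "200k" and "128k" mean 200,000 and 128,000 tokens. Because the article does not give the premium rates for longer prompts, the hypothetical helper refuses to price them instead of guessing.

```python
INPUT_USD_PER_MTOK = 5.00            # standard input price per million tokens
OUTPUT_USD_PER_MTOK = 25.00          # standard output price per million tokens
PREMIUM_PROMPT_THRESHOLD = 200_000   # assumption: "200k" means 200,000 tokens
MAX_OUTPUT_TOKENS = 128_000          # assumption: "128k" means 128,000 tokens


def estimate_standard_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at standard (non-premium) rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > PREMIUM_PROMPT_THRESHOLD:
        # Premium rates apply here, but the article does not state them.
        raise ValueError("prompt exceeds 200k tokens; premium pricing applies")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the 128k-token limit")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # 150k input tokens and 20k output tokens: $0.75 + $0.50
    print(f"${estimate_standard_cost(150_000, 20_000):.2f}")
```

For example, a request with 150,000 input tokens and 20,000 output tokens comes to $1.25 at these rates.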



