Codex Max is here: OpenAI’s AI can now code for 24 hours

OpenAI has launched GPT-5.1-Codex-Max, a new frontier agentic coding model that widens the scope of what AI can handle in software development. Available immediately in the Codex environment, this isn't just a faster autocomplete tool; it's an agent built for marathon coding sessions, capable of sustaining complex work for hours at a time and, in OpenAI's internal testing, for more than a day.

The core breakthrough enabling this long-haul capability is a process OpenAI calls "compaction." GPT-5.1-Codex-Max is OpenAI's first model natively trained to operate across multiple context windows, coherently managing millions of tokens within a single task. When the session approaches its context limit, the model automatically prunes its history while preserving the most critical context, effectively giving it a fresh context window without losing progress.
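To make the idea concrete, here is a minimal sketch of history pruning in Python. It is a toy illustration, not OpenAI's implementation: the `Message` type, the `critical` flag, the word-count tokenizer, and the `keep_recent` parameter are all hypothetical names invented for this example, and the real model decides for itself which context to preserve.

```python
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    critical: bool = False  # hypothetical flag: this entry must survive pruning


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def total_tokens(history: list[Message]) -> int:
    return sum(count_tokens(m.text) for m in history)


def compact(history: list[Message], keep_recent: int = 2) -> list[Message]:
    """Keep critical older messages and the newest `keep_recent`; drop the rest."""
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]
    kept = [m for m in older if m.critical]
    dropped = len(older) - len(kept)
    if dropped == 0:
        return list(history)
    # Leave a short marker so later steps know that history was pruned.
    note = Message(f"[compacted: {dropped} earlier messages dropped]", critical=True)
    return kept + [note] + recent


def append_with_compaction(
    history: list[Message], message: Message, limit: int, keep_recent: int = 2
) -> list[Message]:
    """Append a message, pruning the history if the token budget is exceeded."""
    history = history + [message]
    if total_tokens(history) > limit:
        history = compact(history, keep_recent)
    return history
```

In this sketch, a task goal marked `critical=True` would survive every pruning pass while intermediate logs are dropped. A production system would more likely summarize older context than discard it outright.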

This technical leap opens up work that was out of reach for context-limited models, such as project-scale refactors, deep debugging sessions, and multi-hour agent loops. OpenAI says that in internal evaluations Codex Max has worked independently on tasks for more than 24 hours, persistently iterating on implementations and fixing test failures until it delivered a successful result.

Codex Max is built on an updated foundational reasoning model trained specifically on agentic tasks like PR creation, code review, and frontend development. On the critical SWE-Lancer IC SWE benchmark, Codex Max achieves 79.9% accuracy, a significant jump from the 66.3% achieved by the previous high-effort GPT-5.1-Codex model.

Efficiency and the Bottom Line

Beyond raw capability, the new model is notably more efficient. OpenAI reports that Codex Max uses 30% fewer "thinking tokens" than its predecessor while achieving better performance on benchmarks like SWE-bench Verified. For developers, fewer tokens translate directly into lower costs. The model can generate complex frontend designs, like interactive CartPole RL sandboxes, with similar functionality to the older model, but at a substantially lower computational cost.

For tasks that are not latency-sensitive, OpenAI is also introducing an ‘Extra High’ reasoning effort setting, allowing the model to think for an even longer period to produce a superior answer. However, the company still recommends the ‘medium’ setting as the daily driver for most tasks, balancing speed and accuracy.

The model’s training now includes tasks designed to make it a better collaborator in the Codex CLI. Notably, it is also the first model OpenAI has trained to operate in Windows environments, expanding its utility across enterprise development stacks.

The implications for developer productivity are stark. OpenAI reports that 95% of its own engineers use Codex weekly, and since adopting the tooling, these engineers are shipping roughly 70% more pull requests. Codex Max is designed to supercharge those gains.

However, increased capability comes with increased responsibility. OpenAI acknowledges that Codex Max is the most capable cybersecurity model it has deployed to date, though it does not yet reach "High capability" in cybersecurity under the company's Preparedness Framework. The company emphasizes that Codex is designed to run in a secure sandbox by default, with file writes limited and network access disabled unless explicitly turned on by the developer.

As Codex Max becomes capable of long-running, independent work, the need for human oversight remains paramount. OpenAI stresses that while the model’s code reviews reduce risk, Codex should be treated as an additional reviewer, not a replacement for human judgment before deploying changes to production.

GPT-5.1-Codex-Max is available starting today for users on ChatGPT Plus, Pro, Business, Edu, and Enterprise plans, replacing the previous GPT-5.1-Codex as the default model in Codex surfaces. API access is slated to arrive soon.

© 2025 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.
