OpenAI has officially launched GPT-5.4, positioning it as its most capable and efficient frontier model for professional work. It is available in ChatGPT (as GPT-5.4 Thinking), the API, and Codex, with a premium version, GPT-5.4 Pro, offered for demanding tasks. The release merges OpenAI's recent advances in reasoning, coding, and agentic workflows into a single model.
GPT-5.4 inherits the industry-leading coding ability of GPT-5.3-Codex while improving how it works with tools, software environments, and complex professional documents such as spreadsheets and presentations. OpenAI says the model produces accurate, effective, and efficient results with less back-and-forth from the user. In ChatGPT, GPT-5.4 Thinking can now present an upfront plan, letting users course-correct mid-response so the final output better matches what they need. This version also improves deep web research and keeps track of context over longer interactions, leading to faster, higher-quality answers.
For developers using the API and Codex, GPT-5.4 is OpenAI's first general-purpose model with native, state-of-the-art computer-use capabilities, enabling agents to operate computers and carry out complex workflows across applications. It supports up to 1 million tokens of context, which helps agents plan, execute, and verify long-horizon tasks. A new tool search feature helps agents find and use the right tools more efficiently within large tool ecosystems.
According to OpenAI, GPT-5.4 is also its most token-efficient reasoning model to date: it outperforms GPT-5.2 on problem-solving while using significantly fewer tokens, which translates into lower costs and faster responses. Together, these improvements aim to make agents more reliable, speed up developer workflows, and raise output quality across all platforms. Its benchmark highlights include strong results on GDPval and SWE-Bench Pro.
Knowledge Work Enhancements
Building on GPT-5.2's reasoning, GPT-5.4 delivers more consistent results for professionals. It sets a new state-of-the-art score on GDPval, a benchmark of knowledge work across 44 occupations, matching or exceeding industry professionals in 83.0% of comparisons, up 12.1 percentage points from GPT-5.2's 70.9%. GDPval tasks include producing sales presentations, accounting spreadsheets, and manufacturing diagrams.
OpenAI put particular focus on spreadsheet, presentation, and document creation. On internal benchmarks of spreadsheet modeling tasks, GPT-5.4 scored 87.3%, compared with 68.4% for GPT-5.2. Human raters preferred GPT-5.4's presentations over GPT-5.2's 68.0% of the time, citing stronger aesthetics and greater visual variety. Compared with GPT-5.2, the model also makes 33% fewer false individual claims and 18% fewer overall response errors.
Computer Use and Vision Breakthroughs
GPT-5.4's native computer-use capabilities position it as a leading model for developers building agents that carry out real-world tasks across websites and software. It is strong both at writing code that operates a computer and at issuing mouse and keyboard commands based on screenshots. Developers can steer its behavior through developer messages, for example to apply custom safety configurations.
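A computer-use agent built on this kind of model generally runs an observe-act loop: capture a screenshot, ask the model for the next mouse or keyboard action, execute it, and repeat until the task is done. The Python sketch below illustrates that loop under stated assumptions and does not use OpenAI's API: `Action`, `FakeScreen`, `run_agent`, and `scripted_model` are hypothetical names, the scripted function stands in for a real model call, and the `ALLOWED_ACTIONS` allowlist is one hypothetical client-side way to enforce a safety policy alongside whatever the developer messages specify.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str  # hypothetical action vocabulary: "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


# Hypothetical client-side safety policy: any other action kind is refused.
ALLOWED_ACTIONS = {"click", "type", "done"}


@dataclass
class FakeScreen:
    """Stand-in for a real desktop; it only records the actions performed."""
    log: list = field(default_factory=list)

    def screenshot(self) -> bytes:
        # A real harness would return image bytes of the current screen.
        return f"screen after {len(self.log)} actions".encode()

    def perform(self, action: Action) -> None:
        self.log.append(action)


def run_agent(propose_action: Callable[[bytes, int], Action],
              screen: FakeScreen, max_steps: int = 10) -> list:
    """Observe-act loop: screenshot -> model proposes an action -> execute it."""
    for step in range(max_steps):
        action = propose_action(screen.screenshot(), step)
        if action.kind not in ALLOWED_ACTIONS:
            raise PermissionError(f"action {action.kind!r} blocked by policy")
        if action.kind == "done":
            break
        screen.perform(action)
    return screen.log


def scripted_model(image: bytes, step: int) -> Action:
    """Hypothetical stand-in for the model: click a field, type, then stop."""
    script = [Action("click", 120, 40), Action("type", text="hello"), Action("done")]
    return script[min(step, len(script) - 1)]


log = run_agent(scripted_model, FakeScreen())
print([a.kind for a in log])  # ['click', 'type']
```

Keeping the policy check in the harness, rather than relying only on the model's instructions, means a disallowed action is refused before it ever reaches the screen.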
Benchmark results back this up: GPT-5.4 reached a 75.0% success rate on OSWorld-Verified, which tests navigation of desktop environments, a 27.7-point gain over GPT-5.2's 47.3% and above the reported human performance of 72.4%. It also leads browser navigation benchmarks such as WebArena-Verified (67.3% success rate) and Online-Mind2Web (92.8% success rate).
Improved visual perception underpins these gains. On MMMU-Pro, a test of visual understanding and reasoning, GPT-5.4 achieved an 81.2% success rate, up from GPT-5.2's 79.5%. Document parsing also improved, with GPT-5.4 achieving a lower average error rate on OmniDocBench. The model now accepts higher-fidelity image inputs of up to 10.24 million total pixels, improving localization and image understanding.
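The 10.24M total-pixel limit constrains an image's area (width × height), not either side on its own. Assuming the limit is exactly 10,240,000 pixels, the Python sketch below shows one way a client might downscale an oversized image to fit while preserving its aspect ratio; `fit_to_budget` is a hypothetical helper, and the sketch says nothing about how the API itself treats larger images.

```python
import math

# Assumption: "10.24M total pixels" means width * height <= 10,240,000 (e.g. 3200 x 3200).
PIXEL_BUDGET = 10_240_000


def fit_to_budget(width: int, height: int, budget: int = PIXEL_BUDGET) -> tuple[int, int]:
    """Downscale (width, height) to fit the pixel budget, keeping the aspect ratio."""
    if width * height <= budget:
        return width, height  # already within budget: no resizing needed
    # Scaling both sides by sqrt(budget / area) scales the area by budget / area.
    scale = math.sqrt(budget / (width * height))
    w = max(1, math.floor(width * scale))
    h = max(1, math.floor(height * scale))
    # Guard against floating-point rounding leaving the result slightly over budget.
    while w * h > budget:
        if w >= h:
            w -= 1
        else:
            h -= 1
    return w, h


print(fit_to_budget(3200, 3200))  # (3200, 3200): exactly at the budget
print(fit_to_budget(7680, 4320))  # an 8K frame, shrunk to roughly 4266 x 2400
```

Because area grows with the square of the linear size, each side shrinks by the square root of the required area ratio; for the 8K example that ratio is 25/81, so each side is scaled by 5/9.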
Coding Prowess Integrated
GPT-5.4 combines the coding strengths of GPT-5.3-Codex with improved knowledge-work and computer-use capabilities, a pairing that is particularly useful for longer-running tasks that require iteration and tool use. It matches or exceeds GPT-5.3-Codex on SWE-Bench Pro with lower latency. In Codex, the new `/fast mode` delivers up to 1.5x faster token velocity with GPT-5.4.
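Token velocity is a throughput figure, so its effect on wall-clock time is easy to overstate. Assuming a response of N output tokens is generated at a steady rate v (and ignoring time to first token), the worked arithmetic for the full 1.5x speedup is:

```latex
t = \frac{N}{v}, \qquad
t_{\text{fast}} = \frac{N}{1.5\,v} = \frac{2}{3}\,t \approx 0.67\,t
```

In other words, at the quoted maximum a response would take about a third less time, not half as long.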
The model excels at complex frontend tasks, producing more visually polished and functional results. An experimental Codex skill, "Playwright (Interactive)", lets it visually debug web and Electron apps, including testing an app while it is still being built. OpenAI showcased this combination of computer use and coding with projects such as a theme park simulation game built from a single prompt.
Improved Tool Use
GPT-5.4 significantly improves how OpenAI's models interact with external tools. Agents can now navigate larger tool ecosystems, select tools more reliably, and complete multi-step workflows more efficiently. Tool search, newly introduced in the API, addresses the overhead of packing many tool definitions into the prompt, reducing token usage, cost, and response times.
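The principle behind tool search can be illustrated without any OpenAI-specific API: keep the full catalog of tool definitions outside the prompt, retrieve only the few that match the current request, and send just those to the model. The Python sketch below is a hypothetical illustration, not OpenAI's implementation (which the announcement does not detail): the catalog, `search_tools`, and the naive keyword-overlap scoring are all assumptions, and a real system could use a stronger retrieval method.

```python
import json
import re

# Hypothetical tool catalog; a real agent might register hundreds of these.
TOOL_CATALOG = [
    {"name": "create_spreadsheet", "description": "Create a new spreadsheet with named columns"},
    {"name": "send_email", "description": "Send an email message to a recipient"},
    {"name": "search_web", "description": "Search the web and return result links"},
    {"name": "read_calendar", "description": "List calendar events for a date range"},
    {"name": "update_spreadsheet_cell", "description": "Update a cell value in an existing spreadsheet"},
]

# Common words ignored when matching, so "the" or "for" cannot select a tool.
STOPWORDS = {"a", "an", "and", "for", "in", "the", "to", "with"}


def tokenize(text: str) -> set[str]:
    """Lowercase words in text, minus stopwords (underscores act as separators)."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def search_tools(query: str, catalog: list[dict], k: int = 2) -> list[dict]:
    """Return up to k tools sharing the most words with the query (naive keyword overlap)."""
    query_words = tokenize(query)
    scored = []
    for tool in catalog:
        score = len(query_words & tokenize(tool["name"] + " " + tool["description"]))
        if score > 0:
            scored.append((score, tool["name"], tool))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, ties by name
    return [tool for _, _, tool in scored[:k]]


query = "Update the budget spreadsheet cell for March"
selected = search_tools(query, TOOL_CATALOG)
print([tool["name"] for tool in selected])
# ['update_spreadsheet_cell', 'create_spreadsheet']

# Rough proxy for prompt size: serialized characters of the tool definitions sent.
full_size = len(json.dumps(TOOL_CATALOG))
selected_size = len(json.dumps(selected))
print(selected_size < full_size)  # True
```

Only two of the five definitions reach the prompt here; with a large catalog, that gap is where the token, cost, and latency savings would come from.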


