OpenAI GPT-5.4: The Unified AI Powerhouse?

OpenAI unveils GPT-5.4, its most advanced AI model for professional tasks, boasting enhanced reasoning, coding, and computer interaction capabilities.

Mar 6 at 1:47 AM · 5 min read
Screenshot of OpenAI's announcement for GPT-5.4

OpenAI has announced GPT-5.4, which it describes as its most capable and efficient frontier model to date and which is built specifically for professional work. The release folds the coding strengths of GPT-5.3-Codex into a single model alongside improved reasoning and agentic workflow capabilities. GPT-5.4 is available now in ChatGPT, the API, and Codex, and OpenAI positions it as a significant step up in performance and versatility across a wide range of complex tasks.

OpenAI GPT-5.4: The Unified AI Powerhouse? — from Matthew Berman

GPT-5.4: A Unified Frontier Model

The core idea behind GPT-5.4 is consolidation: OpenAI's recent advances now ship in one model rather than several specialized ones. Developers and professionals no longer have to pick between a coding model and a general-purpose one. GPT-5.4 builds on the coding foundation laid by GPT-5.3-Codex and improves on tasks involving tools, software environments, and professional applications such as spreadsheet manipulation, presentation creation, and document analysis. The result, as described by OpenAI, is a model that carries out complex real-world work more accurately and efficiently, with faster and more contextually relevant responses.

Enhanced Reasoning and Contextual Understanding

In ChatGPT, GPT-5.4 can present an upfront plan of its thinking, allowing users to adjust the model's course mid-response. This is particularly valuable for complex tasks where iterative refinement matters. OpenAI also says the final output lands closer to what the user wanted, with fewer follow-up turns needed. In addition, GPT-5.4 shows improved deep web research, especially for highly specific queries, and it holds context better on questions that require longer thinking, which OpenAI says yields higher-quality answers that arrive faster and stay relevant to the task at hand.

Computer Use and Vision Capabilities

GPT-5.4 is designed to handle a broad spectrum of computer-use workloads. Its native computer-use capabilities, combined with its vision features, let it operate computers via libraries like Playwright and issue mouse and keyboard commands in response to screenshots. This enables more sophisticated agentic workflows in which the model interacts directly with software systems and websites to complete real-world tasks. The benchmark results OpenAI presented back this up. On OSWorld-Verified, which measures a model's ability to navigate a desktop environment through screenshots and mouse actions, GPT-5.4 achieved a state-of-the-art 75.0% success rate, well ahead of GPT-5.2's 47.3% and above the reported human baseline of 72.4%. The model also completes complex tasks with fewer tool yields, making it more efficient and cost-effective.
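To make the screenshot-in, action-out loop described above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not OpenAI's implementation: the action dictionary format (`type`, `x`, `y`, `text`, `key`) and the `choose_action` callback, which stands in for the model call, are hypothetical. The `page` argument is expected to behave like a Playwright `Page`, of which only `screenshot()`, `mouse.click`, `mouse.wheel`, `keyboard.type`, and `keyboard.press` are used.

```python
from typing import Any, Callable, Dict


def apply_action(page: Any, action: Dict[str, Any]) -> None:
    """Translate one action dict (hypothetical schema) into page input calls."""
    kind = action["type"]
    if kind == "click":
        page.mouse.click(action["x"], action["y"])
    elif kind == "type":
        page.keyboard.type(action["text"])
    elif kind == "keypress":
        page.keyboard.press(action["key"])
    elif kind == "scroll":
        page.mouse.wheel(action.get("dx", 0), action.get("dy", 0))
    else:
        raise ValueError(f"unsupported action type: {kind!r}")


def run_episode(
    page: Any,
    choose_action: Callable[[bytes], Dict[str, Any]],
    max_steps: int = 10,
) -> int:
    """Screenshot, ask for the next action, apply it; repeat until 'done'.

    `choose_action` is a placeholder for the model call: it receives the
    screenshot bytes and returns the next action dict.
    Returns the number of actions that were applied.
    """
    for step in range(max_steps):
        screenshot = page.screenshot()  # Playwright returns PNG bytes here
        action = choose_action(screenshot)
        if action["type"] == "done":
            return step
        apply_action(page, action)
    return max_steps
```

With a real browser, `page` would come from Playwright's `sync_playwright()` context, and `choose_action` would send the screenshot to the model and parse its reply. The `max_steps` cap is a design choice in this sketch to keep a confused agent from looping indefinitely.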

Competitive Performance and API Pricing

The release includes a detailed comparison chart showing GPT-5.4's performance against other leading models, including Anthropic's Claude Opus 4.6 and Google's Gemini 3.1 Pro. Across benchmarks such as GDPVal, BrowseComp, and GPQA Diamond, GPT-5.4 posts competitive or superior results. That performance comes with higher API costs, however. GPT-5.4 is priced higher per token than GPT-5.2, which OpenAI attributes to its improved capabilities and token efficiency. Input tokens for GPT-5.4 cost $2.50 per million, cached input tokens $0.25 per million, and output tokens $15 per million. The Pro versions are priced higher still, with GPT-5.4-pro input costing $30 per million tokens. These costs are substantial, but if the model needs fewer total tokens to finish a task, the lower token count may offset part of the higher per-token price.
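As a rough illustration of how those per-token prices translate into a bill, the Python sketch below estimates the cost of a single request from the three standard GPT-5.4 rates quoted above. The function name and the token counts in the example are made up for illustration, and the sketch assumes cached and uncached input tokens are counted separately.

```python
# USD per one million tokens, as quoted in this article for GPT-5.4.
INPUT_PER_M = 2.50
CACHED_INPUT_PER_M = 0.25
OUTPUT_PER_M = 15.00


def estimate_cost(uncached_input: int, cached_input: int, output: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    return (
        uncached_input * INPUT_PER_M
        + cached_input * CACHED_INPUT_PER_M
        + output * OUTPUT_PER_M
    ) / 1_000_000


# Hypothetical request: 200k fresh input, 800k cached input, 50k output tokens.
cost = estimate_cost(200_000, 800_000, 50_000)
print(f"${cost:.2f}")  # prints $1.45
```

In this example the 50,000 output tokens cost more ($0.75) than the full million input tokens combined ($0.70), which is why a model that reaches the answer in fewer output tokens can come out cheaper despite a higher list price.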

Industry Reactions and Early Testers' Insights

Early reactions from industry figures have been overwhelmingly positive. Matt Shumer, a prominent AI tester, stated, "I've been testing GPT-5.4 for the last week. In short, it is the best model in the world, by far." He said the model makes the "which model should I use?" conversation feel almost over, and noted that for the first time he barely uses Pro models anymore, finding standard GPT-5.4 with heavy thinking sufficient for most tasks. Flavio Adamo also shared his experience, calling GPT-5.4 "GOOD" and noting that it "one-shotted" previous models on Codex, leading to a faster development cycle for his team. Peter Steinberger further praised the model, stating, "it's now unified and smarter on everything else, writes better docs, is a better general purpose agent and is overall more pleasant to use." He also pointed out that while the coding capabilities are nearly flawless, the model can still miss obvious real-world context, as shown by an itinerary-planning example in which it failed to account for crowd density.

The Future of AI Models

The release of GPT-5.4 marks a notable step forward for large language models. By folding coding, reasoning, and computer use into one model and posting strong results across a range of benchmarks, OpenAI is raising the bar for AI agents and professional tools. The higher price is a factor to weigh, but the model's efficiency, speed, and versatility suggest it could become a standard tool for professionals and developers alike. GPT-5.4 is also a reminder of how quickly the field is moving, with more sophisticated applications likely to follow soon.