The latest iteration of OpenAI’s flagship model, GPT-5, marks a strategic inflection point in the business of artificial intelligence: less about raw, unprecedented capability and more about refined economic efficiency. In a recent a16z podcast, Dylan Patel, founder and CEO of SemiAnalysis, joined partners Erin Price-Wright, Guido Appenzeller, and host Erik Torenberg to dissect the landscape of AI chips, data centers, and infrastructure strategy, highlighting this crucial shift.
Patel explained that for power users, GPT-5 doesn't necessarily consume more compute per query than its predecessors. Instead, OpenAI has optimized its internal architecture, leveraging a "router" model that dynamically allocates compute resources. "GPT-5 is not spending more compute per se," Patel noted, adding that earlier reasoning models "would think for 30 seconds on average, maybe, whereas GPT-5, even when you're using thinking, only thinks for like 5 to 10 seconds on average." This intelligent routing allows OpenAI to deliver seemingly enhanced performance while managing underlying computational costs.
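The routing idea Patel describes can be sketched in a few lines of Python. Everything below is a hypothetical illustration, not OpenAI's actual system: the difficulty heuristic, model-tier names, and thinking-time budgets are all invented to show the shape of the technique, where a cheap front-end classifier decides how much compute a query receives before dispatching it.

```python
# Hypothetical sketch of a compute-allocating router. A lightweight
# heuristic scores each query's difficulty, then maps that score to a
# model tier and a thinking-time budget. All names and thresholds are
# illustrative assumptions; OpenAI's real routing logic is not public.

def estimate_difficulty(query: str) -> float:
    """Toy heuristic: longer queries and reasoning-flavored keywords score higher."""
    signals = ("prove", "debug", "optimize", "step by step")
    score = min(len(query) / 500, 1.0)
    score += 0.2 * sum(kw in query.lower() for kw in signals)
    return min(score, 1.0)

def route(query: str) -> dict:
    """Map difficulty to a (hypothetical) model tier and thinking budget in seconds."""
    d = estimate_difficulty(query)
    if d < 0.3:
        # Easy queries skip "thinking" entirely -- this is where the
        # aggregate compute savings come from.
        return {"model": "fast-nonthinking", "thinking_budget_s": 0}
    elif d < 0.7:
        return {"model": "thinking-small", "thinking_budget_s": 5}
    return {"model": "thinking-large", "thinking_budget_s": 10}

print(route("What is the capital of France?"))
print(route("Prove step by step that the algorithm terminates, then debug it."))
```

The design point this illustrates: routing moves the cost decision from the user (who previously picked a model) to the provider, so average compute per query drops even as peak capability stays available for hard prompts.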
