Nathan Labenz, host of The Cognitive Revolution podcast, recently joined a16z’s Erik Torenberg for a deep dive into the frequently debated question: Is AI progress slowing down? Their discussion, prompted by Cal Newport’s observations on student reliance on AI, swiftly navigated the nuanced landscape of AI development, separating concerns about societal impact from the raw pace of capability advancement. Labenz emphasized that these are distinct questions, often conflated in public discourse.
One core insight from Labenz is the distinction between AI's effect on human behavior and AI's technical progress. He acknowledges Newport's concern that students may become "lazy" by offloading cognitive strain to AI. "I would cop to having exhibited [that] myself," Labenz admitted, referring to his own tendency to prompt AI for coding solutions rather than engage deeply with the problem. This, he notes, is a valid worry, echoing broader concerns about social media's effect on attention spans and critical thinking. But it is a question about human habits, and it doesn't necessarily reflect any stagnation in AI capabilities.
The perception of an AI slowdown, particularly around models like GPT-5, is another point Labenz dissects. He argues that the apparent plateau stems partly from a change in release cadence and partly from acclimatization to rapid advances. Unlike the dramatic leap from GPT-3 to GPT-4, which many users experienced as a sudden "explosion" of capabilities, models like GPT-4.5 and the eventual GPT-5 may feel less striking to the average user because many intermediate advances shipped incrementally. These "boil the frog" releases, as he describes them, may have dulled the impact of later, substantial upgrades.
Labenz firmly rejects the notion that scaling laws have "petered out." He notes that while there may be no "law of nature" guaranteeing indefinite scaling, current models are still posting substantial gains. He points to the leap on SimpleQA, a super-long-tail trivia benchmark, where GPT-4.5 jumped from approximately 50% to 65% accuracy. That improvement, he stresses, represents a significant gain in esoteric factual knowledge, a domain where human performance would be near zero. This is not a flatline; it is a continued ascent, albeit one harder to capture in a single, universally understood metric.
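To make concrete what a SimpleQA-style number measures, here is a minimal, illustrative sketch of that kind of evaluation: each question has one short factual answer, a grader marks the model's response right or wrong, and accuracy is simply correct over total. The questions, answers, and exact-match grading rule below are placeholders, not the actual benchmark.

```python
# Illustrative sketch of a SimpleQA-style accuracy computation.
# Questions, reference answers, and the grading rule are placeholders;
# the real benchmark uses a far larger question set and a more careful grader.
qa_items = [
    {"question": "In what year was journal X first published?", "answer": "1923"},
    {"question": "Who painted the little-known portrait Y?", "answer": "Z"},
]

def grade(model_answer: str, reference: str) -> bool:
    """Toy grader: normalized exact match."""
    return model_answer.strip().lower() == reference.strip().lower()

model_answers = ["1923", "W"]  # pretend these came from the model under test
correct = sum(grade(ans, item["answer"]) for ans, item in zip(model_answers, qa_items))
print(f"accuracy = {correct / len(qa_items):.0%}")  # here: 50%
```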
Beyond raw knowledge, advances in reasoning and in context windows are a game-changer. Early GPT-4 models had an 8,000-token context window, limiting how much information they could process at once. Models like Gemini now offer vastly longer context windows and a much-improved command of that context. This allows an AI not only to digest dozens of research papers but also to perform "pretty intensive reasoning over them with really high fidelity." Reasoning over provided context, rather than relying solely on pre-trained knowledge, acts as a substitute for the model "knowing facts itself" and represents a powerful new avenue for AI utility.
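In practice, this "reasoning over provided context" pattern can be as simple as pasting the source documents into the prompt. The sketch below assumes the OpenAI Python SDK with an API key in the environment; the file names, model name, and question are placeholders meant only to show the shape of the workflow, not any particular product's limits.

```python
# Minimal sketch: ask a long-context model to reason over supplied papers
# instead of relying on its pre-trained knowledge. File paths, the model
# name, and the question are placeholders.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# Load several research papers (plain-text exports) into one prompt.
papers = [Path(p).read_text() for p in ["paper_1.txt", "paper_2.txt", "paper_3.txt"]]
context = "\n\n---\n\n".join(papers)

question = "Across these papers, which experimental results conflict, and why?"

response = client.chat.completions.create(
    model="gpt-4.1",  # placeholder; any long-context model works in principle
    messages=[
        {"role": "system", "content": "Answer strictly from the provided papers."},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```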
Related Reading
- AI Cycle's Long Runway: Why It's Not Your Grandfather's Tech Bubble
- Former Intel CEO Declares "Of Course" We're In An AI Bubble
- The AI Economy: Bubble or Breakthrough Demand?
Labenz underscores the growing ability of AI to tackle previously unsolved engineering and scientific problems. He cites the recent achievement of AI models winning gold medals at the International Mathematical Olympiad (IMO) through pure reasoning, with no access to external tools, a stark contrast to GPT-4's struggles with high school math only a couple of years prior. Furthermore, Google's AI co-scientist project, which breaks the scientific method down into discrete, optimizable steps, has generated testable hypotheses for open problems in virology, including hypotheses that had stumped human scientists for years. These are not incremental improvements; they are qualitative shifts toward AI as a genuine scientific collaborator.
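To illustrate what "discrete, optimizable steps" can look like in code, here is a toy hypothesis pipeline with generation, critique, and ranking as separate stages, each driven by its own prompt and therefore each independently tunable. This is a loose sketch under assumed names (the OpenAI Python SDK, a placeholder model, an example problem statement), not a description of Google's actual co-scientist system.

```python
# Toy sketch of a staged hypothesis pipeline: each step is a separate function
# whose prompt and scoring can be tuned on its own. Illustrative only.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4.1"  # placeholder model name


def llm(prompt: str) -> str:
    """Single LLM call shared by every stage."""
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content


def generate_hypotheses(problem: str, n: int = 5) -> list[str]:
    """Stage 1: propose candidate hypotheses for an open problem."""
    text = llm(f"Propose {n} distinct, testable hypotheses for: {problem}")
    return [line.strip("- ").strip() for line in text.splitlines() if line.strip()]


def critique(hypothesis: str) -> str:
    """Stage 2: attack each hypothesis with known evidence and likely flaws."""
    return llm(f"List the strongest objections to this hypothesis: {hypothesis}")


def rank(hypotheses: list[str], critiques: list[str]) -> str:
    """Stage 3: rank survivors by plausibility and ease of testing."""
    paired = "\n\n".join(f"H: {h}\nCritique: {c}" for h, c in zip(hypotheses, critiques))
    return llm(f"Rank these hypotheses by plausibility and testability:\n{paired}")


if __name__ == "__main__":
    problem = "How might a novel mechanism spread antibiotic resistance between bacteria?"  # example prompt
    hyps = generate_hypotheses(problem)
    crits = [critique(h) for h in hyps]
    print(rank(hyps, crits))
```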
The implications for the workforce are profound. Labenz touches on the much-discussed METR study, which found a decrease in engineer productivity when developers used AI tools. He suggests this may reflect user inexperience rather than an inherent AI limitation, noting that many users, himself included, default to asking AI to "just make it work" rather than engaging in deeper problem-solving. Treating AI as a tool for reducing cognitive strain is beneficial in some ways, but it also raises questions about maintaining human intellectual rigor.
However, the emergence of AI agents, such as Intercom's "Fin" handling 65% of customer service tickets, points to a future where AI takes over a significant share of routine tasks, leading to "significant head count reduction" in many sectors. The challenge, Labenz concludes, is not that AI is slowing down, but how quickly we can adapt to its accelerating capabilities and leverage them for genuine progress.

