Llion Jones, a co-inventor of the ubiquitous Transformer architecture, contends that the artificial intelligence industry is ensnared in a "local minimum" of its own making. This provocative argument, delivered alongside Sakana AI research scientist Luke Darlow, challenges the prevailing dogma that more scale and iterative tweaks to existing models will unlock genuine intelligence. Instead, Sakana AI proposes a biologically inspired paradigm shift: the Continuous Thought Machine (CTM).
Jones and Darlow appeared on the Machine Learning Street Talk podcast to discuss the industry's "success capture," a phenomenon in which the remarkable efficacy of a dominant technology like the Transformer inadvertently stifles fundamental innovation. Jones, after years deeply entrenched in Transformer research, made a deliberate decision to step away. "I'm going to drastically reduce the amounts of research that I'm doing specifically on the Transformer because of the feeling that I have that it's an oversaturated space," he stated, reflecting his conviction that the path to true intelligence lies elsewhere.
The core of Sakana AI's critique revolves around the distinction between mimicry and genuine understanding. Jones illustrated this with a striking "spiral problem" analogy: tasked with learning a spiral, a standard neural network typically approximates it with numerous tiny straight line segments, effectively "faking" the shape without grasping the underlying concept of spiraling. Today's AI models, despite their impressive capabilities, similarly excel at mimicking intelligent responses without any internal process of genuine "thinking" or reasoning. They are brilliant imitators but lack the ability to truly change their minds or backtrack when faced with a complex problem.
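Jones's point about "faking" the shape is a concrete property of standard feed-forward networks: with ReLU activations they compute piecewise-linear functions, so any curve they learn is stitched together from straight segments. The following minimal sketch (a hypothetical illustration assuming PyTorch, not code from the podcast) fits the classic two-spirals toy problem with such a network:

```python
# Hypothetical sketch of the "spiral problem" (not Sakana AI code; assumes
# PyTorch): a feed-forward ReLU network is piecewise linear, so the best it
# can do is approximate a spiral with many tiny straight segments.
import math
import torch
import torch.nn as nn

def two_spirals(n=512, noise=0.05):
    """Generate the classic two-spirals toy dataset."""
    t = torch.linspace(0.25, 3.0, n // 2) * math.pi
    x0 = torch.stack([t * torch.cos(t), t * torch.sin(t)], dim=1)
    x1 = -x0  # the second spiral is the first rotated 180 degrees
    x = torch.cat([x0, x1]) + noise * torch.randn(n, 2)
    y = torch.cat([torch.zeros(n // 2), torch.ones(n // 2)])
    return x, y

# Each hidden ReLU unit adds one "fold" to the input space, so the learned
# decision boundary is a patchwork of straight line segments.
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 1))

x, y = two_spirals()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(x).squeeze(1), y)
    loss.backward()
    opt.step()
# The network can reach high accuracy here, yet its function is locally
# linear everywhere: it mimics the spiral's shape with no notion of
# "spiraling".
```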
Sakana AI’s proposed antidote is the CTM, a novel architecture fundamentally inspired by biological brains. Luke Darlow explained that unlike traditional AI, which might attempt to solve a maze by "staring at the whole image and guessing the entire path instantly," the CTM "walks" through the maze step by step. This sequential, iterative process lets the CTM naturally spend more time on challenging problems, effectively "pondering" solutions and correcting its own mistakes, a crucial leap towards more human-like reasoning. The CTM’s behavior is rooted in a new kind of representation: the synchronization between neurons over time.
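The podcast describes this representation only at a high level, but its flavor can be sketched. In the illustrative toy below, every name and dimension is an assumption rather than Sakana AI's implementation: a recurrent cell runs for several internal "ticks," and the representation handed onward is the time-averaged pairwise co-activation of neurons instead of a single activation snapshot.

```python
# Illustrative sketch of synchronization-as-representation; all names and
# dimensions are assumptions, not Sakana AI's actual CTM implementation.
import torch

def synchronization(history: torch.Tensor) -> torch.Tensor:
    """history: (ticks, neurons) activations accumulated over internal steps.
    Entry (i, j) of the result measures how strongly neurons i and j have
    co-activated over the course of the "thought" so far."""
    ticks, _ = history.shape
    return history.T @ history / ticks  # time-averaged pairwise co-activation

ticks, neurons = 16, 8
recurrent = torch.nn.Linear(neurons, neurons)
state = torch.randn(neurons)
history = []
for _ in range(ticks):                    # the model "ponders": more ticks,
    state = torch.tanh(recurrent(state))  # more internal computation
    history.append(state)

sync = synchronization(torch.stack(history))  # (neurons, neurons) matrix
# In this sketch, the synchronization matrix, not the final state,
# is what would feed the output head.
```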
This pursuit of genuinely novel architectures is deeply embedded in Sakana AI's organizational philosophy, which champions research freedom. Jones nostalgically recalled the origins of the Transformer: "The Transformers was very, very bottom up... it was a bunch of people talking over lunch... having the freedom to have, you know, literally months to dedicate to just trying this idea." This contrasts sharply with the current corporate AI research landscape, where immediate commercialization pressures often dictate research agendas, leading to what the podcast host termed "technology capture."
This pressure, ironically, intensifies with the influx of talent and resources into the AI industry. Rather than fostering diverse approaches, that influx inadvertently narrows the field's focus, pushing researchers towards incremental improvements on established paradigms. As Jones noted, "It's unfortunate that they [Transformers] work so well because it's too easy for people to just sweep these problems under the carpet." The gravitational pull of a known, effective architecture makes it far harder for truly disruptive, paradigm-shifting ideas to gain traction, even when they offer a path to superior capabilities.
The CTM, spotlighted at NeurIPS 2025, represents Sakana AI’s concerted effort to break free from this "local minimum." By adding a time dimension and leveraging neuron synchronization, the CTM exhibits more diverse and interpretable behavior, demonstrating a human-like approach to problem-solving in tasks like maze navigation and image recognition. Its ability to dynamically allocate "thinking time" and to learn through self-bootstrapping mechanisms marks a significant step towards models that truly "understand" rather than merely mimic. While not a strict emulation of biology, the architecture is far more reminiscent of biological brains, and Sakana AI hopes it will unlock new levels of capability and efficiency.
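How might a model decide how long to think? Purely as an illustration (the sketch below is an assumed halting rule, not the CTM's published mechanism), one can loop over internal ticks until the prediction stops changing:

```python
# Assumed halting rule for dynamic "thinking time" (illustrative only):
# keep taking internal ticks until the prediction stabilizes or the
# tick budget runs out.
import torch

def ponder(step_fn, readout, state, max_ticks=64, tol=1e-3):
    """Run internal ticks until successive predictions stop changing."""
    prev = readout(state)
    for tick in range(1, max_ticks + 1):
        state = step_fn(state)
        pred = readout(state)
        if torch.linalg.norm(pred - prev) < tol:  # stable: stop early
            return pred, tick
        prev = pred
    return prev, max_ticks  # hard inputs consume the full budget

# Toy usage with hypothetical modules:
step = torch.nn.Linear(8, 8)
head = torch.nn.Linear(8, 3)
pred, ticks_used = ponder(lambda s: torch.tanh(step(s)), head, torch.randn(8))
```

Under such a rule, easy inputs exit after a few ticks while hard ones use the full budget, which mirrors the behavior Darlow describes when the CTM lingers on difficult mazes.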