
Tiny Recursive Model Beats LLMs on Hard Puzzles

Startuphub.ai Staff
Oct 13, 2025 at 5:38 PM · 3 min read

In the AI arms race, the mantra has long been "bigger is better." But a new paper from Samsung’s AI lab in Montreal is challenging that notion with a system that proves less can be much, much more. The Tiny Recursive Model (TRM), detailed in a paper by researcher Alexia Jolicoeur-Martineau, is a shockingly small 7-million-parameter model that runs circles around giant Large Language Models (LLMs) on complex reasoning tasks.

While models like GPT-4 and Gemini have billions or even trillions of parameters, they often stumble on puzzles that require strict, step-by-step logic, like Sudoku or the abstract reasoning challenges in the ARC-AGI benchmark. A single wrong step in their "chain of thought" can derail the entire answer.

TRM takes a different path. Instead of a single, massive forward pass, it uses a tiny two-layer network to recursively refine its answer. It takes an initial guess, loops through its own reasoning process to improve it, and repeats this cycle up to 16 times. This iterative process allows the model to self-correct and converge on a solution with extreme parameter efficiency.
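
To make that loop concrete, here is a minimal sketch of this style of recursive refinement in PyTorch. The class name, dimensions, and blank initial guess are illustrative assumptions rather than the paper's exact architecture; the point is the structure: one tiny network applied repeatedly to its own output.

```python
import torch
import torch.nn as nn

class TinyRecursiveRefiner(nn.Module):
    """Sketch of TRM-style refinement: a single tiny network, reused each step."""

    def __init__(self, dim: int = 128):
        super().__init__()
        # One small two-layer network shared across every refinement step.
        self.step = nn.Sequential(
            nn.Linear(dim * 2, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, problem: torch.Tensor, num_steps: int = 16) -> torch.Tensor:
        # Start from a blank guess, then loop: each pass sees the problem
        # plus the current answer and proposes a corrected answer.
        answer = torch.zeros_like(problem)
        for _ in range(num_steps):
            answer = self.step(torch.cat([problem, answer], dim=-1))
        return answer

model = TinyRecursiveRefiner()
refined = model(torch.randn(4, 128))  # a batch of 4 encoded puzzles
```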

The results are staggering. On the difficult Sudoku-Extreme benchmark, TRM achieves 87% accuracy, crushing its more complex predecessor, the Hierarchical Reasoning Model (HRM), which scored 55%. More impressively, it scores 45% on the ARC-AGI-1 benchmark, outperforming heavyweight contenders like Google’s Gemini 2.5 Pro, all while using fewer than 0.01% of their parameters.

A simpler, smarter loop

TRM’s elegance comes from simplifying that predecessor. The paper shows that HRM’s complexity, which rested on two separate networks and shaky justifications drawn from brain biology and advanced math, was unnecessary.

Jolicoeur-Martineau’s team stripped the system down to its essentials. They replaced HRM’s two networks with a single, unified one. They ditched the complex "1-step gradient approximation" in favor of straightforward backpropagation through the full refinement loop. And they found that a smaller, two-layer network actually generalized *better* than a four-layer one, likely because it avoided overfitting on the small training datasets typical of these hard logic problems.
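
The gradient change is the easiest of these simplifications to picture in code. Below is a hedged sketch, with an assumed stand-in step network and toy shapes, contrasting a 1-step approximation (gradients only through the final pass) with backpropagation through every pass:

```python
import torch
import torch.nn as nn

dim = 64
# Assumed stand-in for the shared refinement network; not the paper's code.
step_net = nn.Sequential(nn.Linear(dim * 2, dim), nn.ReLU(), nn.Linear(dim, dim))

def refine(problem: torch.Tensor, num_steps: int, one_step_grad: bool) -> torch.Tensor:
    answer = torch.zeros_like(problem)
    for step in range(num_steps):
        joint = torch.cat([problem, answer], dim=-1)
        if one_step_grad and step < num_steps - 1:
            with torch.no_grad():       # HRM-style: earlier steps carry no gradient
                answer = step_net(joint)
        else:
            answer = step_net(joint)    # TRM-style: gradient flows through each step
    return answer

problem, target = torch.randn(8, dim), torch.randn(8, dim)
loss = nn.functional.mse_loss(refine(problem, num_steps=16, one_step_grad=False), target)
loss.backward()  # the training signal reaches all 16 refinement passes
```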

This isn't just an academic curiosity. TRM’s success suggests a new direction for AI development, one that prioritizes architectural ingenuity over brute-force scale. It points to a future where small, hyper-specialized models could handle complex reasoning tasks on local devices, without needing a connection to a massive data center. While TRM won't be writing poetry or summarizing emails, it proves that for certain classes of hard problems, the smartest AI in the room might also be the tiniest.

#AI
#Alexia Jolicoeur-Martineau
#Google
#LLM
#OpenAI
#Recursive Reasoning
#Research
#Samsung
