"Intelligence is measured by the efficiency of skill acquisition on unknown tasks." This foundational insight, articulated by François Chollet, creator of Keras and the Abstract and Reasoning Corpus for Artificial General Intelligence (ARC-AGI), underpins a critical shift in how the AI community evaluates progress. In a recent interview at NeurIPS 2025, Y Combinator General Partner Diana Hu sat down with Greg Kamradt, President of the ARC Prize Foundation, to dissect why many prevailing AI benchmarks fall short and how ARC-AGI is redefining the pursuit of human-like generalization. Their discussion highlighted a crucial distinction between mere performance and genuine intelligence, a topic of paramount importance for founders, VCs, and AI professionals navigating the rapidly evolving landscape.
The existing paradigm of AI evaluation, often built around benchmarks like MMLU, has inadvertently steered development toward models that excel at memorization and brute-force computation rather than true understanding or adaptability. As Kamradt noted, "You would normally think that intelligence would be how much can you score on the SAT test, or how hard of math problems can you do." Achievements in areas like chess, Go, or self-driving are impressive, and in some cases superhuman, but they demonstrate skill in specific, often pre-defined domains, not necessarily the fluid intelligence required to rapidly acquire new, unrelated skills. This narrow focus creates a deceptive sense of progress: ever-harder "PhD++ problems" simply demand more data or compute rather than novel reasoning.
