The relentless pursuit of more capable AI models often gets bogged down in the limitations of static benchmarks. These benchmarks are useful, but they can reward models that overfit to specific tasks or datasets, so high scores need not reflect general intelligence. This paper introduces a novel approach to bridge this gap.
Formalizing Indistinguishability as Intelligence
The core innovation is the Generalized Turing Test (GTT), a formal framework for comparing arbitrary agents by their indistinguishability. The GTT defines a comparator based on whether agent B can reliably distinguish interactions with agent A (instructed to imitate B) from interactions with another instance of B; if B cannot, A is judged at least as intelligent as B. Because the comparison needs no fixed dataset or task, it yields a dataset- and task-agnostic measure of relative intelligence. The researchers also study the structural properties of this comparator, including conditions under which it is transitive; when those conditions hold, the comparator induces an ordering over equivalence classes of intelligence. Variants with modified interaction protocols, such as querying or bounded interactions, are also analyzed, giving flexibility in how evaluations are run.
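To make the comparator and the induced ordering concrete, here is a minimal Python sketch under a deliberately toy model that does not come from the paper: each agent is an integer skill level, each prompt is an integer difficulty, and judge B flags its partner as an imitator whenever a reply differs from what B itself would have said. The function names (`distinguish_accuracy`, `gtt_geq`, `order_classes`), the 0.1 margin above chance, and the reading that A ranks at least as high as B when B cannot beat chance are all illustrative assumptions.

```python
import random
from functools import cmp_to_key
from typing import Dict, List

# Toy model (hypothetical, not from the paper): an agent is an integer skill
# level, and a prompt is an integer difficulty in 1..MAX_DIFFICULTY.
MAX_DIFFICULTY = 10


def genuine_reply(skill: int, difficulty: int) -> bool:
    """A genuine agent solves exactly the prompts at or below its skill."""
    return difficulty <= skill


def imitator_reply(imitator_skill: int, target_skill: int, difficulty: int) -> bool:
    """Agent A told to imitate B: it cannot solve prompts beyond its own skill,
    and it deliberately fails prompts that B would fail."""
    return difficulty <= min(imitator_skill, target_skill)


def distinguish_accuracy(a: int, b: int, n_trials: int = 2000,
                         prompts_per_trial: int = 5, seed: int = 0) -> float:
    """Fraction of trials in which judge B correctly says whether its partner
    was A-imitating-B or another copy of B."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        partner_is_imitator = rng.random() < 0.5
        prompts = [rng.randint(1, MAX_DIFFICULTY) for _ in range(prompts_per_trial)]
        if partner_is_imitator:
            replies = [imitator_reply(a, b, d) for d in prompts]
        else:
            replies = [genuine_reply(b, d) for d in prompts]
        # Hypothetical judge rule: flag an imitator if any reply differs
        # from what B itself would have answered.
        guess_imitator = any(r != genuine_reply(b, d) for r, d in zip(replies, prompts))
        correct += guess_imitator == partner_is_imitator
    return correct / n_trials


def gtt_geq(a: int, b: int, margin: float = 0.1) -> bool:
    """A ranks at least as high as B if B cannot tell A from a copy of
    itself noticeably better than chance (accuracy 0.5)."""
    return distinguish_accuracy(a, b) <= 0.5 + margin


def order_classes(agents: Dict[str, int]) -> List[List[str]]:
    """Group agents into equivalence classes (mutual >=) and sort the classes
    from least to most capable. Assumes the relation is transitive and total."""
    names = list(agents)
    geq = {(x, y): gtt_geq(agents[x], agents[y]) for x in names for y in names}
    classes: List[List[str]] = []
    for x in names:
        for cls in classes:
            rep = cls[0]
            if geq[(x, rep)] and geq[(rep, x)]:
                cls.append(x)
                break
        else:
            classes.append([x])

    def compare(c1: List[str], c2: List[str]) -> int:
        x, y = c1[0], c2[0]
        if geq[(x, y)] and not geq[(y, x)]:
            return 1
        if geq[(y, x)] and not geq[(x, y)]:
            return -1
        return 0

    return sorted(classes, key=cmp_to_key(compare))


print(order_classes({"A": 3, "B": 7, "C": 7, "D": 5}))
# prints [['A'], ['D'], ['B', 'C']]
```

In this toy world the relation happens to be transitive and total, so sorting the classes is well-defined. With real agents, transitivity holds only under the conditions the paper identifies, and incomparable classes could make a full sort ill-defined, leaving at best a partial order.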