1 article with this tag
Frontier LLMs show a dramatic capability gap on a new benchmark using esoteric programming languages, revealing a reliance on memorization over reasoning.