2025: The Long-Awaited Year of AI Evaluation

For years, the promise of comprehensive AI/ML evaluation has been a recurring industry prophecy, often met with a knowing nod and little tangible action. John Dickerson, CEO of Mozilla AI, delivered a compelling presentation at the AI Engineer World's Fair, dissecting this decade-long deferral and positing that 2025 is finally the year "evals" become indispensable, driven by a confluence of macroeconomic shifts and technological advancements.

Dickerson, drawing on his experience as co-founder and Chief Scientist at Arthur AI before joining Mozilla, explained that AI/ML monitoring and evaluation have always been two sides of the same coin. However, this need was rarely top-of-mind for the C-suite until two pivotal events converged. Prior to November 30, 2022, the day ChatGPT launched, traditional machine learning models often "spit out some numbers that are ingested and lost in a larger system," resulting in a "tenuous connection to downstream KPIs." Despite "lots of lip service around AI/ML ROI from the C-Suite," genuine investment in evaluation remained elusive, and the topic stayed largely confined to the CIO's purview.

The landscape dramatically shifted with ChatGPT's public launch. Suddenly, AI became a phenomenon that "CEOs & CFOs could grok," meaning non-technical executives could directly interact with and understand its capabilities. Concurrently, a "perfectly-timed budget freeze" across enterprises in late 2022, spurred by recession fears, paradoxically funneled discretionary funds into this newly comprehensible technology. This created an environment where 2023 became a year of "austerity, except for Gen AI," with science projects receiving the green light.

These initial Gen AI applications, primarily internal tools, moved into production in 2024. As these systems scaled, especially with the emergence of AI agents that "make decisions, take actions, act (semi-)autonomously," the stakes rose. Dickerson emphasized that this shift introduces "so much complexity and risk!" because AI systems are no longer merely providing inputs but are actively performing tasks and making decisions.

This newfound autonomy demands robust evaluation. Consequently, 2025 is shaping up to be the true "Year of the Eval." CEOs now grasp the technology, CFOs demand quantifiable bottom-line impact, CISOs recognize the security risks and opportunities, and CTOs require standards and data-driven decisions. The conversation around AI evaluation has finally reached the full leadership team and the board, which makes rigorous measurement a necessity. Evaluation companies are now pivoting toward monitoring agentic and multi-agent systems, reflecting this industry shift.

© 2025 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.