Vincent Koc on Adaptive AI Evaluation

Vincent Koc of Comet ML discusses the limitations of static AI evaluation and the shift towards adaptive, intent-based methods for measuring AI agents.

Vincent Koc presenting on adaptive AI evaluation at AI Engineer Europe (Image credit: AI Engineer)

Vincent Koc, speaking at AI Engineer Europe, discussed the evolving landscape of AI evaluation, particularly for adaptive systems. He highlighted the limitations of traditional static benchmarks and proposed a move towards more dynamic and intent-based evaluation methods. Koc, who works with Comet ML, emphasized that as AI models become more sophisticated and capable of self-optimization, the evaluation frameworks must adapt accordingly.

Visual TL;DR

Static benchmarks fail, producing the calcification problem; adaptive AI agents require intent-based evaluation, which enables the future of AI evaluation.

The Limitations of Static Benchmarks

Koc began by addressing what he termed the 'calcification problem' in AI evaluation. Static benchmarks, long the standard for evaluating AI models, increasingly fail to capture the true performance and behavior of modern AI systems, especially adaptive ones. He pointed out that while traditional software engineering relies on unit tests, manual regression suites, and CI/CD pipelines, the AI field has no comparably mature evaluation practice.

He illustrated this by referencing a common scenario: AI agents are trained and evaluated on static datasets, but their real-world performance can differ significantly due to their ability to learn and adapt. Koc suggested that this reliance on static evaluations leads to a disconnect between how models perform in controlled environments and how they behave in dynamic, real-world scenarios. This is particularly problematic as AI models become more complex and are deployed in critical applications.
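
To make the critique concrete, here is a minimal sketch of the static-benchmark pattern Koc describes; the dataset and the `call_model` stub are hypothetical illustrations, not his code:

```python
# A minimal sketch (not Koc's code) of the static-benchmark pattern he
# critiques: a fixed dataset scored once with exact-match assertions.
# `call_model` is a hypothetical stand-in for any LLM client.

STATIC_SUITE = [
    {"prompt": "Summarize the release notes in one sentence.",
     "expected": "The release adds dark mode and fixes two crashes."},
]

def call_model(prompt: str) -> str:
    """Hypothetical model client; replace with a real API call."""
    raise NotImplementedError

def run_static_benchmark() -> float:
    passed = 0
    for case in STATIC_SUITE:
        # Exact-match scoring is frozen at authoring time: an adaptive
        # agent can drift in the field while still passing these cases.
        if call_model(case["prompt"]).strip() == case["expected"]:
            passed += 1
    return passed / len(STATIC_SUITE)
```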

The Shift to Adaptive and Intent-Based Evaluation

The core of Koc's presentation revolved around the need to shift from static evaluations to more adaptive and outcome-oriented approaches. He introduced the concept of 'intent-based outcomes,' where the focus is not just on whether the AI produces a correct output, but on whether it achieves the user's intent. This requires a deeper understanding of the AI's decision-making process and its ability to adapt to new information or changing circumstances.
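
A rough sketch of what intent-based scoring could look like in practice, assuming an LLM-as-judge setup; the prompt template and `judge_model` stub are illustrative assumptions, not a Comet ML API:

```python
# A hedged sketch of intent-based scoring: rather than comparing output
# to a golden answer, an LLM judge rates whether the response satisfies
# the user's underlying intent. `judge_model` is an assumed stand-in.

JUDGE_PROMPT = """\
User intent: {intent}
Agent response: {response}

Does the response achieve the intent? Reply PASS or FAIL, then a reason.
"""

def judge_model(prompt: str) -> str:
    """Hypothetical judge LLM; replace with a real client."""
    raise NotImplementedError

def achieves_intent(intent: str, response: str) -> bool:
    # Correctness is defined against the user's goal, not a fixed
    # reference output, so paraphrases and novel solutions can pass.
    verdict = judge_model(JUDGE_PROMPT.format(intent=intent, response=response))
    return verdict.strip().upper().startswith("PASS")
```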

Koc also discussed the idea of 'self-curating suites from traces.' This involves using the AI's own operational data, or 'traces,' to continuously generate and update evaluation sets. Instead of relying on human-curated datasets, the AI itself helps to identify areas where it is performing poorly or where its behavior needs to be assessed. This creates a feedback loop that allows for continuous improvement and adaptation.
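
One way to picture this trace-driven loop, using illustrative names rather than any specific Comet ML interface:

```python
# An illustrative sketch of self-curating suites from traces: production
# traces that score poorly are promoted into the regression suite, so
# the benchmark grows alongside the agent. All names are assumptions.

from dataclasses import dataclass, field

@dataclass
class Trace:
    prompt: str
    response: str
    score: float  # e.g. from an intent judge or user feedback, in [0, 1]

@dataclass
class EvalSuite:
    cases: list[Trace] = field(default_factory=list)

    def curate_from(self, traces: list[Trace], threshold: float = 0.5) -> None:
        # Low-scoring traces mark behavior the next model version must
        # be re-checked against -- the feedback loop Koc describes.
        self.cases.extend(t for t in traces if t.score < threshold)
```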

From Prompt Engineering to Intent Engineering

The presentation traced the evolution of AI development methodologies, from 'prompt engineering' (2022-23), which focused on crafting precise prompts to elicit desired responses from LLMs, to 'context engineering' (2024-25), which involves managing memory and context more effectively for AI agents. Koc posited that the next phase, starting in 2026, will be 'intent engineering.'

Intent engineering, according to Koc, is about building AI systems that can understand and act upon user intent, even when that intent is not explicitly stated. This involves not only optimizing the AI's performance but also ensuring its behavior aligns with the user's goals and values. Models, he noted, are already good enough to solve complex problems and adapt to new situations; the remaining challenge is aligning their actions with human intent.

The 'Calcification Problem' and its Solutions

Koc reiterated that the 'calcification problem' is a significant hurdle. He explained that as AI models become more capable, they can outpace the static evaluation methods used to measure them. This means that benchmarks that were once effective can become obsolete, leading to a false sense of security or an inability to detect critical failures.

To address this, Koc proposed a move towards 'always-on evaluation and optimization.' This involves integrating evaluation and optimization processes directly into the AI's operational lifecycle, rather than treating them as separate, pre-deployment steps. By continuously monitoring the AI's performance and adapting its evaluation criteria, organizations can ensure that their AI systems remain effective and aligned with their intended purposes.
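
A minimal sketch of what such an always-on loop might look like; the sampling rate, scorer, and alerting hook are assumptions for illustration, not Koc's specification:

```python
# A minimal sketch of always-on evaluation: continuously score a sample
# of live traffic and alert when quality drifts below a floor.

import random

def always_on_eval(traces, scorer, alert, sample_rate=0.1, floor=0.8):
    """Score a random sample of production traces; alert on regression."""
    sampled = [t for t in traces if random.random() < sample_rate]
    if not sampled:
        return None
    avg = sum(scorer(t) for t in sampled) / len(sampled)
    if avg < floor:
        # Evaluation lives inside the operational loop, not as a
        # one-off pre-deployment gate.
        alert(f"Average quality {avg:.2f} fell below floor {floor:.2f}")
    return avg
```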

He also touched upon the idea that 'code is cheap' in the context of AI development. While writing code might be relatively easy, the complexity lies in defining and measuring the desired behavior of the AI. This is where effective evaluation becomes paramount.

The Future of AI Agents

Koc concluded by looking towards the future of AI agents, suggesting that the trend is towards more autonomous and self-optimizing systems. These agents will not only perform tasks but will also be able to understand their own performance, identify areas for improvement, and adapt their behavior accordingly. This necessitates a robust and dynamic evaluation framework that can keep pace with the evolving capabilities of AI.
