Ankur Goyal, CEO of Braintrust, recently presented at the AI Engineer World's Fair in San Francisco, where he introduced Loop, a new evaluation assistant designed to transform the manual, often cumbersome process of AI model development. Goyal's talk highlighted how rapidly AI evaluation practices have grown, how labor-intensive they remain, and how Loop addresses these bottlenecks for developers building advanced AI products.
In the current landscape of AI evaluation, or "evals," organizations log a staggering number of experiments. Goyal noted, "On average, organizations log 12.8 experiments per day with Braintrust. Some of our customers run more than 3,000 evals a day." This volume underscores the intense iteration required in AI development, yet the process remains stubbornly human-centric. Engineers spend, in Goyal's words, "more than two hours in the product every day" manually sifting through dashboards, attempting to extract actionable insights from raw evaluation data.
