The assertion that a multimillion-dollar AI coding agent business was built largely on "vibes" rather than rigorous evaluations ignited a heated debate among AI professionals. This discussion, hosted by Swyx on the Latent Space podcast, brought together Ankur Goyal, co-founder and CEO of Braintrust, and Malte Ubl, CTO of Vercel, to examine the role of evaluations (evals) in AI engineering. Their conversation moved past a simple vibes-versus-evals dichotomy and laid out a spectrum of feedback loops, from intuitive "vibe checks" to offline evaluations and A/B testing, each with a distinct, deliberate role in AI product development.
The core challenge of building AI products, as Ankur Goyal puts it, is grappling with "non-deterministic magic." Unlike traditional software development, where outcomes are often predictable, AI introduces an inherent uncertainty that demands robust feedback mechanisms. The panelists emphasized that the goal is not to choose one feedback loop over another but to combine them deliberately, weighing each one's trade-offs in effort, speed, and efficiency.
