In true AI fashion, the future of autonomous businesses will likely see "at least one proof point of it actually working and many proof points of it not working quite well enough to actually roll out to production." This prescient observation from Gabe Goodhart captures the essence of a recent experiment by Anthropic, which provided a humorous yet sobering glimpse into the current reality of agentic AI. The experiment, dubbed Project Vend, became a central topic of discussion on the *Mixture of Experts* podcast, where host Tim Hwang spoke with IBM research experts Gabe Goodhart, Kush Varshney, and Marina Danilevsky.
The premise of Project Vend was simple yet ambitious: put an AI agent, a variant of Claude named "Claudius," in charge of an office vending machine. Tasked with a budget of $1,000, Claudius was responsible for everything from managing inventory and setting prices to communicating with customers via Slack and even handling payments through Venmo. The goal was to see if an AI could run a rudimentary business from end to end. The result was a comical failure. As Hwang summarized, "it turns out Claudius loses money."
The experiment revealed that while the AI could perform discrete tasks, it lacked the fundamental business acumen to turn a profit. It made routine mistakes a human manager would likely avoid, such as mismanaging inventory, pricing products irrationally, and, at one point, hallucinating the Venmo account customers were supposed to use for payment. The project serves as a powerful case study, grounding the hype around AI agents with the messy details of real-world application. It highlights a critical gap between executing a series of commands and possessing the integrated, contextual understanding required for successful business operations.
The panel's reaction to the experiment underscores the nuanced path forward. While Kush Varshney expressed optimism that AI agents could run businesses within two years, he noted that this would require significant "extra scaffolding" beyond the core LLM. This scaffolding—a set of hard-coded rules, guardrails, and specialized tools—is what separates a simple chatbot from a competent business agent. Without it, as Marina Danilevsky wryly noted, we will simply "find ways of messing up a business that we never thought possible that humans couldn't do on their own."
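To make the idea of "extra scaffolding" concrete, here is a minimal sketch of what such hard-coded guardrails might look like in practice. Everything here is illustrative, not drawn from Project Vend: the function names, the margin and markup thresholds, and the idea of a pre-verified payment registry are all assumptions about how one might constrain an agent's proposed actions before they take effect.

```python
# Hypothetical guardrails wrapping an agent's proposed actions.
# All names and thresholds are illustrative, not from Project Vend.

# Payment handles verified out-of-band by a human, never model-generated.
KNOWN_PAYMENT_HANDLES = {"@office-vending"}

def validate_price(proposed_price: float, unit_cost: float,
                   min_margin: float = 0.25, max_markup: float = 3.0) -> float:
    """Clamp a model-proposed price into a sane band around unit cost."""
    floor = unit_cost * (1 + min_margin)   # never sell below cost plus a margin
    ceiling = unit_cost * max_markup       # never gouge beyond a fixed markup
    return min(max(proposed_price, floor), ceiling)

def validate_payment_handle(handle: str) -> str:
    """Reject any payment handle the model 'remembers' rather than one on file."""
    if handle not in KNOWN_PAYMENT_HANDLES:
        raise ValueError(f"Unverified payment handle: {handle!r}")
    return handle

# Example: the agent proposes selling a $1.00-cost soda at $0.50;
# the guardrail clamps the price up to the cost-plus-margin floor.
price = validate_price(proposed_price=0.50, unit_cost=1.00)
handle = validate_payment_handle("@office-vending")
```

The point of the sketch is the division of labor: the model proposes, but deterministic code disposes. Irrational prices get clamped rather than trusted, and a hallucinated payment account fails loudly instead of silently misdirecting money.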
Project Vend was not a failure of Anthropic's model, but rather a successful demonstration of the technology's current limitations. It proved that the journey toward fully autonomous systems is not a straight line. The experiment shows that the true challenge lies not just in making models more capable, but in building the robust frameworks and guardrails that allow them to apply their intelligence effectively and reliably in the real world.
Source: Watch Full Interview on YouTube

