In a recent discussion on whether AI agents can run businesses, Lukas Petersson and Axel Backlund of Andon Labs described their work on Project Vend. The project tested how well large language models (LLMs) could manage a simulated vending machine business, and it revealed both the promise and the current limitations of AI on complex, real-world tasks.
The Genesis of Project Vend
Petersson and Backlund explained that their research was driven by a desire to understand how AI agents could operate without human oversight. They saw a vending machine business as a suitable testbed, one that let them benchmark AI capabilities in a controlled yet realistic environment. The project simulated the main aspects of running a vending business, from managing inventory and pricing to handling customer interactions and financial transactions.
Claudius: The AI Agent at the Helm
At the core of Project Vend was an AI agent named Claudius, which was tasked with running the vending machine business. Claudius was given a set of tools, including web search and email, for interacting with the simulated environment, and it was prompted with specific objectives such as maximizing profit and keeping its bank balance positive. The experiment assessed how well Claudius could adapt to challenges, learn from its mistakes, and ultimately reach its business goals.
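To make the setup concrete, the sketch below shows one way a tool-using agent can be wired to a simulated vending business. It is a minimal illustration under stated assumptions, not Andon Labs' implementation: `VendingEnv`, its tool methods, and `run_episode` are hypothetical names, the email tool only queues messages, web search is omitted, and a scripted list of tool calls stands in for the decisions an LLM would make.

```python
from dataclasses import dataclass, field


@dataclass
class VendingEnv:
    """Hypothetical simulated vending business state (illustrative only)."""
    balance: float = 100.0
    inventory: dict[str, int] = field(default_factory=dict)
    prices: dict[str, float] = field(default_factory=dict)
    outbox: list[str] = field(default_factory=list)

    # Each tool returns a text observation, as an agent framework would.
    def set_price(self, item: str, price: float) -> str:
        self.prices[item] = price
        return f"price of {item} set to {price:.2f}"

    def order_stock(self, item: str, qty: int, unit_cost: float) -> str:
        cost = qty * unit_cost
        if cost > self.balance:  # refuse orders that would overdraw the account
            return f"rejected: order costs {cost:.2f}, balance is {self.balance:.2f}"
        self.balance -= cost
        self.inventory[item] = self.inventory.get(item, 0) + qty
        return f"ordered {qty} x {item} for {cost:.2f}"

    def send_email(self, to: str, body: str) -> str:
        self.outbox.append(f"{to}: {body}")  # stub: messages are only queued
        return "email queued"

    def sell(self, item: str) -> str:
        if self.inventory.get(item, 0) == 0 or item not in self.prices:
            return f"cannot sell {item}"
        self.inventory[item] -= 1
        self.balance += self.prices[item]
        return f"sold {item}"


def run_episode(env: VendingEnv, actions: list[tuple[str, dict]]) -> list[str]:
    """Dispatch scripted tool calls (stand-ins for LLM choices) and log observations."""
    tools = {
        "set_price": env.set_price,
        "order_stock": env.order_stock,
        "send_email": env.send_email,
        "sell": env.sell,
    }
    log = []
    for name, kwargs in actions:
        log.append(tools[name](**kwargs))
    return log


env = VendingEnv(balance=10.0)
log = run_episode(env, [
    ("order_stock", {"item": "cola", "qty": 4, "unit_cost": 1.0}),
    ("set_price", {"item": "cola", "price": 2.5}),
    ("sell", {"item": "cola"}),
    ("sell", {"item": "cola"}),
])
print(log, env.balance)  # balance: 10 - 4 + 2.5 + 2.5 = 11.0
```

In a real agent, each observation string would be appended to the model's context so it can decide its next tool call; the profit objective can then be scored from the final `balance`.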
Key Findings and Challenges
The Andon Labs team shared several key findings from their experiments. One of the most impactful changes they made was improving Claudius's ability to follow procedures. Initially, Claudius struggled with tasks such as stocking items and managing inventory, often making basic errors. Once the team provided more explicit instructions and better feedback mechanisms, Claudius's performance improved. Claudius also showed a capacity for creative problem-solving, at times devising novel solutions to unexpected issues.
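One way to pair explicit procedures with feedback is to validate each proposed action against written rules and return every violation as text the agent can read and correct. The sketch below is a hypothetical illustration of that pattern, not the team's actual mechanism: `RestockRequest`, `check_restock`, `submit`, and the three rules (positive quantity, sale price above unit cost, order within balance) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class RestockRequest:
    """Hypothetical restock action proposed by the agent."""
    item: str
    qty: int
    unit_cost: float
    sale_price: float


def check_restock(req: RestockRequest, balance: float) -> list[str]:
    """Return one feedback message for every explicit rule the request breaks."""
    feedback = []
    if req.qty <= 0:
        feedback.append("quantity must be positive")
    if req.sale_price <= req.unit_cost:
        feedback.append(
            f"sale price {req.sale_price:.2f} does not exceed unit cost "
            f"{req.unit_cost:.2f}; item would not turn a profit"
        )
    total = req.qty * req.unit_cost
    if total > balance:
        feedback.append(f"order total {total:.2f} exceeds balance {balance:.2f}")
    return feedback


def submit(req: RestockRequest, balance: float) -> str:
    """Accept the action or explain all problems at once, so the agent can retry."""
    problems = check_restock(req, balance)
    if problems:
        return "REJECTED: " + "; ".join(problems)
    return "ACCEPTED"


print(submit(RestockRequest("cola", 10, 1.0, 2.0), 50.0))  # ACCEPTED
print(submit(RestockRequest("cola", 10, 3.0, 2.0), 20.0))  # REJECTED: two problems
```

Reporting every violation in one message, rather than stopping at the first, gives the agent a complete picture of what to fix before its next attempt.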
