"A lot of AI is ultimately software engineering with different vocabulary and a little bit of non-determinism," posits Jason Davenport, a sentiment echoed by Aja Hammerly, both from Google Cloud Tech. Their recent "AI Agent Dance Off" on the "Real Terms for AI" series offered a compelling whiteboard comparison of two distinct architectural approaches to building AI coding agents. The discussion, aimed at demystifying the complexities of AI development for a discerning audience of founders, VCs, and AI professionals, centered on leveraging Large Language Models (LLMs) for task planning, developing effective code generation and evaluation loops, and integrating contextual information to enhance agent performance, particularly through the lens of Test-Driven Development (TDD).
Aja's initial design for an AI coding agent is a straightforward, linear workflow. A user prompt, such as "build a calculator," goes to an LLM, which formulates a plan. The plan flows into a "Gen Code" module that generates the code, and the generated code passes to an "Exec Code" function that runs it. Any errors or output from execution are fed directly back to "Gen Code," creating an iterative refinement loop that repeats until the code runs as intended. The successful result then returns to the original LLM and, from there, to the user.
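The workflow above can be sketched in a few lines of Python. This is only an illustration of the whiteboard design, not code from the session: `simple_agent`, the three callables, and the demo stubs are hypothetical names, the stubs stand in for real LLM and sandbox calls, and the `max_attempts` cap is an added assumption so the sketch always terminates.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: a real agent would back these with an LLM and a sandbox.
PlanFn = Callable[[str], str]                # prompt -> plan
GenFn = Callable[[str, str], str]            # (plan, feedback) -> code
ExecFn = Callable[[str], Tuple[bool, str]]   # code -> (succeeded, output or error)


def simple_agent(prompt: str, plan_fn: PlanFn, gen_fn: GenFn, exec_fn: ExecFn,
                 max_attempts: int = 5) -> Tuple[Optional[str], int]:
    """Plan once, then loop Gen Code <-> Exec Code until the code runs."""
    plan = plan_fn(prompt)
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        code = gen_fn(plan, feedback)
        ok, output = exec_fn(code)
        if ok:
            return code, attempt
        # Errors go straight back to Gen Code; the plan is never revisited.
        feedback = output
    return None, max_attempts


# --- Demo stubs (assumptions for illustration only) ---
def demo_plan(prompt: str) -> str:
    return f"Plan for: {prompt}"


def demo_gen(plan: str, feedback: str) -> str:
    if not feedback:
        return "result = add(2, 3)"  # first draft calls an undefined function
    return "def add(a, b):\n    return a + b\nresult = add(2, 3)"


def demo_exec(code: str) -> Tuple[bool, str]:
    namespace: dict = {}
    try:
        exec(code, namespace)  # unsandboxed; acceptable only for a toy demo
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, str(namespace.get("result"))
```

Calling `simple_agent("build a calculator", demo_plan, demo_gen, demo_exec)` fails once on the undefined `add`, feeds the error back to the generator, and succeeds on the second attempt. Note that the loop's only success criterion is "the code ran," which is a deliberate property of this sketch.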
This simplicity has weaknesses, however. Jason probed the limits of such a direct feedback loop, asking what happens if "Gen Code" and "Exec Code" become trapped in an endless cycle of errors and attempted fixes. More fundamentally, he asked, "what happens if the generated code doesn't actually address what we've asked for in our original plan?" The question exposes a critical gap: a purely reactive error-correction loop with no higher-level oversight can produce code that runs cleanly yet fails to do what the user asked. Aja acknowledged both issues and proposed a modification in which errors and outputs loop back to the central LLM itself, so that the agent can evaluate progress, adjust its overall plan, and give code generation better-informed input. This adds a layer of oversight that lets the agent correct course at the strategic level, not just the tactical one.
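A minimal sketch of the revised loop, under stated assumptions, might look like the following. The names `supervised_agent`, `Review`, and the demo stubs are hypothetical, the stub reviewer uses a hard-coded check where a real agent would ask the LLM to judge relevance, and the `max_rounds` cap is one assumed way to avoid the endless cycle Jason describes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Review:
    done: bool          # does the result satisfy the original request?
    revised_plan: str   # plan to use for the next round if not done


def supervised_agent(
    prompt: str,
    plan_fn: Callable[[str], str],
    gen_fn: Callable[[str], str],
    exec_fn: Callable[[str], Tuple[bool, str]],
    review_fn: Callable[[str, str, str, bool, str], Review],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Route every execution result back through the planner for review."""
    plan = plan_fn(prompt)
    for round_no in range(1, max_rounds + 1):
        code = gen_fn(plan)
        ok, output = exec_fn(code)
        # The central LLM sees the outcome, even when the code ran without errors.
        review = review_fn(prompt, plan, code, ok, output)
        if review.done:
            return code, round_no
        plan = review.revised_plan
    return None, max_rounds  # give up rather than loop forever


# --- Demo stubs (assumptions for illustration only) ---
def stub_plan(prompt: str) -> str:
    return "print a greeting"  # deliberately off-target first plan


def stub_gen(plan: str) -> str:
    return "result = 2 + 3" if "add" in plan else "result = 'hello'"


def stub_exec(code: str) -> Tuple[bool, str]:
    namespace: dict = {}
    try:
        exec(code, namespace)  # unsandboxed; acceptable only for a toy demo
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, str(namespace.get("result"))


def stub_review(prompt: str, plan: str, code: str, ok: bool, output: str) -> Review:
    # Stand-in for an LLM judging relevance: the calculator should yield 5.
    if ok and output == "5":
        return Review(True, plan)
    return Review(False, "add two numbers: 2 and 3")
```

In this demo the first round produces code that executes without error but returns a greeting, which is the "runs but irrelevant" case. The reviewer rejects it, revises the plan, and the second round succeeds. A loop that only fed errors back to the generator would have accepted the first result.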
