Google DeepMind Explains AI Agent Building Struggles

Philipp Schmid from Google DeepMind explains the core challenges senior engineers face when building AI agents, contrasting traditional engineering with agentic development.

May 30 at 3:01 PM · 9 min read

Philipp Schmid, from Google DeepMind, discusses the challenges in building AI agents. Image: AI Engineer

Philipp Schmid from Google DeepMind recently shared insights into why even experienced engineers face challenges when building AI agents. The talk, titled "Why (Senior) Engineers Struggle to Build AI Agents," highlights five key "mental model collisions" that arise when transitioning from traditional engineering practices to the world of AI agents.

The Engineer's Mindset vs. Agent Reality

Schmid begins by contrasting the deterministic nature of traditional software engineering with the probabilistic approach required for AI agents. In traditional software, engineers define explicit steps, write code, test it rigorously, and deploy. This process is linear and predictable. However, building AI agents involves a different paradigm:

Define: Instead of strict definitions, agents are given instructions or goals.
Observe: Agents interact with their environment and receive feedback.
Adapt: Based on observations and feedback, agents adjust their behavior.
Loop Back: This iterative process allows for continuous learning and improvement.

This fundamental difference, Schmid explains, often tempts engineers to "code away" the inherent probabilistic nature of AI, which produces the "mental model collisions" he outlines.
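The define, observe, adapt, loop cycle can be sketched as a small loop. This is a minimal illustration, not code from the talk: `toy_policy`, `run_agent`, and the number-guessing environment are hypothetical stand-ins for a model call and a real environment.

```python
from typing import Callable, List, Tuple

History = List[Tuple[int, str]]


def toy_policy(goal: str, history: History) -> int:
    """Hypothetical stand-in for a model call: pick the next guess from past feedback."""
    low, high = 0, 100
    for guess, feedback in history:
        if feedback == "too low":
            low = max(low, guess + 1)
        elif feedback == "too high":
            high = min(high, guess - 1)
    return (low + high) // 2


def make_environment(secret: int) -> Callable[[int], str]:
    """Build a toy environment that gives textual feedback on each guess."""
    def environment(guess: int) -> str:
        if guess < secret:
            return "too low"
        if guess > secret:
            return "too high"
        return "correct"
    return environment


def run_agent(goal: str, environment: Callable[[int], str], max_steps: int = 10) -> Tuple[int, int]:
    """Act on a goal, observe feedback, adapt, and loop until done or out of steps."""
    history: History = []
    for step in range(1, max_steps + 1):
        action = toy_policy(goal, history)   # define: the goal, not the steps
        feedback = environment(action)       # observe
        if feedback == "correct":
            return action, step
        history.append((action, feedback))   # adapt: feedback shapes the next action
    raise RuntimeError("goal not reached within max_steps")


print(run_agent("find the hidden number", make_environment(37)))  # prints (37, 3)
```

The step limit is a design choice in this sketch: the loop is open-ended, so something outside the agent has to bound it.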

Key Challenges and Solutions

Schmid identifies five areas where engineers often run into difficulty:

1. Text is the New State

Traditionally, software state is represented by discrete data structures and booleans. With AI agents, particularly those built on large language models (LLMs), text becomes the primary means of representing information and intent. The trap is treating natural language instructions as if they were simple booleans, which discards their nuanced semantic meaning. The fix is to preserve that meaning by keeping the raw string and letting the agent interpret it in downstream steps.
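A small sketch of the difference, under stated assumptions: the user reply, the `BooleanState` and `TextState` classes, and `build_next_prompt` are all hypothetical names invented for illustration and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class BooleanState:
    approved: bool  # the trap: one bit, all nuance gone


@dataclass
class TextState:
    feedback: str  # the raw user text, preserved as state


def to_boolean_state(user_reply: str) -> BooleanState:
    # Lossy keyword match: any condition attached to the approval is dropped.
    return BooleanState(approved="approve" in user_reply.lower())


def to_text_state(user_reply: str) -> TextState:
    # Keep the reply verbatim so a later model call can interpret it.
    return TextState(feedback=user_reply)


def build_next_prompt(state: TextState) -> str:
    # The raw string travels downstream as part of the next prompt.
    return (
        "The user responded to the proposed plan with:\n"
        f'"{state.feedback}"\n'
        "Revise the plan or proceed accordingly."
    )


reply = "Approved, but keep the budget under $500."
print(to_boolean_state(reply))                  # BooleanState(approved=True)
print(build_next_prompt(to_text_state(reply)))
```

The boolean version records that the plan was approved but not the budget condition; the text version carries both to the next step.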

2. Handing Over Control

In microservices, user intent often maps to a specific route. Engineers intuitively hand-code these paths. With AI agents, however, the interactions are more fluid and less deterministic. The trap is to treat agents as mere traffic controllers, expecting them to follow rigid, pre-defined paths. Instead, agents should be trusted as dispatchers that can navigate ambiguity. The key insight is to describe what you want, not the exact path to get there, providing constraints and procedures rather than rigid routes.
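One way to picture the contrast is a hand-coded router next to a goal-and-constraints description. This is only a sketch: the flow names, the tool names, and `build_agent_instructions` are hypothetical, and the resulting string would be passed to whatever model or agent framework is in use.

```python
from typing import List


def route_intent(message: str) -> str:
    """The trap: a rigid, hand-coded path per keyword."""
    text = message.lower()
    if "refund" in text:
        return "refund_flow"
    if "cancel" in text:
        return "cancel_flow"
    return "fallback_flow"  # anything phrased differently lands here


def build_agent_instructions(goal: str, constraints: List[str], tools: List[str]) -> str:
    """Describe the outcome and the guardrails; leave the path to the agent."""
    lines = [f"Goal: {goal}", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    lines.append("Available tools: " + ", ".join(tools))
    lines.append("Choose the steps and tools yourself. Ask the user if the request is ambiguous.")
    return "\n".join(lines)


# An ambiguous request defeats the keyword router.
print(route_intent("I was charged twice and want out"))  # fallback_flow

print(build_agent_instructions(
    goal="Resolve the customer's billing problem",
    constraints=["Never issue a refund above $100 without human approval",
                 "Confirm the account before changing a subscription"],
    tools=["lookup_account", "issue_refund", "cancel_subscription"],
))
```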

3. Errors Are Just Inputs

Traditional software development often involves failing fast and crashing when errors occur. This approach, while effective for deterministic systems, is counterproductive for AI agents. A fast failure on a minor schema fault might cost $0.50 and 5 minutes to debug, but a crash at a late step (step 4 of 5) is unacceptable because it discards the work already completed. The collision occurs when engineers treat errors as definitive failures. The fix is to view errors as valuable inputs, allowing the agent to learn from them and self-correct. This involves catching the error, feeding it back into the agent's process, and enabling it to try a different approach.
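The catch, feed back, retry pattern described here might look like the following sketch. `fake_model` is a hypothetical stand-in for a real model call, and the trip-planning schema is invented for illustration.

```python
import json
from typing import Callable, Dict, List


def fake_model(messages: List[str]) -> str:
    """Hypothetical model: emits a schema fault first, then corrects itself once it sees the error."""
    if any(m.startswith("ERROR:") for m in messages):
        return '{"city": "Berlin", "days": 3}'
    return '{"city": "Berlin", "days": "three"}'


def validate(raw: str) -> Dict[str, object]:
    """Parse and check the output; raises ValueError on any fault."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data.get("days"), int):
        raise ValueError("field 'days' must be an integer")
    return data


def run_with_recovery(model: Callable[[List[str]], str], task: str, max_attempts: int = 3) -> Dict[str, object]:
    messages = [task]
    for _ in range(max_attempts):
        raw = model(messages)
        try:
            return validate(raw)
        except ValueError as err:
            # Instead of crashing, turn the error into input for the next attempt.
            messages.append(f"ERROR: {err}. Previous output: {raw}")
    raise RuntimeError("no valid output after max_attempts")


print(run_with_recovery(fake_model, "Plan a trip and reply as JSON with 'city' and 'days'."))
# {'city': 'Berlin', 'days': 3}
```

The attempt cap still lets the run fail, but only after the agent has had a chance to correct itself.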

4. From Unit Tests to Evals

The evaluation of AI agents differs significantly from traditional software testing. Unit tests, which rely on deterministic assertions, are not sufficient. Schmid emphasizes the need to move towards "evals," which are designed for non-deterministic outputs. This involves running multiple trials per prompt to measure the distribution of results. Negative cases are crucial: testing that agents ignore irrelevant information is as important as testing their core functionality. Furthermore, the focus should be on grading the outcome, not the specific path the agent took to get there. This means measuring how often the agent succeeds, rather than enforcing rigid, step-by-step adherence.
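A minimal eval harness along these lines could look like the sketch below. `flaky_agent` is a hypothetical stand-in that answers correctly most of the time, and the prompts and graders are invented for illustration. The random generator is seeded only so the sketch is reproducible.

```python
import random
from typing import Callable


def flaky_agent(prompt: str, rng: random.Random) -> str:
    """Hypothetical non-deterministic agent for a math-only assistant."""
    if "weather" in prompt.lower():
        return "declined"  # out-of-scope request: the agent should not act on it
    return "42" if rng.random() < 0.8 else "41"


def pass_rate(
    agent: Callable[[str, random.Random], str],
    prompt: str,
    grade: Callable[[str], bool],
    trials: int = 50,
    seed: int = 0,
) -> float:
    """Run several trials and grade only the final outcome, not the path taken."""
    rng = random.Random(seed)
    passed = sum(grade(agent(prompt, rng)) for _ in range(trials))
    return passed / trials


# Positive case: how often does the agent get the right answer?
positive = pass_rate(flaky_agent, "What is 6 * 7?", lambda out: out == "42")

# Negative case: does the agent reliably leave an irrelevant request alone?
negative = pass_rate(flaky_agent, "What's the weather today?", lambda out: out == "declined")

print(f"positive pass rate: {positive:.2f}, negative pass rate: {negative:.2f}")
```

A unit test would assert a single output; here the result is a rate that can be tracked over time or compared against a threshold.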

5. Agents Evolve, APIs Don't

A significant challenge lies in the static nature of APIs versus the dynamic evolution of agents. Traditional APIs are often designed with a "human-grade" approach, in which a developer is expected to supply clear, unambiguous parameters. Agents, however, are inherently literal and can hallucinate values when a parameter is ambiguous. The trap is to build APIs for agents as if they were human developers. The solution is to create "agent-ready" APIs that are explicit, verbose, and self-documenting. This means providing clear descriptions of functions and their expected behavior, including what happens if an item is not found, so the agent has all the necessary context without needing to infer it.
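As a rough illustration of "agent-ready" versus "human-grade", compare the two lookups below. The inventory data, function names, and return shapes are hypothetical, chosen only to show an explicit name, a verbose description, and a documented not-found result.

```python
from typing import Dict

# Hypothetical data store for the example.
_INVENTORY: Dict[str, Dict[str, object]] = {
    "sku-123": {"name": "Desk lamp", "in_stock": 4},
}


def get(key):
    """Human-grade: terse name, undocumented parameter, crashes on a miss."""
    return _INVENTORY[key]  # raises KeyError if the key is unknown


def get_inventory_item_by_sku(sku: str) -> Dict[str, object]:
    """Look up exactly one inventory item by its SKU.

    Args:
        sku: The exact stock-keeping unit string, for example "sku-123".
            Do not pass a product name or a partial SKU.

    Returns:
        If the item exists:
            {"status": "found", "sku": ..., "name": ..., "in_stock": ...}
        If no item has this SKU (this function does not raise in that case):
            {"status": "not_found", "sku": ..., "hint": ...}
    """
    item = _INVENTORY.get(sku)
    if item is None:
        return {
            "status": "not_found",
            "sku": sku,
            "hint": "Check the SKU spelling or ask the user for it. Do not guess.",
        }
    return {"status": "found", "sku": sku, **item}


print(get_inventory_item_by_sku("sku-123"))
print(get_inventory_item_by_sku("desk lamp"))
```

Returning a structured not-found result instead of raising gives the agent something it can read and act on in its next step.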

Summary: Trust, but Verify

Schmid concludes by summarizing the core principles for building effective AI agents:

Stop Fighting the Model: Accept that you are a dispatcher, not a programmer.
Preserve Meaning: Treat text as the primary state, not just booleans.
Design for Recovery: Build agents that can learn from errors and adapt.
Evaluate, Don't Assert: Measure performance through multiple trials and LLM-as-judge evaluations.
Build to Delete: Understand that agents evolve, and their underlying models will need to be rebuilt and improved over time.

The fundamental takeaway is that building AI agents requires a shift in mindset, embracing the probabilistic nature of these systems and adapting traditional engineering practices accordingly.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.
