"We enable it to think. It can perceive the environment. It will think step-by-step and then finish this multi-step task." This statement from Jie Tan, Senior Staff Research Scientist at Google DeepMind, encapsulates the profound leap forward represented by Gemini Robotics 1.5. Unveiled in a recent demonstration, this new family of models signals a pivotal moment in the development of physical AI agents, moving beyond rote execution to embody genuine reasoning and adaptability in the physical world. The showcase, featuring contributions from key researchers including Tan, Principal Software Engineer Kanishka Rao, Staff Research Scientist Coline Devin, and Senior Director and Head of Robotics Carolina Parada, highlighted how these advancements promise to redefine the utility and scalability of robotic systems.
For years, robotic systems, while impressive in their precision, remained largely confined to executing pre-programmed, singular tasks. The previous iteration of Gemini Robotics, for instance, could be trained to perform an action like placing a banana into a bowl, but only through extensive, repetitive training on that specific action. As Jie Tan noted, "Previously, Gemini Robotics version, it has been tested over and over and over again to put this banana into the bowl. This is a very simple task." This approach, while effective in highly controlled industrial environments, presented a significant bottleneck for real-world deployment, where variability is the norm. Gemini Robotics 1.5 fundamentally shifts this paradigm, empowering robots to tackle longer, multi-step challenges that demand dynamic decision-making and continuous environmental perception.