Google is taking another major step beyond the chatbot with a new model designed to give AI agents eyes and hands so they can control software directly. Unveiled today, the Gemini 2.5 Computer Use model lets AI interact with graphical user interfaces (GUIs) much as a person would: by looking at the screen, clicking buttons, typing into forms, and navigating complex websites.
The model is available in preview for developers through the Gemini API. It isn't designed for clean, structured APIs. Instead, it targets the messy reality of the web, where most tasks still require direct human interaction. In Google's demos, the model scrapes pet details from one website, enters them into a separate CRM, and then books an appointment. That kind of multi-step, cross-application task has long been the holy grail for AI assistants.
