Holo2 Foundational Models: Next-Gen AI Agents for Digital Interaction

Holo2 foundational models advance AI agents for web, desktop, and mobile GUIs with enhanced navigation, task execution, and state-of-the-art UI localization.

H Company has unveiled Holo2, a new family of large-scale Vision-Language Models (VLMs) engineered to power multi-domain GUI agents. These agents are designed to interpret, reason over, and act within real digital environments, including web, desktop, and mobile interfaces. Where its predecessor, Holo1.5, focused on UI localization and screen understanding, Holo2 builds on those capabilities and emphasizes navigation and multi-step task execution. H Company reports improvements in three areas: policy learning, action grounding, and cross-environment generalization.

Holo2 Models and Capabilities

The Holo2 series comprises four model sizes. Holo2-4B and Holo2-8B are fully open-sourced under the Apache 2.0 license. Holo2-30B-A3B and Holo2-235B-A22B are available under a research-only license, and commercial use of these two models requires contacting H Company directly. The models are positioned as reliable and efficient foundations for next-generation computer-use agents, such as the Surfer-H agent. They are vision-language models fine-tuned from Qwen3-VL checkpoints; the base named for the largest model is Qwen/Qwen3-VL-235B-A22B-Thinking. Training follows a multi-stage pipeline. It uses proprietary data for UI understanding and action prediction, combined with open-source datasets, synthetic data, and human annotations. This is followed by supervised fine-tuning and online reinforcement learning with Group Relative Policy Optimization (GRPO).
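To make the GRPO step more concrete, the sketch below shows the group-relative advantage computation that gives the method its name: several rollouts are sampled for the same task, and each rollout's reward is normalized against the mean and standard deviation of its group. This is a generic illustration, not H Company's training code. The function name, the binary success rewards, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own group of rollouts.

    `rewards` holds one scalar reward per rollout sampled for the same task.
    The reward design is hypothetical; the source does not describe it.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    # eps keeps the division defined when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts of one GUI task: two succeeded (reward 1), two failed (reward 0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Successful rollouts get a positive advantage, failed ones a negative advantage.
```

When every rollout in a group receives the same reward, all advantages are zero, so that group contributes no learning signal.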

A key innovation in the Holo2 models is "agentic localization," which addresses the difficulty of pinpointing small UI elements in high-resolution 4K interfaces. Instead of committing to a single prediction, the model refines its estimate iteratively, which yields relative accuracy gains of 10-20% across all Holo2 sizes. The Holo2-235B-A22B model achieves 70.6% accuracy on the demanding ScreenSpot-Pro benchmark in a single step and 78.5% within three steps, a relative gain of about 11% ((78.5 - 70.6) / 70.6). H Company reports the three-step result as a new state of the art for GUI grounding and states that Holo2 outperforms previous models across several UI localization benchmarks.
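The source describes agentic localization only as an iterative refinement process, so the following is a minimal sketch of one plausible form: predict a click point, crop the screen around it, and predict again on the smaller crop. The crop-and-zoom mechanism, the `refine_click` and `make_mock_predictor` names, the `zoom` parameter, and the mock predictor (which stands in for a model call) are all assumptions for illustration, not Holo2's actual procedure.

```python
from typing import Callable, Tuple

Region = Tuple[float, float, float, float]  # left, top, width, height in pixels
Point = Tuple[float, float]


def refine_click(
    predict: Callable[[Region], Point],
    screen_size: Tuple[int, int],
    steps: int = 3,
    zoom: float = 0.5,
) -> Point:
    """Re-predict on a shrinking crop and return the final absolute point."""
    width, height = screen_size
    region: Region = (0.0, 0.0, float(width), float(height))
    point: Point = (width / 2.0, height / 2.0)
    for _ in range(steps):
        # The predictor returns coordinates normalized to the current crop.
        nx, ny = predict(region)
        left, top, w, h = region
        point = (left + nx * w, top + ny * h)
        # Shrink the crop around the latest estimate, clamped to the screen.
        new_w, new_h = w * zoom, h * zoom
        new_left = min(max(point[0] - new_w / 2.0, 0.0), width - new_w)
        new_top = min(max(point[1] - new_h / 2.0, 0.0), height - new_h)
        region = (new_left, new_top, new_w, new_h)
    return point


def make_mock_predictor(target: Point, rel_error: float) -> Callable[[Region], Point]:
    """Hypothetical stand-in for a model: its error scales with the crop size."""
    def predict(region: Region) -> Point:
        left, top, w, h = region
        nx = (target[0] - left) / w + rel_error
        ny = (target[1] - top) / h + rel_error
        return (min(max(nx, 0.0), 1.0), min(max(ny, 0.0), 1.0))
    return predict


# A small target on a 4K screen, located with one step and with three steps.
mock = make_mock_predictor(target=(3000.0, 1700.0), rel_error=0.02)
one_step = refine_click(mock, (3840, 2160), steps=1)
three_steps = refine_click(mock, (3840, 2160), steps=3)
```

Because the mock predictor's error is proportional to the crop size, each zoom step reduces the pixel error, which mirrors why refinement would help most for small targets on large screens. A real model's error would not shrink this predictably, and a bad first guess could crop the target out entirely.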

AI Daily Digest