The pursuit of truly intelligent embodied agents has long been hampered by the need to train disparate, specialized models for perception, reasoning, and action. This fragmentation leads to inefficiencies and limits the holistic capabilities of AI systems. The introduction of Pelican-Unified 1.0 marks a significant departure, presenting the first embodied foundation model built on the principle of unification.
Unifying Perception, Reasoning, and Imagination
Pelican-Unified 1.0 leverages a single Visual-Language Model (VLM) as a unified understanding and reasoning module. The VLM maps diverse inputs—scenes, instructions, visual contexts, and action histories—into a shared semantic space. Crucially, it also performs autoregressive chain-of-thought reasoning, generating task- and action-oriented sequences in a single pass. This design allows language, video, and action losses to backpropagate into the shared representation, so that understanding, reasoning, imagination, and action are optimized simultaneously rather than by isolated expert systems.
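To make the joint-training idea concrete, here is a minimal PyTorch sketch of the pattern described above: heterogeneous inputs projected into one shared token sequence, a single transformer backbone, and three task heads whose losses all backpropagate into the same shared representation. Every name, dimension, layer choice, and loss weight here (UnifiedVLM, d_model=1024, the 1.0/0.5/0.5 weighting, and so on) is an illustrative assumption for this sketch, not Pelican-Unified 1.0's actual architecture or code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UnifiedVLM(nn.Module):
    """Illustrative unified model: one backbone, three task heads.

    All names, dimensions, and layer choices are assumptions made for
    this sketch; they are not Pelican-Unified 1.0's actual design.
    """

    def __init__(self, d_model=1024, vocab_size=32000, action_dim=7):
        super().__init__()
        # Modality projections into the shared semantic space.
        self.vision_proj = nn.Linear(768, d_model)   # e.g. ViT patch features
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.action_proj = nn.Linear(action_dim, d_model)
        # Single shared transformer over the fused token sequence
        # (causal masking for autoregressive decoding omitted for brevity).
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=12,
        )
        # Task heads: language tokens, video latents, continuous actions.
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.video_head = nn.Linear(d_model, 768)
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, vision_feats, text_ids, action_hist):
        # Fuse all modalities into one sequence in the shared space.
        tokens = torch.cat(
            [
                self.vision_proj(vision_feats),
                self.text_embed(text_ids),
                self.action_proj(action_hist),
            ],
            dim=1,
        )
        h = self.backbone(tokens)  # shared representation
        return self.lm_head(h), self.video_head(h), self.action_head(h)


model = UnifiedVLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Dummy batch: 4 vision tokens, 6 text tokens, 3 past actions.
vision = torch.randn(2, 4, 768)
text = torch.randint(0, 32000, (2, 6))
actions = torch.randn(2, 3, 7)

lm_logits, video_pred, action_pred = model(vision, text, actions)

# Joint objective: all three losses flow into the one shared backbone,
# so a single update optimizes understanding, imagination (video
# prediction), and action together. Targets here are simple
# reconstructions (next-token shifting omitted); weights are assumptions.
lang_loss = F.cross_entropy(lm_logits[:, 4:10].reshape(-1, 32000), text.reshape(-1))
video_loss = F.mse_loss(video_pred[:, :4], vision)
action_loss = F.mse_loss(action_pred[:, 10:], actions)
loss = 1.0 * lang_loss + 0.5 * video_loss + 0.5 * action_loss

loss.backward()
optimizer.step()
```

The point of the sketch is the last few lines: because all three task heads sit on one backbone, a single optimizer step improves the shared representation for every capability at once, rather than training separate expert models.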
Specialist Strength Without Compromise
Contrary to the intuition that unification dilutes capability, Pelican-Unified 1.0 demonstrates that this paradigm can preserve, and even enhance, specialist performance. A single checkpoint of the model achieved an average score of 64.7 across eight VLM benchmarks (outperforming models of comparable scale), ranked first on WorldArena with a score of 66.03, and scored 93.5 on RoboTwin (second best among action methods). These results underscore the efficacy of the unified approach: complex AI capabilities are consolidated without sacrificing individual performance.