Scaling multi-robot coordination in dynamic, real-world environments has been a persistent challenge. Centralized approaches struggle as team size grows, because processing the robots' joint observations becomes computationally expensive. Decentralized methods, in turn, often require complex inference-time communication or explicit alignment procedures to cope with partial observability. This research proposes an approach that sidesteps both problems.
Decentralized Collaboration via Vision-Language Priors
The core innovation lies in harnessing the visuomotor priors of pretrained Vision-Language-Action (VLA) models to enable reactive, decentralized multi-robot collaboration. The proposed CHORUS framework adapts a single VLA backbone to control diverse multi-robot teams. Critically, at inference time each robot operates independently, conditioning only on its local observations and a prompt that identifies it. This removes the need for inter-robot communication or complex inference-time synchronization.
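The inference-time structure described above (one shared backbone, queried separately by each robot with its own observation and identifying prompt) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CHORUS implementation: the source does not describe its interfaces, so `SharedPolicy`, `robot_prompt`, `decentralized_step`, `toy_backbone`, and the prompt template are all hypothetical, and the toy backbone stands in for a real VLA that would consume images and language tokens.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical observation/action types; the source does not specify them.
Observation = Dict[str, float]
Action = List[float]


@dataclass(frozen=True)
class SharedPolicy:
    """Hypothetical stand-in for one pretrained backbone shared by all robots.

    `backbone` maps (prompt, local observation) -> action. A real VLA would
    take camera images and tokenized language instead of a float dict.
    """
    backbone: Callable[[str, Observation], Action]

    def act(self, prompt: str, observation: Observation) -> Action:
        return self.backbone(prompt, observation)


def robot_prompt(task: str, robot_id: str) -> str:
    # Hypothetical template: the task text plus a robot-identifying suffix.
    return f"{task} You are robot {robot_id}."


def decentralized_step(policy: SharedPolicy, task: str,
                       local_obs: Dict[str, Observation]) -> Dict[str, Action]:
    # Every robot calls the SAME policy, but each call sees only that robot's
    # own observation and prompt; no robot reads another robot's inputs.
    return {rid: policy.act(robot_prompt(task, rid), obs)
            for rid, obs in local_obs.items()}


def toy_backbone(prompt: str, obs: Observation) -> Action:
    # Toy rule (purely illustrative): robot B mirrors the motion of robot A,
    # showing how the prompt alone can specialize a shared policy's behavior.
    sign = -1.0 if prompt.endswith("robot B.") else 1.0
    return [sign * obs["dx"], sign * obs["dy"]]


if __name__ == "__main__":
    policy = SharedPolicy(toy_backbone)
    actions = decentralized_step(
        policy,
        "Lift the laundry basket.",
        {"A": {"dx": 0.5, "dy": 0.2}, "B": {"dx": 0.5, "dy": 0.2}},
    )
    print(actions)  # {'A': [0.5, 0.2], 'B': [-0.5, -0.2]}
```

Because the weights are shared, the only inputs that differentiate one robot's behavior from another's in this sketch are its prompt and its local observation; adding a robot means adding a prompt, not training a new per-robot policy.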
Empirical Validation Across Diverse Tasks
Real-world experiments demonstrate CHORUS's efficacy across challenging tasks, including mobile tape measurement, library book handovers, and laundry basket lifting. The framework achieved a 64-percentage-point improvement over decentralized models trained from scratch. CHORUS also showed a 40-percentage-point increase in reactivity to teammate behavior, outperforming even centralized baselines. These results highlight the strength of a shared VLA backbone for robust, decentralized multi-robot collaboration without per-robot policies or inference-time communication.