Scaling multi-robot coordination in dynamic, real-world environments has been a persistent challenge. Centralized approaches founder under the computational burden of combined observations as team size grows, while decentralized methods often require complex inference-time communication or explicit alignment procedures to overcome partial observability. The work summarized here aims to sidestep both problems.
Decentralized Collaboration via Vision-Language Priors
The core idea is to use the visuomotor priors of pretrained Vision-Language-Action (VLA) models to enable reactive, decentralized multi-robot collaboration. The proposed CHORUS framework adapts a single VLA backbone to control diverse multi-robot teams. Critically, at inference, each robot operates independently, relying solely on its local observations and a robot-identifying prompt. This removes the need for inter-robot communication or complex inference-time synchronization.
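The inference pattern described above can be illustrated with a minimal sketch. It is not the CHORUS implementation: the prompt format, the `RobotController` class, and `dummy_policy` (a stand-in for the shared VLA backbone) are all hypothetical names chosen for illustration. The only point it tries to show is that every robot calls the same policy with its own local observation and its own identifying prompt, and that no robot reads another robot's observation or action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Observation = Sequence[float]
Action = List[float]
# Hypothetical policy signature: (prompt, local observation) -> action.
Policy = Callable[[str, Observation], Action]


def make_prompt(robot_id: str, task: str) -> str:
    """Build a robot-identifying prompt (format is an assumption)."""
    return f"You are robot {robot_id}. Task: {task}"


@dataclass
class RobotController:
    """One controller per robot; all controllers share the same policy."""
    robot_id: str
    task: str
    policy: Policy

    def step(self, local_obs: Observation) -> Action:
        # Only this robot's observation and prompt are used.
        prompt = make_prompt(self.robot_id, self.task)
        return self.policy(prompt, local_obs)


def dummy_policy(prompt: str, obs: Observation) -> Action:
    """Stand-in for a shared VLA backbone: behavior depends on the prompt."""
    sign = 1.0 if "robot arm_left" in prompt else -1.0
    return [sign * x for x in obs]


def run_team_step(
    controllers: List[RobotController],
    observations: Dict[str, Observation],
) -> Dict[str, Action]:
    """Each robot acts independently; no messages pass between robots."""
    return {c.robot_id: c.step(observations[c.robot_id]) for c in controllers}


if __name__ == "__main__":
    team = [
        RobotController("arm_left", "lift the box", dummy_policy),
        RobotController("arm_right", "lift the box", dummy_policy),
    ]
    obs = {"arm_left": [0.5, 1.0], "arm_right": [0.5, 1.0]}
    print(run_team_step(team, obs))
```

In this sketch the two robots receive identical observations yet produce different actions, because the shared policy conditions on the robot-identifying prompt. In a real system, `dummy_policy` would be replaced by a call to the adapted VLA model, and observations would be camera images and proprioceptive state rather than short float lists.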