Existing language-conditioned robot navigation methods falter when target locations are occluded, a common scenario in dynamic environments. These systems typically ground instructions in 2D image space, limiting their perception to visible pixels. The researchers behind BEACON propose a novel approach to overcome this fundamental limitation.
Bridging Vision-Language to Bird's-Eye View
BEACON re-imagines robot navigation by predicting an ego-centric Bird's-Eye View (BEV) affordance heatmap. Because the heatmap covers a bounded local region around the robot, it inherently includes occluded areas, a critical departure from image-space reasoning. By injecting spatial cues into a Vision-Language Model (VLM) and fusing its output with depth-derived BEV features, BEACON grounds the instruction in a spatial representation that is not limited to visible pixels.
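To make the fusion step concrete, the sketch below (PyTorch) broadcasts an instruction-conditioned VLM embedding over a depth-derived BEV feature grid and decodes the result into a per-cell affordance heatmap. All module names, feature dimensions, and the grid size are illustrative assumptions for this sketch, not BEACON's actual architecture.

```python
import torch
import torch.nn as nn

class BEVAffordanceHead(nn.Module):
    """Toy fusion head: VLM embedding + depth-derived BEV features -> BEV heatmap."""

    def __init__(self, bev_channels=64, vlm_dim=512):
        super().__init__()
        # Project the VLM embedding so it can be concatenated with BEV features.
        self.vlm_proj = nn.Linear(vlm_dim, bev_channels)
        # Small convolutional decoder over the fused grid -> per-cell affordance logit.
        self.decoder = nn.Sequential(
            nn.Conv2d(2 * bev_channels, bev_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(bev_channels, 1, 1),
        )

    def forward(self, bev_feats, vlm_embed):
        # bev_feats: (B, C, H, W) features projected from depth into the BEV grid.
        # vlm_embed: (B, D) instruction-conditioned embedding from the VLM.
        b, c, h, w = bev_feats.shape
        lang = self.vlm_proj(vlm_embed)                     # (B, C)
        lang = lang.view(b, -1, 1, 1).expand(-1, -1, h, w)  # broadcast over the grid
        fused = torch.cat([bev_feats, lang], dim=1)         # (B, 2C, H, W)
        logits = self.decoder(fused)                        # (B, 1, H, W)
        return torch.sigmoid(logits)                        # affordance heatmap in [0, 1]

# Toy usage with random tensors standing in for real BEV features and a VLM embedding.
head = BEVAffordanceHead()
heatmap = head(torch.randn(1, 64, 100, 100), torch.randn(1, 512))
print(heatmap.shape)  # torch.Size([1, 1, 100, 100])
```

The key design point is that the heatmap is predicted over the full local grid, so cells behind obstacles receive affordance scores even though they are invisible to the camera.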
Occlusion-Aware Navigation Performance Boost
BEACON is evaluated on an occlusion-aware dataset built in the Habitat simulator. On validation subsets where the target location is occluded, it achieves 22.74 percentage points higher accuracy, averaged across geodesic-distance thresholds, than state-of-the-art image-space baselines. This gain marks a substantial step toward language-conditioned navigation in complex, real-world scenarios.
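The following sketch illustrates the kind of threshold-based accuracy referred to above: a prediction counts as correct if its distance to the ground-truth target falls under a threshold, and results are averaged over several thresholds. The threshold values, and the use of Euclidean distance as a self-contained stand-in for the true geodesic (path) distance, are assumptions of this sketch rather than the paper's exact protocol.

```python
import numpy as np

def accuracy_over_thresholds(pred_xy, gt_xy, thresholds=(0.5, 1.0, 2.0, 3.0)):
    # pred_xy, gt_xy: (N, 2) predicted / ground-truth target positions in meters.
    # A real evaluation would measure geodesic distance on the navigation mesh;
    # Euclidean distance keeps this illustration self-contained.
    dists = np.linalg.norm(np.asarray(pred_xy) - np.asarray(gt_xy), axis=1)
    per_threshold = {t: float(np.mean(dists <= t)) for t in thresholds}
    per_threshold["mean"] = float(np.mean(list(per_threshold.values())))
    return per_threshold

# Toy usage: one prediction lands close to the target, the other misses by 2.5 m.
print(accuracy_over_thresholds([[0.3, 0.1], [2.5, 0.0]], [[0.0, 0.0], [0.0, 0.0]]))
```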


