OneCanvas: Unified 3D Scene Representation

OneCanvas projects multi-view features onto a single equirectangular canvas, giving VLMs a unified 3D scene representation that supports efficient situated reasoning and state-of-the-art benchmark results.

Diagram illustrating the OneCanvas projection of 3D world points onto an equirectangular canvas.
OneCanvas aggregates patch features onto a single panoramic canvas for unified 3D scene understanding.

Sophisticated 3D scene understanding in Vision-Language Models (VLMs) has so far come at a cost: existing approaches require either complex, bespoke geometry encoders or substantial training investments. This has limited the scalability and accessibility of spatial reasoning capabilities in AI.

Visual TL;DR. 3D understanding in VLMs has required complex geometry encoders or heavy training. The OneCanvas approach addresses this: each patch is unprojected to 3D and then mapped onto the canvas, which creates a unified spatial system. That system enables situated reasoning and efficient pretraining.


  1. VLM 3D Understanding: sophisticated 3D scene understanding in VLMs has been hampered
  2. Complex Geometry/Training: either complex geometry encoders or substantial training investments
  3. OneCanvas Approach: projects multi-view features onto a unified equirectangular canvas
  4. Unproject to 3D: unprojected to its 3D world coordinate using depth and pose
  5. Map to Canvas: mapped to continuous longitude and latitude on the canvas
  6. Unified Spatial System: creating a shared spatial coordinate system without rasterization
  7. Situated Reasoning: enabling efficient situated reasoning and VLM performance
  8. Efficient Pretraining: enabling efficient pretraining for spatial reasoning capabilities

The Equirectangular Canvas: A Unified Spatial Coordinate System

The OneCanvas approach, detailed by Baranowski et al. on arXiv, fundamentally rethinks how multi-view image patches are integrated into a VLM. Instead of complex fusion mechanisms, it projects patch features into a single equirectangular panoramic canvas. Each patch is unprojected to its 3D world coordinate using its depth and camera pose. Crucially, this 3D position is then mapped to continuous longitude and latitude on the canvas, effectively creating a shared spatial coordinate system without rasterization or cross-view aggregation. A 3D position embedding of the patch's metric coordinates is added to its feature, preserving depth information lost in the angular projection. This representation is directly consumable by pretrained VLMs as if it were a standard image, eliminating the need for major architectural modifications or specialized encoders.
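The unproject-then-map step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the pinhole camera model, the camera-to-world pose convention, the z-up axis convention, and the function names unproject_patch, world_to_canvas, and canvas_uv are all assumptions made for the example.

```python
import numpy as np

def unproject_patch(u, v, depth, K, cam_to_world):
    """Lift a patch center (u, v) with metric depth to a 3D world point.

    Assumes a pinhole camera with 3x3 intrinsics K, a 4x4 camera-to-world
    pose, and depth measured along the camera z-axis.
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # camera-frame ray with z = 1
    p_cam = ray * depth                              # scale the ray to the measured depth
    p_world = cam_to_world @ np.append(p_cam, 1.0)   # homogeneous transform to world
    return p_world[:3]

def world_to_canvas(p_world, center):
    """Return continuous (longitude, latitude, range) of a point seen from `center`.

    Assumed convention: z is up and longitude is measured in the x-y plane.
    The point must not coincide with the center (range would be zero).
    """
    d = np.asarray(p_world, dtype=float) - center
    r = np.linalg.norm(d)
    lon = np.arctan2(d[1], d[0])   # in [-pi, pi]
    lat = np.arcsin(d[2] / r)      # in [-pi/2, pi/2]
    return lon, lat, r

def canvas_uv(lon, lat, width, height):
    """Continuous canvas coordinates; nothing is rounded to a pixel grid."""
    x = (lon + np.pi) / (2.0 * np.pi) * width
    y = (np.pi / 2.0 - lat) / np.pi * height
    return x, y
```

The range returned by world_to_canvas is exactly what the angular projection discards, which is why the article notes that a 3D position embedding of the metric coordinates is added to each patch feature. How that embedding is computed is not specified here, so it is left out of the sketch.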

Enabling Situated Reasoning and Efficient Pretraining

A key strategic advantage of OneCanvas is its inherent support for situated reasoning. By centering the canvas on any pose of interest, the same representation can be used to perform analysis from a specific viewpoint, a critical capability for robotics and embodied AI applications. Furthermore, this unified representation unlocks a novel spatial pretraining curriculum. Researchers can procedurally generate supervision by placing object patch features at chosen 3D world positions on an empty canvas. This on-the-fly generation allows for broad coverage of spatial reasoning tasks while controlling answer distributions to prevent shortcut learning. This methodology has demonstrated state-of-the-art accuracy on benchmarks like SQA3D and VSI-Bench, and generalizes to out-of-distribution data on SPBench, all while utilizing an order of magnitude less training compute than competing methods. This efficiency dramatically lowers the barrier to entry for advanced 3D scene understanding.
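To make the two ideas in this section concrete, here is a rough sketch of re-centering on a pose and of procedurally generating a supervised example with a controlled answer distribution. Everything specific in it is an assumption for illustration: the left/right question type, the agent frame (x forward, y left, z up), the sampling ranges, and the names to_agent_frame, sample_left_right_example, and make_balanced_batch. The actual curriculum may differ substantially.

```python
import numpy as np

def yaw_rotation(yaw):
    """Rotation about the up (z) axis by `yaw` radians."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def to_agent_frame(point_world, agent_pos, agent_yaw):
    """Express a world point relative to an agent at agent_pos facing agent_yaw.

    Centering the canvas on a pose amounts to applying this transform
    before computing longitude and latitude.
    """
    return yaw_rotation(-agent_yaw) @ (np.asarray(point_world) - agent_pos)

def sample_left_right_example(rng, answer):
    """Place one object in the world so that the ground truth equals `answer`."""
    agent_pos = np.array([rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0), 0.0])
    agent_yaw = rng.uniform(-np.pi, np.pi)
    # In the assumed agent frame, positive longitude means "left".
    bearing = rng.uniform(0.2, np.pi - 0.2)
    if answer == "right":
        bearing = -bearing
    dist = rng.uniform(0.5, 4.0)
    local = np.array([dist * np.cos(bearing), dist * np.sin(bearing),
                      rng.uniform(-0.5, 0.5)])
    obj_world = agent_pos + yaw_rotation(agent_yaw) @ local
    return {"agent_pos": agent_pos, "agent_yaw": agent_yaw,
            "object_world": obj_world, "answer": answer}

def make_balanced_batch(rng, n):
    """Generate n examples with exactly half 'left' and half 'right' answers."""
    answers = ["left", "right"] * (n // 2)
    rng.shuffle(answers)
    return [sample_left_right_example(rng, a) for a in answers]

rng = np.random.default_rng(0)
batch = make_balanced_batch(rng, 8)
```

Fixing the answer first and then sampling a scene that satisfies it is one simple way to keep the label distribution balanced, so that a model cannot score well by always guessing the more frequent answer.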

© 2026 StartupHub.ai. All rights reserved.