Beyond RGB: Grounding Vision-Language on Raw Sensor Data

PRISM-VL advances vision-language models by grounding them in raw camera measurements, not just RGB, significantly improving performance on challenging visual tasks.

6 min read
[Figure: PRISM-VL architecture, showing RAW sensor-data input, an RGB proxy, and the integration of raw sensor measurements.]

Vision-language models (VLMs) typically operate on post-ISP RGB images, i.e., images that have already passed through the camera's image signal processing (ISP) pipeline. That pipeline often discards crucial sensor evidence through clipping, suppression, or quantization, limiting how accurately the model can ground its understanding in what the sensor actually measured. A new approach, PRISM-VL, investigates whether grounding performance improves when the visual interface is moved closer to the original camera measurement.
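To make the information loss concrete, here is a minimal sketch of how clipping and 8-bit quantization irreversibly alter linear sensor values. This is not PRISM-VL's actual ISP model; the pipeline stages, value range, and bit depth below are assumptions for demonstration.

```python
import numpy as np

# Hypothetical illustration of ISP-style information loss; the stages and
# constants here are assumptions for demonstration, not PRISM-VL's pipeline.
rng = np.random.default_rng(0)
raw = rng.uniform(0.0, 4.0, size=(64, 64)).astype(np.float32)  # linear values; >1.0 means over-exposed

# 1. Clipping: highlights above the display range are flattened to 1.0,
#    so all over-exposed pixels become indistinguishable from each other.
clipped = np.clip(raw, 0.0, 1.0)

# 2. Quantization: 8-bit encoding collapses nearby values onto 256 codes.
quantized = np.round(clipped * 255.0) / 255.0

saturated = np.mean(raw > 1.0)  # evidence destroyed outright by clipping
altered = np.mean(~np.isclose(raw, quantized, atol=1e-6))
print(f"over-exposed pixels flattened: {saturated:.0%}")
print(f"pixels changed by clip+quantize: {altered:.0%}")
```

A raw-domain model sees the pre-clip values, and so still knows which pixels saturated and by how much; an RGB-domain model cannot recover that distinction.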

Visual TL;DR: VLMs use RGB → RGB loses data; that problem motivates the PRISM-VL approach → RAW-derived Meas.-XYZ inputs and camera-conditioned grounding → Exposure-Bracketed Supervision. The approach enables improved performance → quantifiable gains.

Each node in the flow above, expanded:

  1. VLMs use RGB: standard vision-language models process post-ISP RGB images
  2. RGB loses data: the preprocessing pipeline discards crucial sensor evidence through clipping, suppression, or quantization
  3. PRISM-VL approach: grounds vision-language models in raw camera measurements rather than RGB alone
  4. RAW-derived Meas.-XYZ: the model directly consumes inputs derived from raw sensor data (see the sketch after this list)
  5. Camera-conditioned grounding: grounding is conditioned on the capturing camera, so measurements can be interpreted per-sensor
  6. Exposure-Bracketed Supervision: transfers supervision from RGB proxies to raw measurement-domain observations
  7. Improved performance: significant gains on challenging visual tasks
  8. Quantifiable gains: measurable improvements in low-light, HDR, visibility-sensitive, and hallucination-sensitive scenarios
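The article does not specify the exact Meas.-XYZ transform behind item 4. The standard route from sensor values to CIE XYZ, however, is black-level/white-level normalization followed by a per-camera color matrix; here is a minimal sketch under those assumptions (the matrix, levels, and function name are hypothetical):

```python
import numpy as np

# Hypothetical per-camera calibration matrix (illustrative values only).
CAM_TO_XYZ = np.array([
    [0.41, 0.36, 0.18],
    [0.21, 0.72, 0.07],
    [0.02, 0.12, 0.95],
])

def raw_to_meas_xyz(raw_rgb: np.ndarray, black_level: float = 64.0,
                    white_level: float = 16383.0) -> np.ndarray:
    """Map demosaiced raw values (H, W, 3) to linear XYZ without clipping."""
    linear = (raw_rgb - black_level) / (white_level - black_level)
    # Deliberately no np.clip here: over-exposed values stay above 1.0,
    # so the model can still see that (and how far) the sensor saturated.
    return linear @ CAM_TO_XYZ.T

demo = raw_to_meas_xyz(np.full((2, 2, 3), 20000.0))
print(demo[0, 0])  # values above 1.0 preserved, unlike an ISP render
```

The key design point is the absence of a clip or tone curve: measurement-domain evidence survives into the model's input.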

Bridging the Measurement-to-RGB Gap

The researchers introduce measurement-grounded vision-language learning, instantiated as PRISM-VL. The framework directly incorporates RAW-derived Meas.-XYZ inputs. Its key innovations are a camera-conditioned grounding mechanism and Exposure-Bracketed Supervision Aggregation, a technique that transfers supervision signals from readily available RGB proxies to the more granular raw measurement-domain observations, addressing a fundamental challenge in training on sensor data.
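The aggregation rule itself is not detailed in the article. The following is a hypothetical sketch, assuming per-exposure losses on the RGB proxies are confidence-weighted and the pooled posterior is distilled into the measurement-domain branch; all function and tensor names are illustrative, not PRISM-VL's actual implementation.

```python
import torch

def bracketed_supervision_loss(raw_logits: torch.Tensor,
                               proxy_logits: list[torch.Tensor],
                               targets: torch.Tensor) -> torch.Tensor:
    """Transfer supervision from RGB-proxy predictions to the raw branch.

    raw_logits:   (B, C) predictions from the measurement-domain model
    proxy_logits: list of (B, C) predictions, one per exposure bracket
    targets:      (B,) labels defined on the RGB proxies
    """
    ce = torch.nn.functional.cross_entropy
    # Weight each bracket by how well its proxy explains the label
    # (assumption: better-exposed renders get more say in the aggregate).
    losses = torch.stack([ce(p, targets, reduction="none")
                          for p in proxy_logits])          # (K, B)
    weights = torch.softmax(-losses.detach(), dim=0)       # (K, B)
    # Pool the per-bracket posteriors into one aggregated soft target.
    soft_target = sum(w.unsqueeze(-1) * p.softmax(-1)
                      for w, p in zip(weights, proxy_logits))
    # Distill the aggregated RGB-proxy posterior into the raw-domain model.
    return torch.nn.functional.kl_div(
        raw_logits.log_softmax(-1), soft_target, reduction="batchmean")
```

One consequence of a scheme like this: the raw branch never needs labels defined directly on raw pixels. Supervision created on ordinary RGB renders is pooled across exposure brackets and then transferred across domains, matching the proxy-to-measurement transfer described above.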

Quantifiable Gains in Challenging Scenarios

PRISM-VL-8B, trained on a 150K-example instruction-tuning set and evaluated on a benchmark targeting low-light, HDR, visibility-sensitive, and hallucination-sensitive cases, achieved significant improvements: 0.6120 BLEU, 0.4571 ROUGE-L, and 82.66% LLM-Judge accuracy. Against the RGB-based Qwen3-VL-8B baseline, that is a gain of +0.1074 BLEU, +0.1071 ROUGE-L, and +4.46 percentage points in LLM-Judge accuracy. These results strongly suggest that a portion of VLM grounding errors stems directly from information lost during standard RGB rendering, underscoring the value of preserving measurement-domain evidence for multimodal reasoning.
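For reference, the reported deltas imply the following baseline scores (derived arithmetically from the figures above, not quoted from the paper):

Metric        Qwen3-VL-8B (RGB)   PRISM-VL-8B   Gain
BLEU          0.5046              0.6120        +0.1074
ROUGE-L       0.3500              0.4571        +0.1071
LLM-Judge     78.20%              82.66%        +4.46 pp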
