Uber Fights Bounding Box Errors

Uber Engineering uses machine learning to automatically detect and correct bounding box annotation errors in video data, boosting ML model training quality.

Figure: An example of bounding box annotations used in object detection. Credit: Uber Engineering.

Training custom machine learning models for specific business needs demands high-quality data, often sourced from human annotations. However, these annotations, particularly for video, are prone to errors. Uber Engineering has developed an ML-based system to tackle these bounding box annotation errors, aiming to ensure data integrity before it feeds into model training.

Visual TL;DR. Manual annotation errors make review costly and inconsistent, which motivates Uber's ML solution. The solution is integrated into uLabel, addresses tricky video segments, and relies on synthetic data for robustness. It enables accurate ML training, which in turn leads to boosted model quality.

  1. Manual Annotation Errors: human annotators make mistakes in video bounding box labeling
  2. Costly & Inconsistent: manual review doubles cost and time, lacks consistency
  3. Uber's ML Solution: ML system detects and corrects bounding box errors automatically
  4. uLabel Integration: solution integrated into in-house annotation tool uLabel
  5. Tricky Video Segments: challenges arise from rejoining video segments after annotation
  6. Synthetic Data: using synthetic data for robustness in error detection
  7. Accurate ML Training: ensures data integrity for higher quality ML models
  8. Boosted Model Quality: improved performance and reliability of trained ML models

Video annotation is especially error-prone: long footage is split into segments for operators, and mistakes can creep in when the annotated segments are rejoined. Traditional human review workflows are costly and inconsistent. Uber's solution, integrated into its in-house tool uLabel, offers real-time, automated validation.

The Problem with Manual Review

Human annotators can make mistakes. A second pair of eyes helps, but it doubles cost and time. This sequential process is inefficient for large-scale projects.

Uber's ML-Powered Solution

Uber's system automatically detects critical annotation errors like ID swaps (a tracker mistakenly following the wrong object) and position jumps (unexplained shifts in coordinates). These are the most common and impactful failures, according to the Uber Engineering blog.

Why It's Tricky

Detecting these errors isn't straightforward. What looks like an error in one context might be normal in another. Object size, motion, camera movement, scene complexity, and even frame rate all influence what constitutes an anomaly. A 10-pixel shift is negligible for a large, nearby car but significant for a small, distant pedestrian.

Fixed rules like "flag any jump greater than X pixels" are insufficient because they can't adapt to varying conditions.
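To make the car-versus-pedestrian point concrete, here is a minimal Python sketch. It is an illustration only, not Uber's method: the `Box` type, the example box sizes, and the 15-pixel threshold are all assumptions chosen for the demonstration. It shows the same 10-pixel shift measured in raw pixels and relative to each object's size.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box in pixels: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def pixel_shift(a: Box, b: Box) -> float:
    """Distance in pixels between the centers of two boxes."""
    (ax, ay), (bx, by) = a.center(), b.center()
    return math.hypot(bx - ax, by - ay)


def relative_shift(a: Box, b: Box) -> float:
    """Center shift expressed as a fraction of the earlier box's diagonal."""
    return pixel_shift(a, b) / math.hypot(a.w, a.h)


# Hypothetical boxes: a large nearby car and a small distant pedestrian,
# each moved 10 px to the right between two frames.
car_prev, car_next = Box(100, 100, 200, 120), Box(110, 100, 200, 120)
ped_prev, ped_next = Box(500, 300, 8, 20), Box(510, 300, 8, 20)

FIXED_PIXEL_THRESHOLD = 15  # hypothetical "flag any jump greater than X pixels"

for name, prev, nxt in [("car", car_prev, car_next), ("pedestrian", ped_prev, ped_next)]:
    flagged = pixel_shift(prev, nxt) > FIXED_PIXEL_THRESHOLD
    print(f"{name}: {pixel_shift(prev, nxt):.1f} px, "
          f"{relative_shift(prev, nxt):.2f} of diagonal, fixed rule flags: {flagged}")
```

The fixed rule treats both objects identically, even though the pedestrian's box moved by almost half its own diagonal and the car's by only a few percent. Size normalization alone would still miss camera movement, frame rate, and scene complexity, which is why a single hand-tuned rule is unlikely to be enough.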

Architecture for Accuracy

The validation pipeline uses an 11-frame sliding window to analyze features across visual, motion, and coordinate data. An XGBoost classifier then scores each frame for error probability.
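As a rough sketch of what windowed feature extraction could look like, the Python below computes a few coordinate-based features for each frame of a single track. The 11-frame window comes from the article. Everything else is an assumption made for illustration: the centered window, the feature names, and the `(x, y, w, h)` box layout. Visual features and the XGBoost scoring step are left out. In a full pipeline, dictionaries like these would be turned into feature vectors for the classifier.

```python
import math
from statistics import median

WINDOW = 11          # window length in frames, as stated in the article
HALF = WINDOW // 2   # assumption: the window is centered on the scored frame


def center_steps(track):
    """Center displacement between consecutive frames (0.0 for frame 0).

    track: list of (x, y, w, h) boxes for one object, one box per frame.
    """
    centers = [(x + w / 2, y + h / 2) for x, y, w, h in track]
    steps = [0.0]
    for (x0, y0), (x1, y1) in zip(centers, centers[1:]):
        steps.append(math.hypot(x1 - x0, y1 - y0))
    return steps


def extract_features(track):
    """Return one hypothetical coordinate-feature dict per frame."""
    steps = center_steps(track)
    features = []
    for i, (_, _, w, h) in enumerate(track):
        # Clip the window at the start and end of the track.
        lo, hi = max(0, i - HALF), min(len(track), i + HALF + 1)
        typical = median(steps[lo:hi])  # typical motion inside the window
        features.append({
            "step_px": steps[i],
            "step_over_diag": steps[i] / math.hypot(w, h),
            "step_over_window_median": steps[i] / (typical + 1e-6),
        })
    return features


# Steady 2 px/frame motion with a 40 px jump injected at frame 10.
track = [(2.0 * i + (40.0 if i >= 10 else 0.0), 50.0, 20.0, 40.0) for i in range(21)]
features = extract_features(track)
print(features[3])
print(features[10])
```

Comparing each step with the median step in its own window lets the same feature adapt to fast and slow objects, which is the kind of context a fixed pixel threshold lacks.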

This approach processes raw video and annotations, extracts features, classifies potential errors, and clusters them into actionable groups for human review.
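The final grouping step might be sketched as follows in Python. The article does not describe how Uber clusters flagged frames, so this is an assumption: frames whose score meets a hypothetical threshold are merged into one group when they lie within a hypothetical `max_gap` frames of each other. The scores are made-up values standing in for classifier output.

```python
def cluster_flagged_frames(scores, threshold=0.5, max_gap=2):
    """Group frames with score >= threshold into contiguous review ranges.

    scores: per-frame error probabilities, indexed by frame number.
    threshold, max_gap: hypothetical tuning parameters for this sketch.
    """
    groups = []
    for frame, score in enumerate(scores):
        if score < threshold:
            continue
        if groups and frame - groups[-1]["end"] <= max_gap:
            # Close enough to the previous flagged frame: extend that group.
            groups[-1]["end"] = frame
            groups[-1]["peak"] = max(groups[-1]["peak"], score)
        else:
            groups.append({"start": frame, "end": frame, "peak": score})
    return groups


# Made-up per-frame scores for a 12-frame clip.
scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.7, 0.1, 0.1, 0.1, 0.1, 0.95, 0.2]
groups = cluster_flagged_frames(scores)
for g in groups:
    print(f"review frames {g['start']}-{g['end']} (peak score {g['peak']})")
```

Grouping nearby flags means a reviewer sees one suggestion per incident rather than one per frame.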

Synthetic Data for Robustness

Since real-world errors are rare, Uber generates synthetic data by introducing perturbations that mimic human mistakes. This includes simulating ID swaps and position jumps across various magnitudes and distances.
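One way such perturbations could be injected is sketched below in Python. The function names, the `(x, y, w, h)` box layout, and the labeling scheme (marking only the first perturbed frame) are assumptions for illustration. The article does not specify how Uber implements or labels its perturbations.

```python
def inject_position_jump(track, frame, dx, dy):
    """Return a copy of `track` with the box at `frame` shifted by (dx, dy).

    Also returns per-frame labels: 1 for the perturbed frame, 0 elsewhere.
    """
    perturbed = list(track)
    x, y, w, h = perturbed[frame]
    perturbed[frame] = (x + dx, y + dy, w, h)
    labels = [1 if i == frame else 0 for i in range(len(track))]
    return perturbed, labels


def inject_id_swap(track_a, track_b, start):
    """Swap the boxes of two equal-length tracks from frame `start` onward.

    Returns both perturbed tracks and labels marking the swap frame.
    """
    if len(track_a) != len(track_b):
        raise ValueError("tracks must cover the same frames")
    swapped_a = track_a[:start] + track_b[start:]
    swapped_b = track_b[:start] + track_a[start:]
    labels = [1 if i == start else 0 for i in range(len(track_a))]
    return swapped_a, swapped_b, labels


# Two hypothetical clean tracks moving in opposite directions.
track_a = [(2.0 * i, 0.0, 10.0, 10.0) for i in range(6)]
track_b = [(100.0 - 2.0 * i, 50.0, 10.0, 10.0) for i in range(6)]

jumped, jump_labels = inject_position_jump(track_a, frame=3, dx=25.0, dy=0.0)
swapped_a, swapped_b, swap_labels = inject_id_swap(track_a, track_b, start=4)
print(jumped[3], jump_labels)
print(swapped_a[4], swap_labels)
```

Varying `dx`, `dy`, and the distance between the two swapped tracks would produce errors across a range of magnitudes, from obvious to subtle.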

This synthetic dataset, derived from six open-source datasets, is intended to help the system generalize across diverse scenarios, from autonomous driving to crowded scenes. Because the classifier learns what an error looks like from these examples, the quality and variety of the injected errors directly affect how well it performs on real annotations.

In-Tool Validation and Future Plans

The system flags issues directly in uLabel, allowing operators to correct them or dismiss the suggestion. Uber is already deploying this solution across its bounding box annotation projects and plans to expand it to cover more error types, further enhancing video annotation quality.

This automated validation significantly improves data quality and streamlines workflows, contributing to more robust machine learning and robotics systems.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.