Uber Eats' Search Engine Gets Smarter

Uber Eats enhances its delivery search with semantic AI, leveraging LLMs and optimized infrastructure for speed, scale, and accuracy.

May 28 at 12:15 AM · 7 min read

[Image: illustration of a smartphone showing the Uber Eats app interface with a search bar. Caption: Uber Eats' new semantic search aims for intuitive discovery. Credit: Uber Engineering]

Visual TL;DR: the limits of keyword search drive a shift to semantic search, which uses LLMs and vector embeddings. These enable a two-tower architecture, which in turn requires optimized infrastructure. That infrastructure enables improved search accuracy, which results in enhanced user satisfaction.

Keyword Search Limits: traditional keyword matching struggles with synonyms, typos, and language nuances
Semantic Search Shift: matches meaning rather than just words by encoding queries and documents
LLMs & Vector Embeddings: fine-tuned large language models produce the embeddings, with Matryoshka Representation Learning enabling flexible embedding dimensions
Two-Tower Architecture: decoupling query and document embedding calculations for efficient processing
Optimized Infrastructure: robust tech stack including deployment, indexing, and monitoring at scale
Improved Search Accuracy: better capturing user intent across stores, dishes, and items
Enhanced User Satisfaction: directly impacting conversion rates and overall user experience


Search is the gateway to orders on Uber Eats, directly impacting conversion rates and user satisfaction. Traditional keyword matching struggles with synonyms, typos, and language nuances, leading to missed intent. Uber Eats has shifted to semantic search, which matches meaning rather than just words by encoding queries and documents into vector embeddings.

This move aims to better capture user intent across stores, dishes, and items, even in multilingual markets. As detailed by Uber Engineering, building this at scale involves more than just a model; it requires a robust tech stack including deployment, indexing, and monitoring.

Architecture and Model Training

The system uses a two-tower architecture that decouples the computation of query embeddings from that of document embeddings. Query embeddings are generated online in real time, while document embeddings are computed offline in batches. The team uses Matryoshka Representation Learning (MRL) to support flexible embedding dimensions. It fine-tunes large language models (LLMs) such as Qwen as the backbone, drawing on their world knowledge and cross-lingual capabilities.

This single embedding model now serves all Uber Eats verticals and markets. Training is orchestrated with PyTorch and Ray. Large-scale runs use PyTorch's DistributedDataParallel (DDP) and DeepSpeed ZeRO-3 to handle massive LLMs. Versioned artifacts are tracked meticulously for reproducibility.
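The following minimal Python sketch makes the two-tower split concrete. It is not Uber's code. `toy_encode` is a hypothetical stand-in that hashes character trigrams into a 64-dimensional vector so the example runs without model weights; the real system would run the fine-tuned LLM in its place. The document tower embeds the catalog once, offline. The query tower embeds only the incoming query at request time. Because both outputs are unit-normalized, their dot product equals cosine similarity.

```python
import hashlib
import math

DIM = 64  # hypothetical embedding width for this toy example


def toy_encode(text: str, dim: int = DIM) -> list[float]:
    """Stand-in encoder: hash character trigrams into a unit-length vector.

    A real system would run a fine-tuned LLM here; hashing only keeps the
    sketch runnable without model weights.
    """
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        digest = hashlib.md5(padded[i:i + 3].encode()).digest()
        vec[int.from_bytes(digest[:4], "little") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def document_tower(docs: dict[str, str]) -> dict[str, list[float]]:
    """Offline batch job: embed every document once and store the vectors."""
    return {doc_id: toy_encode(text) for doc_id, text in docs.items()}


def query_tower(query: str) -> list[float]:
    """Online path: embed only the incoming query at request time."""
    return toy_encode(query)


def score(q: list[float], d: list[float]) -> float:
    # Both vectors are unit-normalized, so the dot product is cosine similarity.
    return sum(a * b for a, b in zip(q, d))


doc_index = document_tower({
    "s1": "Tony's Pizzeria",
    "s2": "Green Garden Salads",
    "s3": "Pizza Hut",
})
q = query_tower("pizza")
ranked = sorted(doc_index, key=lambda d: score(q, doc_index[d]), reverse=True)
print(ranked)
```

The two towers share only the embedding space. As a result, the expensive document side can be recomputed on its own schedule, while the online path pays for just one query encoding per request.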

Scaling and Optimization

Offline inference is crucial for embedding Uber's vast document corpus. Embeddings are calculated at the feature level and then joined back to the full catalog. These embeddings are stored in feature store tables and used to build search indexes backed by HNSW (Hierarchical Navigable Small World) graphs, which offer both non-quantized and quantized vector representations.
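One plausible reading of "calculated at the feature level and then joined back" is that each distinct feature value is embedded once, rather than every catalog row. The sketch below assumes that reading. Many items across stores share the same name, so deduplicating before calling the encoder saves work. `embed_catalog`, the `name` field, and `fake_encoder` are hypothetical illustrations, not Uber's pipeline.

```python
from typing import Callable

Vector = list[float]


def embed_catalog(
    catalog: list[dict],
    feature: str,
    encode_batch: Callable[[list[str]], list[Vector]],
) -> list[dict]:
    """Embed each distinct feature value once, then join vectors back to rows."""
    # 1. Collect the distinct values of the feature (many items share a name).
    unique_values = sorted({row[feature] for row in catalog})
    # 2. Run the expensive encoder only on those distinct values.
    vectors = encode_batch(unique_values)
    lookup = dict(zip(unique_values, vectors))
    # 3. Join the embeddings back onto every catalog row (new dicts, inputs untouched).
    return [{**row, f"{feature}_embedding": lookup[row[feature]]} for row in catalog]


calls: list[int] = []


def fake_encoder(texts: list[str]) -> list[Vector]:
    """Hypothetical encoder that records batch sizes and returns 1-d vectors."""
    calls.append(len(texts))
    return [[float(len(t))] for t in texts]


catalog = [
    {"item_id": 1, "store_id": "a", "name": "Margherita Pizza"},
    {"item_id": 2, "store_id": "b", "name": "Margherita Pizza"},
    {"item_id": 3, "store_id": "c", "name": "Caesar Salad"},
]
rows = embed_catalog(catalog, "name", fake_encoder)
print(calls)  # [2]: two distinct names were encoded, not three rows
```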

Balancing retrieval accuracy against infrastructure cost was a primary challenge. Uber Eats tunes approximate nearest neighbor (ANN) parameters and applies quantization strategies such as int7 scalar quantization (SQ). It also serves different embedding dimensions via MRL. These optimizations significantly reduced cost and latency without compromising retrieval quality.
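The two cost levers named above can be sketched together, assuming an MRL-trained model and unit-normalized vectors. `mrl_truncate` keeps a prefix of the embedding and re-normalizes it. `int7_quantize` is a simplified scalar quantizer that maps each component in a fixed range [lo, hi] onto the 128 integer levels 0 to 127. The 1024 and 256 dimensions are hypothetical, and the sine-wave vector is only a placeholder for a real embedding. Production engines may also choose the quantization bounds from the data rather than fixing them.

```python
import math


def mrl_truncate(vec: list[float], k: int) -> list[float]:
    """Keep the first k dimensions and re-normalize to unit length.

    MRL training concentrates useful information in the leading dimensions,
    so a prefix is itself a usable, smaller embedding.
    """
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]


def int7_quantize(vec: list[float], lo: float = -1.0, hi: float = 1.0) -> list[int]:
    """Scalar quantization: map each float in [lo, hi] to an int in [0, 127]."""
    scale = 127 / (hi - lo)
    return [min(127, max(0, round((x - lo) * scale))) for x in vec]


def int7_dequantize(codes: list[int], lo: float = -1.0, hi: float = 1.0) -> list[float]:
    """Approximate inverse of int7_quantize."""
    step = (hi - lo) / 127
    return [lo + c * step for c in codes]


FULL_DIM, SMALL_DIM = 1024, 256  # hypothetical sizes
full = [math.sin(i + 1) for i in range(FULL_DIM)]  # placeholder embedding
small = mrl_truncate(full, SMALL_DIM)
codes = int7_quantize(small)
restored = int7_dequantize(codes)
max_err = max(abs(a - b) for a, b in zip(small, restored))

bytes_full = FULL_DIM * 4    # float32 storage
bytes_small = SMALL_DIM * 1  # assuming one byte per int7 code
print(bytes_full // bytes_small)  # 16
```

Under these assumptions, storage per vector drops from 4,096 bytes to 256 bytes, a 16x reduction. The per-component reconstruction error is bounded by half a quantization step, about 0.008 for the range [-1, 1]. ANN parameter tuning and offline recall measurements then have to confirm that this loss is acceptable.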

The system also incorporates locale-aware lexical fields and boolean pre-filters to shrink the candidate set before ANN search. A lightweight re-ranking step further refines results before they reach downstream rankers.
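Here is a minimal sketch of that filter, then search, then re-rank order, assuming a toy in-memory corpus. Exact dot-product scoring stands in for the HNSW lookup. The re-ranker is a hypothetical lexical-overlap bonus (`lexical_weight`) on the document name. The `Doc` fields and locale codes are illustrative, not Uber's schema.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    locale: str
    is_open: bool
    name: str
    vector: list[float]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search(query_vec, query_text, docs, locale, k=3, lexical_weight=0.1):
    # 1. Boolean pre-filters shrink the candidate set before any vector math.
    candidates = [d for d in docs if d.locale == locale and d.is_open]
    # 2. Vector retrieval; exact scoring stands in for an HNSW ANN lookup.
    shortlist = sorted(candidates, key=lambda d: dot(query_vec, d.vector),
                       reverse=True)[: 2 * k]
    # 3. Lightweight re-rank: a small bonus per query term found in the name.
    terms = set(query_text.lower().split())

    def rerank_score(d: Doc) -> float:
        overlap = len(terms & set(d.name.lower().split()))
        return dot(query_vec, d.vector) + lexical_weight * overlap

    return [d.doc_id for d in sorted(shortlist, key=rerank_score, reverse=True)[:k]]


docs = [
    Doc("a", "en-US", True, "Pad Thai Express", [0.9, 0.1]),
    Doc("b", "en-US", False, "Thai Palace", [0.95, 0.05]),
    Doc("c", "es-MX", True, "Taquería El Güero", [0.2, 0.9]),
    Doc("d", "en-US", True, "Noodle Bar", [0.92, 0.0]),
]
print(search([1.0, 0.0], "thai noodles", docs, "en-US", k=2))  # ['a', 'd']
```

In this example, the closed store and the store in the other locale are removed before any vector math. The lexical bonus then lifts "Pad Thai Express" above "Noodle Bar", whose vector is slightly closer to the query.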

Productionization and Reliability

Uber's data is dynamic, necessitating a biweekly retraining and index update cadence. A blue/green deployment strategy at the index column level ensures seamless model refreshes and rollback capabilities. Each index maintains two columns (embedding_blue and embedding_green), with the active model version mapped via configuration.
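The column-level blue/green scheme can be sketched as a pointer flip. In this hypothetical Python version, `IndexConfig` records which column is live and which model version wrote each column. `deploy` fills the idle column and then switches the pointer. Because the previous column is left untouched, `rollback` is a single configuration change. The names and the in-memory row format are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

BLUE, GREEN = "embedding_blue", "embedding_green"


@dataclass
class IndexConfig:
    """Serving config: the live column plus the model ID that wrote each column."""
    active_column: str = BLUE
    column_model: dict[str, str] = field(default_factory=dict)

    def standby_column(self) -> str:
        return GREEN if self.active_column == BLUE else BLUE


def deploy(cfg: IndexConfig, rows: list[dict], new_model_id: str,
           embed: Callable[[str], list[float]]) -> str:
    """Write fresh embeddings into the idle column, then flip the pointer."""
    target = cfg.standby_column()
    for row in rows:
        row[target] = embed(row["text"])  # the live column is never touched
    cfg.column_model[target] = new_model_id
    previous = cfg.active_column
    cfg.active_column = target  # the cut-over is one config change
    return previous


def rollback(cfg: IndexConfig, previous: str) -> None:
    """Point serving back at the previous column; its data is still intact."""
    cfg.active_column = previous


rows = [{"text": "spicy ramen", BLUE: [0.1, 0.2]}]
cfg = IndexConfig(column_model={BLUE: "model-v1"})
prev = deploy(cfg, rows, "model-v2", lambda text: [float(len(text)), 0.0])
print(cfg.active_column)  # embedding_green
rollback(cfg, prev)
print(cfg.active_column)  # embedding_blue
```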

Automated validations gate deployments, checking for completeness, backward compatibility, and correctness against real queries in non-prod environments. These checks prevent data corruption and ensure new indexes perform at least as well as the current production index.
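The three gates could look roughly like the sketch below. The checks are assumptions about what each gate might test. Completeness means every production document has an embedding. Backward compatibility means the vectors have the dimension serving expects. Correctness means recall@k on a labeled set of real queries must not fall below the production index. `validate_index` and `recall_at_k` are hypothetical helpers, and a non-empty error list would block the deployment.

```python
def recall_at_k(results: dict[str, list[str]], relevant: dict[str, str], k: int) -> float:
    """Fraction of labeled queries whose relevant doc appears in the top k."""
    hits = sum(1 for q, doc in relevant.items() if doc in results.get(q, [])[:k])
    return hits / len(relevant)


def validate_index(new_index, prod_doc_ids, expected_dim,
                   new_results, prod_results, relevant, k=10):
    """Return a list of gate failures; an empty list means the index may ship."""
    errors = []
    # Completeness: every production document needs an embedding.
    missing = set(prod_doc_ids) - set(new_index)
    if missing:
        errors.append(f"completeness: {len(missing)} docs missing")
    # Backward compatibility: vectors must match the dimension serving expects.
    wrong_dim = [d for d, v in new_index.items() if len(v) != expected_dim]
    if wrong_dim:
        errors.append(f"compatibility: {len(wrong_dim)} vectors with wrong dimension")
    # Correctness: on real queries, the new index must not underperform prod.
    new_r = recall_at_k(new_results, relevant, k)
    prod_r = recall_at_k(prod_results, relevant, k)
    if new_r < prod_r:
        errors.append(f"correctness: recall@{k} {new_r:.2f} < prod {prod_r:.2f}")
    return errors


relevant = {"pizza": "d1", "salad": "d2"}
prod_results = {"pizza": ["d1"], "salad": ["d1"]}
good_results = {"pizza": ["d1"], "salad": ["d2", "d1"]}
ok = validate_index({"d1": [0.1, 0.2], "d2": [0.3, 0.4]}, {"d1", "d2"}, 2,
                    good_results, prod_results, relevant)
bad = validate_index({"d1": [0.1]}, {"d1", "d2"}, 2,
                     prod_results, good_results, relevant)
print(ok)        # []
print(len(bad))  # 3
```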

Serving-time reliability checks further guard against errors. The system verifies that the model generating the query embedding matches the model ID on the active index column. Mismatches trigger alerts and automatic rollbacks, preventing outages without impacting read path latency.
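Here is a sketch of that serving-time guard, assuming the configuration maps each index column to the model ID that built it. The fast path is a single string comparison per request, which is consistent with the check adding no meaningful read-path latency. On a mismatch, the hypothetical `serve_check` raises an alert. It then repoints serving at the column built by the query's model, if such a column exists.

```python
from typing import Callable


def serve_check(cfg: dict, query_model_id: str, alert: Callable[[str], None]) -> str:
    """Return the index column to search, rolling back on a model mismatch."""
    active = cfg["active_column"]
    index_model_id = cfg["column_model"][active]
    if query_model_id == index_model_id:
        return active  # fast path: one string comparison per request
    alert(f"model mismatch: query={query_model_id} index={index_model_id} ({active})")
    # Automatic rollback: repoint serving at the column built by the query's model.
    for column, model_id in cfg["column_model"].items():
        if model_id == query_model_id:
            cfg["active_column"] = column
            return column
    raise RuntimeError(f"no index column was built by {query_model_id}")


cfg = {
    "active_column": "embedding_green",
    "column_model": {"embedding_blue": "model-v1", "embedding_green": "model-v2"},
}
alerts: list[str] = []
print(serve_check(cfg, "model-v2", alerts.append))  # embedding_green
print(serve_check(cfg, "model-v1", alerts.append))  # embedding_blue
print(len(alerts))  # 1
```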

Conclusion

Uber Eats has successfully built a scalable, multilingual search system powering discovery across its verticals. By combining advanced language models, efficient embedding techniques like Matryoshka Representation Learning, and a production-first design with robust deployment strategies, they deliver a faster, more intuitive search experience. This approach to semantic search at scale, similar to advancements seen in platforms like LinkedIn's AI Search Upgrade, highlights the critical role of thoughtful engineering in modern discovery platforms.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.

#AI #Machine Learning #Search Technology #Uber Eats #LLMs #Vector Embeddings #Deep Learning #Qwen #Matryoshka Representation Learning #Deployment Strategies