Databricks Search Gets 3x Faster

Databricks is rolling out a significant performance upgrade for its Agent Bricks Knowledge Assistant, making search more than 3x faster. The upgrade also halves answer generation time and brings time to first token down to approximately two seconds. The gains come from a new model, Instructed-Retriever-1, detailed in a Databricks blog post, which relies on a technique called parallel test-time scaling.

Visual TL;DR: Slow Search Latency is addressed by the Instructed-Retriever-1 Model, which uses Parallel Test-Time Scaling. Parallel Test-Time Scaling enables Broader Evidence Retrieval and Precise Context Selection, both of which lead to 3x Faster Search. Together with Halved Answer Generation, this results in a 2-Second Time to First Token.

Slow Search Latency: Traditional agentic search systems process results sequentially, leading to higher latency
Instructed-Retriever-1 Model: New Databricks model powering the performance upgrade for Knowledge Assistant
Parallel Test-Time Scaling: Flipping sequential computation to fan out tasks in parallel during initial search
Broader Evidence Retrieval: Allows for wider retrieval of relevant information upfront in the search process
Precise Context Selection: Enables more accurate selection of context for better answer generation
3x Faster Search: Significant boost in Knowledge Assistant search speed, over three times faster
Halved Answer Generation: Answer generation time is also reduced by approximately half
2-Second Time to First Token: Achieving approximately two seconds for the initial response to be delivered


Traditional agentic search systems often process results sequentially, leading to higher latency. Instructed-Retriever-1 flips this by parallelizing the initial search phase. This allows for broader evidence retrieval and more precise context selection upfront, dramatically cutting down response times.


Parallelizing the Search Pipeline

The core innovation lies in how Databricks approaches test-time computation. Instead of spending compute sequentially on steps like tool calls or reasoning, the system fans these tasks out in parallel during the initial search. This broadens the retrieved evidence and refines it efficiently.

Instructed-Retriever-1 is a single model trained for two critical retrieval stages: query generation to enhance recall and reranking to boost precision. These run concurrently to maintain low latency.

The training harness is key: it feeds the model user instructions and index schemas, and that context carries through query generation, filter creation, reranking, and final answer generation.

This parallel query and filter generation explores multiple formulations of a request simultaneously. It allows for a wider search while keeping latency in check.
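The fan-out idea can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the `CORPUS` dictionary, the `search` function, and the simulated delay are hypothetical stand-ins, not Databricks APIs. The point is only that several query formulations run concurrently, so wall-clock time stays close to that of a single search.

```python
import asyncio

# Hypothetical in-memory corpus standing in for a real search index.
CORPUS = {
    "doc1": "quarterly revenue report for 2024",
    "doc2": "employee onboarding checklist",
    "doc3": "revenue forecast and budget planning",
    "doc4": "2024 travel policy",
}

async def search(query: str, delay: float = 0.05) -> list[str]:
    """Toy retriever: return ids of documents containing every query term."""
    await asyncio.sleep(delay)  # simulate index / network latency
    terms = query.lower().split()
    return [doc_id for doc_id, text in CORPUS.items()
            if all(term in text for term in terms)]

async def fan_out(queries: list[str]) -> list[str]:
    """Run all query formulations concurrently and merge the unique hits."""
    results = await asyncio.gather(*(search(q) for q in queries))
    merged, seen = [], set()
    for hits in results:  # gather preserves the order of the input queries
        for doc_id in hits:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged

def retrieve(queries: list[str]) -> list[str]:
    """Synchronous entry point for the sketch."""
    return asyncio.run(fan_out(queries))

if __name__ == "__main__":
    # Two formulations of one request, searched at the same time.
    print(retrieve(["revenue", "2024"]))
```

Adding a third or fourth formulation widens the candidate pool while the elapsed time stays near the slowest single search, which is the trade the article describes.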

To manage the aggregation of results from broader searches, a multi-pivot groupwise reranker is employed. This ranks candidate chunks in parallel groups, merging them into a final, ordered list.
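The article does not spell out how the multi-pivot groupwise reranker merges its groups, so the following Python sketch is one plausible reading rather than Databricks' method. It assumes that a few shared "pivot" chunks are added to every group, that each group is ranked independently in parallel, and that the pivots then act as common reference points for merging. The term-overlap scorer, the function names, and the merge heuristic are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def rank_group(query_terms: set, group: list, texts: dict) -> list:
    """Toy groupwise ranker (stand-in for a model call): order by term overlap."""
    def overlap(cid):
        return len(query_terms & set(texts[cid].split()))
    return sorted(group, key=lambda cid: (-overlap(cid), cid))

def multi_pivot_rerank(query, candidates, texts, pivots, group_size=3):
    """Rank groups in parallel, then merge them using shared pivots as fences."""
    query_terms = set(query.lower().split())
    pivot_set = set(pivots)
    rest = [c for c in candidates if c not in pivot_set]
    # Every group carries the same pivots so that rankings are comparable.
    groups = [rest[i:i + group_size] + list(pivots)
              for i in range(0, len(rest), group_size)] or [list(pivots)]
    with ThreadPoolExecutor() as pool:
        rankings = list(pool.map(
            lambda g: rank_group(query_terms, g, texts), groups))

    # Assumed merge rule: bucket i holds chunks that sit below exactly i pivots.
    pivot_order = [c for c in rankings[0] if c in pivot_set]
    buckets = [[] for _ in range(len(pivots) + 1)]
    for ranking in rankings:
        idx = 0
        for pos, cid in enumerate(ranking):
            if cid in pivot_set:
                idx += 1
            else:
                buckets[idx].append((pos, cid))

    final = []
    for i, bucket in enumerate(buckets):
        final.extend(cid for _, cid in sorted(bucket))
        if i < len(pivot_order):
            final.append(pivot_order[i])
    return final

# Hypothetical chunks for illustration only.
TEXTS = {
    "a": "revenue report 2024",
    "b": "travel policy",
    "c": "revenue forecast",
    "d": "2024 revenue budget report",
    "e": "onboarding checklist",
    "f": "annual report 2024",
}
```

In this reading, more pivots mean more buckets and therefore a finer-grained merge across groups, which is one way additional pivots could translate into higher precision.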

These stages offer two scaling knobs: increasing query formulations improves recall, while more pivots enhance precision. The system can trade additional compute for higher quality context without penalizing latency.
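Why extra compute need not cost extra latency can be seen with an idealized model. The symbols and numbers below are illustrative assumptions, not Databricks measurements: $t_i$ is the time of the $i$-th query formulation or rerank group, and $t_{\text{merge}}$ is the time to combine results.

```latex
T_{\text{seq}} = \sum_{i=1}^{n} t_i ,
\qquad
T_{\text{par}} \approx \max_{1 \le i \le n} t_i + t_{\text{merge}}

% Example with n = 4, t_i = 0.5 s each, t_merge = 0.1 s:
T_{\text{seq}} = 4 \times 0.5 = 2.0\ \text{s},
\qquad
T_{\text{par}} \approx 0.5 + 0.1 = 0.6\ \text{s}
```

Total compute is the same $\sum_i t_i$ in both cases. Only the wall-clock time differs, and this simple model ignores contention for serving capacity.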

Training a Unified Retrieval Model

Instructed-Retriever-1 was trained as a single, retrieval-specialized model capable of both generating effective search queries and judging evidence. It matches the retrieval quality of models like Claude Sonnet 4.5 on benchmarks while keeping latency low.

Synthetic enterprise-style retrieval environments were built for training, mirroring real-world tasks. These include factual lookups, summarization, and decision support over mixed document types.

The model is trained in two stages to support both query generation and verification-style retrieval capabilities, making parallel test-time scaling practical.

Production Validation and Performance

The effectiveness of Instructed-Retriever-1 was validated on a large internal dataset reflecting actual Knowledge Assistant usage. The evaluation confirmed that parallel query generation and multi-pivot reranking significantly improve retrieval quality.

On realistic workloads, the model showed strong performance across query generation metrics like specificity, breadth, and relevance. It also proved competitive in reranking, achieving 81.0 nDCG@10, a substantial gain over settings without reranking.
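For readers unfamiliar with the metric, nDCG@10 scores a ranked list by how closely its top ten results match the ideal ordering, and a figure like 81.0 is typically the 0-to-1 score multiplied by 100. The sketch below uses the common linear-gain form of DCG. Which gain variant the Databricks evaluation used is not stated, so treat this as an assumption, and the relevance labels are made up.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top k, with linear gain."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ordering.

    `relevances` lists graded relevance labels in ranked order. The ideal
    ordering here is computed from the same labels, which assumes every
    judged item appears in the list.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

if __name__ == "__main__":
    # A perfectly ordered list scores 1.0; swapping items lowers the score.
    print(ndcg_at_k([3, 2, 1]))
    print(ndcg_at_k([1, 3]))
```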

Serving performance is critical for parallel test-time scaling. Instructed-Retriever-1 employs a Mixture-of-Experts architecture and optimizations like FP8 quantization and speculative decoding for efficient inference.

These optimizations deliver significant speed-ups, with FP8 showing no quality degradation and speculative decoding adding further gains to the query-generation and reranking path.
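Speculative decoding in general works by letting a small draft model propose several tokens that the larger target model then verifies. The toy Python sketch below shows the greedy variant under loud assumptions: the two "models" are hypothetical arithmetic functions, and verification is done in a loop, whereas a real serving stack checks all proposed positions in one batched forward pass. It says nothing about how Databricks implements the technique.

```python
def greedy_decode(model, prompt, n_tokens):
    """Baseline: the target model generates one token at a time."""
    seq = list(prompt)
    for _ in range(n_tokens):
        seq.append(model(seq))
    return seq

def speculative_decode(target, draft, prompt, n_tokens, k=4):
    """Greedy speculative decoding: accept draft tokens while the target agrees."""
    seq = list(prompt)
    goal = len(prompt) + n_tokens
    while len(seq) < goal:
        # The draft model proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # The target checks each proposed position in turn.
        for tok in proposal:
            expected = target(seq)
            seq.append(expected)  # always emit the target's own choice
            if expected != tok or len(seq) == goal:
                break  # first disagreement (or done): discard the rest
    return seq

# Hypothetical toy models over integer tokens.
def target(seq):
    return (seq[-1] * 2 + 1) % 7

def draft(seq):
    # Agrees with the target except after token 3.
    return 0 if seq[-1] == 3 else target(seq)

if __name__ == "__main__":
    print(speculative_decode(target, draft, [1], 10))
```

Because every emitted token is the target's own greedy choice, the output matches plain greedy decoding exactly. The speed-up comes only from verifying several positions at once.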

The result is a Knowledge Assistant that is demonstrably faster and more capable. Early users, such as Baylor University, have reported a clear UX improvement, describing the experience as "more concise, with a 'snappy' feel that surfaces key information sooner."

Instructed-Retriever-1 is now rolling out to all Databricks customers, promising faster access to higher-quality information.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.
