Databricks Search Gets 3x Faster

Databricks' Instructed-Retriever-1 model uses parallel test-time scaling to boost Knowledge Assistant search speed by over 3x.

Instructed-Retriever-1 drastically improves search latency and retrieval quality.

Databricks is rolling out a significant performance upgrade for its Agent Bricks Knowledge Assistant, making search more than 3x faster. The upgrade also halves answer generation time and brings time to first token down to approximately two seconds. The enhancements are powered by a new model called Instructed-Retriever-1, detailed in a Databricks blog post, which uses a technique called parallel test-time scaling.

Visual TL;DR. Slow Search Latency is addressed by the Instructed-Retriever-1 Model, which uses Parallel Test-Time Scaling. Parallel Test-Time Scaling enables Broader Evidence Retrieval and Precise Context Selection, and both lead to 3x Faster Search. 3x Faster Search comes with Halved Answer Generation, which results in a 2-Second Time to First Token.

  1. Slow Search Latency: Traditional agentic search systems process results sequentially, leading to higher latency
  2. Instructed-Retriever-1 Model: New Databricks model powering the performance upgrade for Knowledge Assistant
  3. Parallel Test-Time Scaling: Flipping sequential computation to fan out tasks in parallel during initial search
  4. Broader Evidence Retrieval: Allows for wider retrieval of relevant information upfront in the search process
  5. Precise Context Selection: Enables more accurate selection of context for better answer generation
  6. 3x Faster Search: Significant boost in Knowledge Assistant search speed, over three times faster
  7. Halved Answer Generation: Answer generation time is also reduced by approximately half
  8. 2-Second Time to First Token: Achieving approximately two seconds for the initial response to be delivered

Traditional agentic search systems often process results sequentially, leading to higher latency. Instructed-Retriever-1 flips this by parallelizing the initial search phase. This allows for broader evidence retrieval and more precise context selection upfront, dramatically cutting down response times.

Parallelizing the Search Pipeline

The core innovation lies in how Databricks approaches test-time computation. Instead of spending compute sequentially on steps like tool calls or reasoning, the system fans these tasks out in parallel during the initial search. This broadens the retrieved evidence and refines it efficiently.

Instructed-Retriever-1 is a single model trained for two critical retrieval stages: query generation to enhance recall and reranking to boost precision. Each stage fans its work out concurrently to keep latency low.

The training harness is key: it feeds the model user instructions and index schemas. That context propagates through query generation, filter creation, reranking, and final answer generation.

This parallel query and filter generation explores multiple formulations of a request simultaneously. It allows for a wider search while keeping latency in check.
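The fan-out described above can be sketched in a few lines. This is a minimal illustration, not Databricks' implementation: the in-memory `INDEX`, the keyword-matching `search` function, and the simulated delay are all hypothetical stand-ins for a real retriever. The point is only that several formulations of one request are issued concurrently and their hits merged without duplicates.

```python
import asyncio

# Hypothetical in-memory "index": document id -> text. Not a Databricks API.
INDEX = {
    "d1": "quarterly revenue report for 2024",
    "d2": "employee onboarding checklist",
    "d3": "revenue forecast and budget plan",
    "d4": "2024 budget approval memo",
}

async def search(query: str, delay: float = 0.05) -> list[str]:
    """Stand-in retriever: simulates network latency, then keyword-matches."""
    await asyncio.sleep(delay)
    terms = query.lower().split()
    return [doc_id for doc_id, text in INDEX.items()
            if any(term in text.split() for term in terms)]

async def fan_out(queries: list[str]) -> list[str]:
    """Run every query formulation concurrently and merge unique hits in order."""
    results = await asyncio.gather(*(search(q) for q in queries))
    merged: list[str] = []
    seen: set[str] = set()
    for hits in results:
        for doc_id in hits:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged

if __name__ == "__main__":
    print(asyncio.run(fan_out(["revenue", "budget"])))
```

Because the searches overlap in time, total wall-clock latency in this sketch is roughly that of the slowest single search rather than the sum of all of them, which is why adding formulations widens the search without a proportional latency cost.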

To manage the aggregation of results from broader searches, a multi-pivot groupwise reranker is employed. This ranks candidate chunks in parallel groups, merging them into a final, ordered list.
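The article does not spell out how the multi-pivot groupwise reranker works, so the following is only one plausible reading, with every name hypothetical. It assumes that a few shared "pivot" chunks are placed in every group, that each group is scored independently (here by a toy term-overlap function whose scores are relative to the group), and that the pivots serve as a common reference point for putting group-local scores on one scale before merging.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def overlap(query: str, chunk: str) -> int:
    """Toy relevance signal: number of query terms present in the chunk."""
    words = set(chunk.lower().split())
    return sum(term in words for term in query.lower().split())

def score_group(query: str, group: list[str]) -> dict[str, float]:
    """Stand-in for one groupwise model call: scores are relative to the group mean."""
    raw = {chunk: overlap(query, chunk) for chunk in group}
    center = mean(raw.values())
    return {chunk: value - center for chunk, value in raw.items()}

def groupwise_rerank(query: str, chunks: list[str], pivots: list[str],
                     group_size: int = 2) -> list[str]:
    """Score groups in parallel, calibrate each on the shared pivots, then merge.

    Assumes at least one pivot (an assumption of this sketch).
    """
    rest = [c for c in chunks if c not in pivots]
    groups = [pivots + rest[i:i + group_size]
              for i in range(0, len(rest), group_size)] or [pivots]
    with ThreadPoolExecutor() as pool:
        scored = list(pool.map(lambda g: score_group(query, g), groups))
    final: dict[str, float] = {}
    for group_scores in scored:
        anchor = mean(group_scores[p] for p in pivots)  # shared reference point
        for chunk, score in group_scores.items():
            final[chunk] = round(score - anchor, 9)     # comparable across groups
    # Highest calibrated score first; ties broken alphabetically for determinism.
    return sorted(final, key=lambda c: (-final[c], c))
```

In this toy version the group-relative scores differ only by a shift, so a single pivot is enough to align them exactly. With a real model whose group scores are noisy, averaging over more pivots would plausibly give a steadier calibration, which is one way to read the claim that more pivots improve precision.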

These stages offer two scaling knobs: increasing query formulations improves recall, while more pivots enhance precision. The system can trade additional compute for higher quality context without penalizing latency.

Training a Unified Retrieval Model

Instructed-Retriever-1 was trained as a single, retrieval-specialized model capable of both generating effective search queries and judging evidence. It matches the retrieval quality of models like Claude Sonnet 4.5 on benchmarks, all while delivering low latency.

Synthetic enterprise-style retrieval environments were built for training, mirroring real-world tasks. These include factual lookups, summarization, and decision support over mixed document types.

The model is trained in two stages to support both query generation and verification-style retrieval capabilities, making parallel test-time scaling practical.

Production Validation and Performance

The effectiveness of Instructed-Retriever-1 was validated on a large internal dataset reflecting actual Knowledge Assistant usage. The evaluation confirmed that parallel query generation and multi-pivot reranking significantly improve retrieval quality.

On realistic workloads, the model showed strong performance across query generation metrics like specificity, breadth, and relevance. It also proved competitive in reranking, achieving 81.0 nDCG@10, a substantial gain over settings without reranking.
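For readers unfamiliar with the metric, nDCG@10 compares the discounted gain of the top ten results against the best possible ordering of the same results. The sketch below uses the linear-gain form of DCG and assumes the graded relevance labels for one query are already known; the article does not say which gain variant or label scale Databricks used. A reported 81.0 would correspond to 0.810 here if the score is scaled by 100, which is an assumption.

```python
import math

def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

if __name__ == "__main__":
    # Labels are listed in the order the system ranked the documents.
    print(ndcg_at_k([3, 2, 1]))  # already in ideal order
    print(ndcg_at_k([1, 3]))     # best document ranked second
```

A ranking that is already in ideal order scores 1.0, and placing a highly relevant document lower in the list pulls the score below 1.0.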

Serving performance is critical for parallel test-time scaling. Instructed-Retriever-1 employs a Mixture-of-Experts architecture and optimizations like FP8 quantization and speculative decoding for efficient inference.

These optimizations deliver significant speed-ups, with FP8 showing no quality degradation and speculative decoding adding further gains to the query-generation and reranking path.
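Speculative decoding can be illustrated with a toy example. The sketch below is not Databricks' serving stack: `target_model` and `draft_model` are hypothetical deterministic functions standing in for a large model and a cheap draft model, and decoding is greedy. A draft proposes several tokens, the target checks them, and the first disagreement is replaced with the target's own token.

```python
from typing import Callable

Model = Callable[[list[int]], int]  # maps a token prefix to the next token

def target_model(prefix: list[int]) -> int:
    """Toy 'large' model: next token is a deterministic function of the prefix."""
    return (sum(prefix) * 31 + len(prefix)) % 50

def draft_model(prefix: list[int]) -> int:
    """Toy 'small' model: agrees with the target except at every fourth position."""
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50

def greedy_decode(model: Model, prompt: list[int], n_tokens: int) -> list[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(model(out))
    return out[len(prompt):]

def speculative_decode(target: Model, draft: Model, prompt: list[int],
                       n_tokens: int, k: int = 4) -> list[int]:
    """Draft proposes k tokens; target keeps the agreeing prefix and fixes the first miss."""
    out = list(prompt)
    goal = len(prompt) + n_tokens
    while len(out) < goal:
        proposal: list[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # In a real system these k checks would be one batched target pass.
        for token in proposal:
            expected = target(out)
            out.append(expected)  # always emit the target's own token
            if expected != token or len(out) >= goal:
                break             # first disagreement ends this round
    return out[len(prompt):]
```

With greedy verification the output is identical to decoding with the target alone, so any speed-up comes from checking several drafted tokens per target pass rather than from changing what is generated. The bonus token that many implementations emit after a fully accepted draft is omitted here for brevity.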

The result is a Knowledge Assistant that is demonstrably faster and more capable. Early users, such as Baylor University, have remarked on the UX improvement, describing the experience as "more concise, with a 'snappy' feel that surfaces key information sooner."

Instructed-Retriever-1 is now rolling out to all Databricks customers, promising faster access to higher-quality information.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.