Databricks is rolling out a significant performance upgrade for its Agent Bricks Knowledge Assistant that makes search more than 3x faster. The upgrade also halves answer generation time, bringing the time to first token down to approximately two seconds. The gains come from a new model, Instructed-Retriever-1, which uses a technique called parallel test-time scaling; Databricks describes the model in a blog post.
Traditional agentic search systems often process results sequentially, which increases latency. Instructed-Retriever-1 instead parallelizes the initial search phase. This lets it retrieve a broader set of evidence and select context more precisely up front, sharply reducing response times.
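To make the latency argument concrete, here is a minimal sketch that contrasts issuing search calls one after another with fanning them out concurrently. It is not Databricks' implementation: the `search` function, the fixed 0.2-second delay, and the example queries are all hypothetical stand-ins, and the sketch covers only the fan-out step, not how Instructed-Retriever-1 scores or selects the retrieved context.

```python
import asyncio
import time


# Hypothetical stand-in for one retrieval call. A real system would query a
# search backend; here latency is simulated with a fixed sleep.
async def search(query: str, delay: float = 0.2) -> list[str]:
    await asyncio.sleep(delay)
    return [f"doc for {query!r}"]


async def sequential_search(queries: list[str]) -> list[str]:
    """Await each query before starting the next: latency ~ sum of all calls."""
    results: list[str] = []
    for q in queries:
        results.extend(await search(q))
    return results


async def parallel_search(queries: list[str]) -> list[str]:
    """Start every query at once: latency ~ the slowest single call."""
    batches = await asyncio.gather(*(search(q) for q in queries))
    # asyncio.gather returns results in the order the awaitables were passed.
    return [doc for batch in batches for doc in batch]


def timed(search_fn, queries: list[str]) -> tuple[list[str], float]:
    """Run one search strategy to completion and measure wall-clock time."""
    start = time.perf_counter()
    docs = asyncio.run(search_fn(queries))
    return docs, time.perf_counter() - start


queries = ["pricing tiers", "SLA terms", "data retention policy"]
seq_docs, seq_t = timed(sequential_search, queries)
par_docs, par_t = timed(parallel_search, queries)
print(f"sequential: {seq_t:.2f}s, parallel: {par_t:.2f}s")
```

With three simulated calls, the sequential version takes roughly three times as long as the parallel one while returning the same documents. In a real agent, the time saved by fanning out is what leaves room to gather broader evidence before choosing context.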