Chroma's Context-1: Faster, Cheaper AI Search

Chroma Context-1, a 20B parameter AI model, offers frontier-level search performance at a fraction of the cost and latency, using self-editing to manage context efficiently.


Retrieval-augmented generation (RAG) systems often struggle with queries that require piecing together information from multiple documents or involve complex reasoning. Traditional single-pass pipelines fail when the relevant information is scattered across sources and must be assembled through iterative, multi-hop retrieval. This is where agentic search, a loop of Large Language Model (LLM) calls with search tools, has shown promise. However, the cost and latency of running frontier-scale LLMs in such a loop are significant.

Enter Chroma Context-1, a 20B parameter model derived from gpt-oss-20B. Chroma claims Context-1 achieves retrieval performance on par with frontier LLMs at a substantially lower cost and up to 10x faster inference. This efficiency is what makes sophisticated multi-hop retrieval practical, advancing systems like those explored in advanced retrieval-augmented generation.

A Self-Editing Search Agent

Context-1 is designed to function as a specialized subagent within a larger AI system. Its primary role is to produce a ranked list of documents relevant to a given query, rather than generating an answer itself. This separation of concerns—search versus generation—is key to its architecture.
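This separation can be made concrete with a minimal sketch. The interface below is an illustrative assumption, not Chroma's actual API: the subagent exposes a `search` method that returns ranked document IDs with scores, and nothing that generates prose. The toy term-overlap scoring stands in for the model's learned relevance judgment.

```python
from dataclasses import dataclass


@dataclass
class RankedDoc:
    doc_id: str
    score: float  # relevance score assigned by the search agent


class SearchSubagent:
    """Illustrative interface: the subagent ranks documents, it never answers."""

    def __init__(self, corpus: dict[str, str]):
        self.corpus = corpus

    def search(self, query: str) -> list[RankedDoc]:
        # Toy relevance: count query-term overlap. A real agent would use
        # an LLM plus retrieval tools; this only illustrates the contract.
        terms = set(query.lower().split())
        ranked = [
            RankedDoc(doc_id, score=float(len(terms & set(text.lower().split()))))
            for doc_id, text in self.corpus.items()
        ]
        return sorted(ranked, key=lambda d: d.score, reverse=True)
```

A downstream generator agent would consume this ranked list and produce the final answer, keeping search and generation decoupled.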

The model is trained to break complex queries down into smaller subqueries. It then iteratively searches a corpus and, crucially, selectively edits its own context to discard irrelevant information. This self-editing capability is vital for managing context window bloat, a common bottleneck that increases computational cost and degrades performance in long-horizon search tasks.
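The loop described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: `search_fn`, `relevant_fn`, and `decompose_fn` are hypothetical stand-ins for the model's retrieval tool, relevance judgment, and query-decomposition step, not components Chroma has published.

```python
def agentic_search(query, search_fn, relevant_fn, decompose_fn, max_steps=5):
    """Sketch of iterative search with context self-editing.

    search_fn(subquery)      -> list of retrieved passages
    relevant_fn(query, p)    -> bool relevance judgment (e.g. an LLM call)
    decompose_fn(query, ctx) -> next subquery, or None when done
    All three callables are illustrative assumptions."""
    context = []  # retained passages, kept small by pruning
    for _ in range(max_steps):
        subquery = decompose_fn(query, context)
        if subquery is None:
            break
        context.extend(search_fn(subquery))
        # Self-editing step: drop passages judged irrelevant to the overall
        # query, freeing context budget for further exploration.
        context = [p for p in context if relevant_fn(query, p)]
    return context
```

The pruning line is the key difference from a naive agent loop, which would let `context` grow monotonically with every turn.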

Efficiency Through Specialized Training

The high cost and latency of current agentic search methods are often driven by the expanding context windows of frontier models. As agents gather more information over multiple turns, their context fills with tangential or redundant data. Context-1 tackles this by actively managing its context, retaining only the most relevant retrieved information to free up capacity for further exploration.

Chroma trained Context-1 on over eight thousand synthetically generated tasks. The training curriculum employed a staged approach, initially optimizing for recall before shifting focus to precision. This method trains the agent to progressively narrow its search from broad retrieval to selective retention.
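One simple way to picture a recall-first, precision-later curriculum is a reward that interpolates between the two metrics as training progresses. The schedule below is an illustrative assumption, not Chroma's published objective.

```python
def staged_reward(retrieved: set, relevant: set, stage: float) -> float:
    """Sketch of a recall-first, precision-later reward schedule.

    stage in [0, 1]: 0 = early training (reward broad recall),
    1 = late training (reward selective precision).
    The linear interpolation is an illustrative assumption."""
    if not retrieved or not relevant:
        return 0.0
    hits = len(retrieved & relevant)
    recall = hits / len(relevant)
    precision = hits / len(retrieved)
    return (1 - stage) * recall + stage * precision
```

Early in training, an agent that retrieves everything relevant (plus noise) scores well; late in training, the same behavior is penalized, pushing it toward selective retention.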

Scalable Data Generation and Performance

Generating high-quality training data for agentic search has been a significant challenge. Chroma developed a scalable synthetic task generation pipeline that leverages an LLM judge to minimize the need for human annotation while maintaining task quality. This pipeline can generate multi-constraint questions across various domains, including web, finance, legal, and email.
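The judge-in-the-loop pattern can be sketched as a generate-then-filter pipeline. The function names and quality threshold below are illustrative assumptions; the article does not describe Chroma's pipeline at this level of detail.

```python
def build_task_set(generate_fn, judge_fn, target: int, max_attempts: int = 1000):
    """Sketch of judge-filtered synthetic task generation.

    generate_fn() -> a candidate task (e.g. a multi-constraint question)
    judge_fn(task) -> quality score in [0, 1] (e.g. an LLM judge)
    Names and the threshold are illustrative assumptions."""
    QUALITY_THRESHOLD = 0.8
    accepted = []
    for _ in range(max_attempts):
        if len(accepted) >= target:
            break
        task = generate_fn()
        # The judge replaces human annotation: only high-quality
        # candidates enter the training set.
        if judge_fn(task) >= QUALITY_THRESHOLD:
            accepted.append(task)
    return accepted
```

Because both generation and judging are automated, the pipeline scales to thousands of tasks across domains without per-example human review.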

The system aims to address the limitations of single-shot retrieval by enabling agents to reformulate queries based on intermediate results and make decisions about exploration versus exploitation. This approach models search as a sequential reasoning task, where the relevance of each step depends on what has been discovered previously.

Results show that a purpose-trained 20B parameter model like Context-1 can indeed reach the Pareto frontier for retrieval performance with respect to cost and latency. It matches or surpasses significantly larger frontier models, offering a compelling solution for efficient, scalable agentic search.

© 2026 StartupHub.ai. All rights reserved. See our terms.