Retrieval-augmented generation (RAG) systems often struggle with queries that require piecing together information from multiple documents or involve complex reasoning. Traditional single-pass pipelines retrieve once and answer; when the relevant information is scattered across sources, they cannot perform the iterative, multi-hop retrieval that many real-world questions require. This is where agentic search, a loop of Large Language Model (LLM) calls with search tools, has shown promise. However, running frontier-scale LLMs in such a loop carries significant cost and latency.
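To make the "loop of LLM calls with search tools" concrete, here is a minimal sketch of an agentic, multi-hop search loop. The `toy_llm` policy and `toy_search` keyword retriever are stand-ins for a real model and search tool, and the corpus is invented for illustration; the point is the shape of the loop, not any particular API.

```python
# Minimal agentic search loop: an LLM repeatedly issues search-tool calls,
# inspects the evidence gathered so far, and decides whether to stop.
# `toy_llm` and `toy_search` are illustrative stand-ins, not a real API.

TOY_CORPUS = {
    "doc1": "The Eiffel Tower is in Paris.",
    "doc2": "Paris is the capital of France.",
    "doc3": "France is in Europe.",
}

STOPWORDS = {"the", "is", "in", "of", "a", "where"}

def _tokens(text):
    """Lowercased content words, with punctuation and stopwords dropped."""
    return {w.strip(".?,!") for w in text.lower().split()} - STOPWORDS

def toy_search(query):
    """Keyword search: doc ids whose text shares a content word with the query."""
    q = _tokens(query)
    return [d for d, text in TOY_CORPUS.items() if q & _tokens(text)]

def toy_llm(question, evidence):
    """Stand-in policy: keep hopping from the latest evidence until enough is found."""
    if len(evidence) >= 2:
        return ("STOP", None)
    return ("SEARCH", evidence[-1] if evidence else question)

def agentic_search(question, max_hops=4):
    """Iterate LLM -> search -> LLM, accumulating unseen documents as evidence."""
    evidence, seen = [], set()
    for _ in range(max_hops):
        action, query = toy_llm(question, evidence)
        if action == "STOP":
            break
        for doc_id in toy_search(query):
            if doc_id not in seen:
                seen.add(doc_id)
                evidence.append(TOY_CORPUS[doc_id])
    return evidence
```

Running `agentic_search("where is the eiffel tower")` needs two hops: the first retrieves only the Eiffel Tower document, and the second hops through "Paris" to reach the France document, which a single-pass retriever keyed to the original query would never surface.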
Enter Chroma Context-1, a 20B-parameter model derived from gpt-oss-20B. Chroma claims Context-1 matches the retrieval performance of frontier LLMs at substantially lower cost and with up to 10x faster inference. This efficiency is crucial for making sophisticated multi-hop retrieval practical and for advancing systems like those explored in advanced retrieval-augmented generation.
A Self-Editing Search Agent
Context-1 is designed to function as a specialized subagent within a larger AI system. Its primary role is to produce a ranked list of documents relevant to a given query, rather than generating an answer itself. This separation of concerns—search versus generation—is key to its architecture.
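The search-versus-generation split described above can be sketched as two decoupled components: a search subagent whose only output is a ranked document list, and a separate generator that consumes the top results. The class and function names below are illustrative assumptions, not Chroma's actual interface, and the word-overlap scorer stands in for the model's real ranking.

```python
# Sketch of the separation of concerns: the subagent ranks documents and
# produces nothing else; answer generation is a distinct, swappable step.
# Names and the toy word-overlap scorer are hypothetical, not Chroma's API.

from dataclasses import dataclass

@dataclass
class RankedDoc:
    doc_id: str
    score: float  # higher means more relevant to the query

def search_subagent(query, corpus):
    """Return every document in the corpus, ranked by a toy relevance score
    (count of query words appearing in the document)."""
    q = set(query.lower().split())
    ranked = [
        RankedDoc(doc_id, len(q & set(text.lower().split())))
        for doc_id, text in corpus.items()
    ]
    return sorted(ranked, key=lambda d: d.score, reverse=True)

def generate_answer(query, ranked_docs, corpus, top_k=2):
    """Stand-in for a separate generator model that consumes the ranked list."""
    context = " ".join(corpus[d.doc_id] for d in ranked_docs[:top_k])
    return f"Q: {query} | context: {context}"
```

Because the subagent's contract is just "query in, ranked documents out", either side can be upgraded independently: swap in a stronger ranker without touching the generator, or vice versa.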
