OpenSeeker Democratizes Frontier LLM Search

OpenSeeker, a fully open-source search agent, breaks LLM search data scarcity with novel synthesis techniques, achieving state-of-the-art performance.

Diagram illustrating the OpenSeeker architecture and data synthesis process.
Image credit: StartupHub.ai

The development of high-performance Large Language Model (LLM) search agents, critical for frontier capabilities, has been largely confined to industrial giants due to a significant bottleneck: the scarcity of transparent, high-quality training data. This gap has stifled broader research innovation. Addressing this directly, the OpenSeeker project introduces the first fully open-source search agent, providing both model and data to democratize this vital domain.

Fact-Grounded Synthesis for Scalable Reasoning

OpenSeeker's core innovation lies in its ability to generate complex, multi-hop reasoning tasks at scale. By expanding outward through the web's link graph (topological expansion) and masking key entities behind descriptive clues (entity obfuscation), the system synthesizes Question Answering (QA) data that is fact-grounded, scalable, and controllable. This approach allows precise control over task coverage and complexity, directly tackling the data scarcity that has long constrained the research community.
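To make the two ideas concrete, here is a minimal sketch of how graph expansion plus entity obfuscation can yield a multi-hop question. The toy graph, relation names, and function names are illustrative assumptions, not the paper's actual pipeline:

```python
# Toy web graph: entity -> {relation: object}. All entities, relations,
# and function names here are illustrative, not OpenSeeker's real data.
WEB_GRAPH = {
    "Marie Curie": {"known for": "radioactivity", "birthplace": "Warsaw"},
    "Warsaw": {"country": "Poland"},
    "Poland": {"continent": "Europe"},
}

def expand_chain(start, hops):
    """Topological expansion: follow edges outward to extend a reasoning chain."""
    chain, node = [], start
    for _ in range(hops):
        edges = WEB_GRAPH.get(node)
        if not edges:
            break
        rel, obj = next(iter(edges.items()))
        chain.append((node, rel, obj))
        node = obj
    return chain

def synthesize_qa(seed, clue_rel, bridge_rel, hops=1):
    """Entity obfuscation: refer to the seed by one of its facts rather than
    its name, then chain extra hops so answering requires multi-hop search."""
    clue = f"the entity {clue_rel} {WEB_GRAPH[seed][clue_rel]}"
    bridge = WEB_GRAPH[seed][bridge_rel]
    chain = expand_chain(bridge, hops)
    answer = chain[-1][2] if chain else bridge
    rels = [bridge_rel] + [rel for _, rel, _ in chain]
    question = f"What is the {' of the '.join(reversed(rels))} of {clue}?"
    return question, answer

q, a = synthesize_qa("Marie Curie", clue_rel="known for", bridge_rel="birthplace")
# q: "What is the country of the birthplace of the entity known for radioactivity?"
# a: "Poland"
```

Because `hops` is a parameter, task complexity is directly tunable: increasing it deepens the reasoning chain, which is one plausible reading of the "controllable" property the project claims.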

Denoising Trajectories for Enhanced Action Quality

Complementing the data synthesis, OpenSeeker employs a denoised trajectory synthesis mechanism. This technique applies retrospective summarization to refine the action sequences generated by teacher LLMs. By denoising these trajectories, the system markedly improves the quality of the actions it learns from, leading to more efficient and effective search behavior.
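One simple way to picture retrospective denoising: look back from the final answer and keep only the steps whose observations actually contributed evidence. The sketch below is an assumption about the general idea, with hypothetical `Step` and `denoise` names; the paper's actual mechanism may differ:

```python
from dataclasses import dataclass

@dataclass
class Step:
    action: str          # e.g. a search query or a page visit
    observation: str     # what the tool returned
    useful: bool = False # filled in retrospectively

def denoise(trajectory, answer):
    """Retrospective denoising (illustrative): mark steps whose observations
    carry evidence for the final answer, drop the unproductive detours."""
    evidence = [tok for tok in answer.split() if len(tok) > 2]
    for step in trajectory:
        step.useful = any(tok in step.observation for tok in evidence)
    kept = [s for s in trajectory if s.useful]
    dropped = len(trajectory) - len(kept)
    note = f"[{dropped} unproductive step(s) removed]" if dropped else ""
    return kept, note

# A noisy teacher trajectory with one dead-end query (all content invented).
traj = [
    Step('search("agent benchmark leaderboard")', "Agent scores 29.5 on BrowseComp"),
    Step('search("benchmark reslts typo")', "No results found"),
    Step('open("example.com/report")', "BrowseComp report confirms the 29.5 score"),
]
kept, note = denoise(traj, "29.5 on BrowseComp")
```

The retained steps form a cleaner demonstration for the student model, which is the intuition behind training on denoised rather than raw teacher trajectories.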

Frontier Performance with Open Access

The impact of these innovations shows clearly in the performance metrics. Trained on a remarkably small dataset of just 11.7k synthesized samples, the OpenSeeker search agent achieves state-of-the-art results across multiple benchmarks, including BrowseComp, BrowseComp-ZH, xbench-DeepSearch, and WideSearch. Notably, it outperforms other open-source agents such as DeepDive by a substantial margin (29.5% vs. 15.3% on BrowseComp) and even surpasses industrial competitors such as Tongyi DeepResearch on BrowseComp-ZH (48.4% vs. 46.7%). The full open-sourcing of the training dataset and model weights signals a pivotal shift toward a more transparent and collaborative research ecosystem.