The quest for broadly capable agentic language models is hampered by a lack of transparent and effective data curation methodologies. Existing efforts often target a single benchmark and so fail to give models the generalization needed for diverse real-world applications. The OpenThoughts-Agent (OT-Agent) project addresses this gap with a fully open data curation pipeline.
Systematic Ablation Unlocks Key Data Insights
Through more than 100 controlled ablation experiments, the researchers isolated the effect of individual choices in their data pipeline. The ablations showed that the choice of task sources and their diversity matter, and these findings directly informed the construction of the curated training set. This systematic investigation departs from earlier, less granular approaches to building training data for agentic models.
OT-Agent Data Outperforms and Scales
The researchers used their pipeline to assemble a 100K-example training set and fine-tuned Qwen3-32B on it. The resulting model achieved an average accuracy of 44.8% across seven agentic benchmarks, a 3.9-percentage-point improvement over the strongest existing open-data agentic model, Nemotron-Terminal-32B (40.9%). The training data also scales well: in compute-controlled comparisons, it outperforms alternative open datasets across a range of training set sizes. This suggests the OT-Agent pipeline is a more efficient and effective path to developing capable agentic language models.
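The headline gain is reported in percentage points, which is easy to confuse with a relative improvement. The short sketch below uses only the two averages quoted above and computes both quantities. The helper names `pp_gap` and `relative_gain` are hypothetical and chosen for illustration; they are not part of the OT-Agent release.

```python
# Average accuracy (%) over the seven agentic benchmarks, as quoted in the text.
ot_agent_32b = 44.8           # Qwen3-32B fine-tuned on the 100K OT-Agent set
nemotron_terminal_32b = 40.9  # strongest prior open-data agentic model


def pp_gap(new: float, baseline: float) -> float:
    """Absolute difference in percentage points, rounded to one decimal."""
    return round(new - baseline, 1)


def relative_gain(new: float, baseline: float) -> float:
    """Improvement relative to the baseline, in percent, rounded to one decimal."""
    return round(100 * (new - baseline) / baseline, 1)


print(pp_gap(ot_agent_32b, nemotron_terminal_32b))         # 3.9
print(relative_gain(ot_agent_32b, nemotron_terminal_32b))  # 9.5
```

So the 3.9-point absolute gain corresponds to roughly a 9.5% relative improvement over Nemotron-Terminal-32B's average.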