Perplexity AI today launched the Deep Research Accuracy, Completeness, and Objectivity (DRACO) Benchmark, an open-source benchmark for evaluating AI agents on the kinds of complex research tasks users actually perform. The release aims to close the gap between AI models that excel at synthetic tasks and those that serve authentic user needs.
DRACO is model-agnostic, so it can be rerun as more capable AI agents emerge, and Perplexity has committed to publishing updated results. It also lets users see performance gains reflected directly in products such as Perplexity.ai's Deep Research feature.
Evaluating AI's Research Prowess
Perplexity AI touts its Deep Research capabilities as state-of-the-art, citing strong performance on existing benchmarks like Google DeepMind's DeepSearchQA and Scale AI's ResearchRubrics. This performance is attributed to a combination of leading AI models and Perplexity's proprietary tools, including search infrastructure and code execution capabilities.
However, the company says existing benchmarks often fail to capture the nuanced demands of real-world research. To address this limitation, DRACO was built from millions of production tasks spanning ten domains, including Academic, Finance, Law, Medicine, and Technology.
Unlike benchmarks that test narrow skills, DRACO focuses on synthesis across sources, nuanced analysis, and actionable guidance, all while prioritizing accuracy and proper citation. The benchmark features 100 curated tasks, each with detailed evaluation rubrics developed by subject matter experts.
