Researchers racing to identify novel drug candidates increasingly rely on large language models (LLMs) with web-search capabilities to navigate complex pharmaceutical pipelines. However, in areas like oncology and immunology, where critical assets sit in the long tail of preclinical and Asian-developed projects, generic web access proves insufficient. A new benchmark from Łukasz Kidziński and Kevin Thomas, detailed on arXiv, reveals a significant performance gap between general-purpose systems and a specialized index.
Specialized Indexes Trump General Search for Niche Discovery
The research introduces Gosset, an AI platform with a chat interface backed by a curated index of drug-target, modality, and indication data. Benchmarked against leading frontier systems (Claude Opus 4.7, GPT 5.5, Gemini 3.1 Pro, and Perplexity sonar-pro) on ten challenging oncology/immunology targets, Gosset demonstrated a 3.2x improvement in verified drugs identified per query over the best frontier system. Crucially, Gosset achieved perfect precision and 100% recall against the combined verified drug set from all systems, highlighting the limits of general web search for specialized R&D intelligence.
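To make the metrics concrete, here is a minimal Python sketch of how precision and recall against a pooled verified set could be computed. It illustrates the general technique only and is not the authors' evaluation code: the function name `precision_recall`, the system names, and the drug identifiers are hypothetical placeholders, and the numbers are toy values rather than results from the benchmark.

```python
def precision_recall(returned, verified_pool):
    """Score one system's answers against the pooled set of verified drugs."""
    returned = set(returned)
    true_positives = returned & verified_pool
    # Precision: share of returned drugs that were verified.
    precision = len(true_positives) / len(returned) if returned else 0.0
    # Recall: share of the pooled verified set that this system found.
    recall = len(true_positives) / len(verified_pool) if verified_pool else 0.0
    return precision, recall


# Hypothetical toy data: drugs each system returned, and the subset verified.
returned_by_system = {
    "system_a": {"drug_1", "drug_2", "drug_3", "drug_4"},
    "system_b": {"drug_1", "drug_2", "drug_x"},  # drug_x failed verification
}
verified_by_system = {
    "system_a": {"drug_1", "drug_2", "drug_3", "drug_4"},
    "system_b": {"drug_1", "drug_2"},
}

# The reference set is the union of verified drugs across all systems.
verified_pool = set().union(*verified_by_system.values())

scores = {
    name: precision_recall(returned, verified_pool)
    for name, returned in returned_by_system.items()
}

for name, (precision, recall) in scores.items():
    print(f"{name}: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy setup, a system reaches 100% recall only if it finds every verified drug that any system found, which is why pooling across systems makes the recall figure a relative measure rather than an absolute count of all drugs that exist for a target.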