BioMiner: Unlocking Drug Discovery Data

Deconstructing Bioactivity: Semantic Interpretation Meets Structure Resolution

The core innovation is BioMiner's explicit separation of bioactivity semantic interpretation from ligand structure construction. The multi-modal extraction framework uses direct reasoning to interpret bioactivity semantics and a new chemical-structure-grounded visual reasoning paradigm to infer relationships between structures. Exact molecular construction is then delegated to specialized chemistry tools rather than produced by the model itself. The approach, detailed in a recent arXiv preprint, addresses two problems at once: understanding complex biological interactions and accurately representing the chemical entities involved.
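The decoupling described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `BioactivityRecord` schema, the `interpret_semantics` and `attach_structures` functions, and the dictionary standing in for a real chemistry tool are assumptions for illustration, not BioMiner's actual interfaces. The visual reasoning step that relates structures to one another is not modeled. The point is only structural: the semantic stage emits records keyed by ligand labels, and a separate, swappable resolver supplies exact structures.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class BioactivityRecord:
    """One extracted bioactivity entry (hypothetical schema, not BioMiner's)."""
    ligand_label: str             # compound label as written in the paper, e.g. "3a"
    target: str                   # protein target, e.g. "NLRP3"
    measurement: str              # activity value as text, e.g. "IC50 = 12 nM"
    smiles: Optional[str] = None  # filled in only by the structure stage


def interpret_semantics(mentions: Iterable[Tuple[str, str, str]]) -> List[BioactivityRecord]:
    """Semantic stage: record what was measured, with no attempt to build molecules.

    In a real system a language/vision model would read text, tables and figures
    here; in this sketch the (label, target, value) tuples are given directly.
    """
    return [BioactivityRecord(label, target, value) for label, target, value in mentions]


def attach_structures(
    records: Iterable[BioactivityRecord],
    resolve: Callable[[str], Optional[str]],
) -> List[BioactivityRecord]:
    """Structure stage: delegate exact structure construction to an external resolver.

    `resolve` maps a ligand label to a SMILES string, or returns None if it cannot.
    """
    return [replace(r, smiles=resolve(r.ligand_label)) for r in records]


# Toy stand-in for a chemistry tool (assumption): a lookup table from labels to SMILES.
TOY_STRUCTURES = {"3a": "c1ccccc1O", "3b": "c1ccccc1N"}

# Made-up example mentions, for illustration only.
mentions = [
    ("3a", "NLRP3", "IC50 = 12 nM"),
    ("3b", "NLRP3", "IC50 = 40 nM"),
    ("3c", "NLRP3", "IC50 > 10 uM"),
]
records = attach_structures(interpret_semantics(mentions), TOY_STRUCTURES.get)
for r in records:
    print(r.ligand_label, r.target, r.measurement, r.smiles)
```

Because the resolver is just a callable, a real structure-recognition or cheminformatics tool could replace the lookup table without touching the semantic stage. Label `3c` has no toy entry, so its `smiles` stays `None`, which is one way a pipeline could flag entries for human review.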

BioVista Benchmark and Quantifiable Performance Gains

To evaluate and advance automated extraction, the authors introduce BioVista, a benchmark of 16,457 bioactivity entries drawn from 500 publications. On BioVista, BioMiner achieves an F1 score of 0.32 for bioactivity triplets, which establishes a quantitative baseline for the task. Its practical value shows in a larger application: a pre-training database built with BioMiner from 11,683 papers improved downstream model performance by 3.9%.
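A triplet-level F1 score is strict: under an exact-match assumption, a prediction counts only if the ligand, target, and activity value all agree with the annotation. The sketch below computes micro-averaged F1 over sets of triplets. The exact-match rule, the triplet layout, and the function name `triplet_f1` are assumptions for illustration; BioVista's official scoring may normalize units or match structures differently.

```python
from typing import Hashable, Iterable


def triplet_f1(predicted: Iterable[Hashable], gold: Iterable[Hashable]) -> float:
    """Micro F1 over exact-match triplets (assumed protocol, not BioVista's official one)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)  # triplets that match on every field
    if true_positives == 0:
        return 0.0  # also covers empty predictions, avoiding division by zero
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy example: (ligand, target, activity) triplets; the values are made up.
gold = [
    ("3a", "NLRP3", "IC50 = 12 nM"),
    ("3b", "NLRP3", "IC50 = 40 nM"),
    ("3c", "NLRP3", "IC50 = 1.2 uM"),
    ("3d", "NLRP3", "IC50 = 8 nM"),
]
predicted = [
    ("3a", "NLRP3", "IC50 = 12 nM"),
    ("3b", "NLRP3", "IC50 = 40 nM"),
    ("3c", "NLRP3", "IC50 = 12 uM"),  # wrong value, so the whole triplet is a miss
]
score = triplet_f1(predicted, gold)
print(round(score, 3))  # precision 2/3, recall 2/4 -> F1 = 4/7, prints 0.571
```

With one wrong field, the third prediction counts as a false positive and its gold counterpart as a false negative, which is why a triplet-level score can be much lower than the accuracy on any single field.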

Accelerating Discovery and Identifying Novel Therapeutics

BioMiner's real-world utility is demonstrated in three applications. First, it supports large-scale data aggregation for pre-training, improving downstream AI models. Second, within a human-in-the-loop workflow, it doubled the yield of high-quality NLRP3 bioactivity data, which led to a 38.6% improvement for 28 QSAR models and the identification of 16 hit candidates with novel scaffolds. Finally, when annotating protein-ligand complex bioactivity for the PoseBusters dataset, BioMiner delivered a 5.59-fold speedup and a 5.75% accuracy improvement over manual workflows, suggesting it could substantially accelerate the drug discovery pipeline.

AI Daily Digest