Tevogen Bio's AI Slashes Drug Discovery Time

Tevogen Bio used a Databricks Lakehouse platform to cut the data processing behind its AI drug discovery models from 50 days to 24 hours, with the aim of making therapies more affordable.

Drug discovery is notoriously slow and expensive: it can cost billions and take over a decade. Tevogen Bio is using AI to speed it up. The company is leveraging its proprietary ExacTcell platform and PredicTcell AI models to streamline development, aiming for faster, cheaper, and more accessible therapies.

Traditionally, identifying drug targets involves lengthy manual wet-lab testing and dealing with massive, siloed datasets. Tevogen Bio partnered with Microsoft and Databricks to build a unified, governed data platform on a Databricks Lakehouse Medallion Architecture. This move tackles the challenge of processing multi-terabyte datasets, a significant bottleneck in biopharmaceutical research.

From Months to Hours

The core innovation lies in initial target selection, a step that once took 18-24 months and has been cut first to days and now to hours. Tevogen Bio’s goal was to ingest and organize a vast library of protein sequences across various diseases to train its foundational AI models.

This endeavor involved curating a dataset of known genetic proteins and training algorithmic models to predict immunologically active peptides. The sheer scale of the data, which runs to multiple terabytes, required robust data pipelines for procurement, organization, and multi-level cleansing.

Databricks Powers the Breakthrough

By implementing the Medallion Architecture and Unity Catalog on the Databricks platform, Tevogen Bio structured its data into bronze, silver, and gold layers, with strict governance and access control. This architecture, combined with distributed computing, reduced processing time from 50 days to just 24 hours, roughly a 50-fold speedup.
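To make the bronze, silver, and gold layering concrete, here is a minimal sketch in plain Python. It is not Tevogen Bio's pipeline and does not use Databricks or Unity Catalog APIs. The function names, the record fields (`id`, `sequence`), and the cleansing rules are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# The 20 standard amino acid one-letter codes.
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class ProteinRecord:
    protein_id: str
    sequence: str


def to_bronze(raw_rows):
    """Bronze: keep every raw row as received, with no cleaning."""
    return [dict(row) for row in raw_rows]


def to_silver(bronze_rows):
    """Silver: normalize case, drop malformed rows, deduplicate by sequence."""
    seen = set()
    silver = []
    for row in bronze_rows:
        pid = (row.get("id") or "").strip()
        seq = (row.get("sequence") or "").strip().upper()
        # Hypothetical cleansing rule: require an id and a valid sequence.
        if not pid or not seq or not set(seq) <= VALID_AMINO_ACIDS:
            continue
        if seq in seen:
            continue
        seen.add(seq)
        silver.append(ProteinRecord(pid, seq))
    return silver


def to_gold(silver_records):
    """Gold: model-ready rows (here just one simple feature per protein)."""
    return [
        {"protein_id": r.protein_id, "length": len(r.sequence)}
        for r in silver_records
    ]


# Made-up input rows, for illustration only.
raw = [
    {"id": "P1", "sequence": "mktayiakqr"},
    {"id": "P2", "sequence": "MKTAYIAKQR"},  # duplicate of P1 once normalized
    {"id": "P3", "sequence": "MKT#YIAK"},    # contains an invalid character
    {"id": "", "sequence": "ACDEFGHIK"},     # missing id
    {"id": "P5", "sequence": "ACDEFGHIK"},
]
bronze = to_bronze(raw)
silver = to_silver(bronze)
gold = to_gold(silver)
```

In a real Lakehouse each layer would typically be a governed table rather than an in-memory list, but the flow is the same: raw data is preserved, cleaned once, and then shaped for modeling.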

With the assistance of Databricks' Professional Services, Tevogen Bio processed 24 million proteins, refining them into 16 billion data points and approximately 700 million unique peptides. This rapid data processing is crucial for accelerating drug discovery with AI.
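The article does not say how peptides were derived from the proteins. One common approach is to slide a fixed-length window over each sequence and keep the distinct results, which is what the sketch below assumes. The window lengths of 8 to 11 residues and the two example sequences are illustrative assumptions, not details from Tevogen Bio.

```python
def enumerate_peptides(sequence, min_len=8, max_len=11):
    """Yield every contiguous substring whose length is in [min_len, max_len]."""
    for k in range(min_len, max_len + 1):
        for start in range(len(sequence) - k + 1):
            yield sequence[start:start + k]


def unique_peptides(sequences, min_len=8, max_len=11):
    """Collect the distinct peptides found across all input sequences."""
    unique = set()
    for seq in sequences:
        unique.update(enumerate_peptides(seq, min_len, max_len))
    return unique


# Two made-up, overlapping sequences; shared windows are counted only once.
proteins = ["MKTAYIAKQRQISFVK", "AYIAKQRQISFVKSHF"]
peptides = unique_peptides(proteins)
```

Because every protein yields many overlapping windows, the number of candidate peptides grows far faster than the number of proteins, which is one reason a workload like this benefits from distributed computing.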

Training the Next Generation of Therapies

The result is the alpha version of the PredicTcell model, trained using XGBoost and ESM models. This model achieved 93-97% recall and 38-43% accuracy. The team is continuously enhancing the training set, incorporating expert articles via RAG integration and biochemical properties, further refining prediction capabilities.
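High recall alongside much lower accuracy can look contradictory, so a short worked example may help. The sketch below computes the standard metrics from a confusion matrix. The counts are invented purely for illustration and are not Tevogen Bio's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute recall, precision, and accuracy from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "recall": tp / (tp + fn),        # share of true positives that were found
        "precision": tp / (tp + fp),     # share of positive predictions that were right
        "accuracy": (tp + tn) / total,   # share of all predictions that were right
    }


# Hypothetical counts: few real positives, and a model tuned to miss almost none.
metrics = classification_metrics(tp=95, fp=590, fn=5, tn=310)
```

With these made-up counts the model finds 95 of 100 true positives, so recall is high, but the many false positives pull accuracy down. A screening step may accept that trade-off when a missed candidate costs more than an extra one to check later.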

Tevogen Bio is now training the beta version of PredicTcell and developing the alpha version of its AdapTcell model. The company is confident in its ability to create predictive models for peptide-to-protein binding affinity, a key step towards truly personalized and effective medicine.

"Adding determinism to a probabilistic workflow is the key to unlocking success. Balancing the in-vivo/in-silico trial-and-error process is something that every biotech company should be focused on for drug development," stated Mittul Mehta, CIO – Tevogen and Head – Tevogen.AI.

The partnership with Databricks and Microsoft is central to Tevogen’s mission of delivering affordable and accessible therapies, with a continued focus on AI-driven innovation in drug development.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.