Alex Rives, Head of Science at Biohub, discussed the profound impact of AI on protein biology, drawing parallels to the "bitter lesson" observed in other AI domains. In a conversation with Brandon Anderson, Staff Scientist at Atomic AI, Rives highlighted how large language models, when trained on vast datasets of protein sequences, can learn fundamental biological principles. This capability is paving the way for a new era of programmable biology, where AI can predict protein structures and functions, and even design novel proteins with desired therapeutic properties.
The Bitter Lesson and Protein Biology
Rives explained that applying AI to protein biology rests on learning from data rather than explicit programming. Just as AI models have learned to excel at tasks like language translation or image recognition by processing massive amounts of data, they can uncover the implicit rules governing protein folding, function, and interactions. This approach, he noted, is an instance of the "bitter lesson," the observation from Rich Sutton's essay of that name that scaling computation and data tends to yield more general and more powerful AI capabilities than handcrafted features or domain-specific heuristics.
From Language Models to Protein Models
The conversation touched upon the evolution of AI models, moving from natural language processing to biological sequences. Rives detailed how models like ESM (Evolutionary Scale Modeling) are trained to predict hidden (masked) tokens in a sequence from the surrounding context, a process that, when applied to proteins, allows them to learn the underlying grammar of protein biology. He showed how these models produce representations that capture complex biological information, such as evolutionary constraints and functional motifs. This learned representation, he argued, is akin to a "world model" of protein biology, enabling a deeper understanding and manipulation of these complex molecules.
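The idea of learning from sequences by predicting hidden tokens can be illustrated with a toy sketch. It uses the masked-prediction form of the objective, in which some residues are hidden and must be recovered from context. Everything here is an assumption made for illustration: the sequences are made up, the function names are hypothetical, and the "model" is a simple per-position frequency count standing in for a trained neural network. It is not the ESM implementation.

```python
import random
from collections import Counter

MASK = "<mask>"

def mask_sequence(seq, mask_rate=0.15, rng=None):
    """Hide a random subset of residues; return (masked tokens, {position: true residue})."""
    rng = rng or random.Random(0)
    tokens = list(seq)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), n_mask)
    targets = {i: tokens[i] for i in positions}
    for i in positions:
        tokens[i] = MASK
    return tokens, targets

def fit_position_counts(family):
    """Toy stand-in for a learned model: residue counts per aligned position."""
    return [Counter(column) for column in zip(*family)]

def predict_masked(tokens, counts):
    """Fill each masked position with the most frequent residue at that position."""
    return {
        i: counts[i].most_common(1)[0][0]
        for i, tok in enumerate(tokens)
        if tok == MASK
    }

# Made-up, pre-aligned sequences of equal length (not real proteins).
family = ["MKTAYIAK", "MKTAYLAK", "MKSAYIAK", "MRTAYIAK"]
counts = fit_position_counts(family)

tokens, targets = mask_sequence("MKTAYIAK", mask_rate=0.25, rng=random.Random(1))
predictions = predict_masked(tokens, counts)
```

The frequency table recovers a masked residue only because related sequences agree at that position, which is a crude version of the evolutionary constraint the paragraph above describes. A real protein language model learns such regularities across unaligned sequences and with far richer context.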
Designing Proteins with AI
A key takeaway from Rives's discussion was the potential for AI to move beyond prediction and into generative design. Using the representations these models learn, researchers can search the learned protein "worlds" for sequences that satisfy specific design criteria, such as binding to a particular target molecule or exhibiting a desired structural property. As an example, he highlighted the successful design of novel mini-protein binders against targets such as EGFR and CTLA-4, demonstrating the tangible impact of this AI-driven approach.
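Searching for sequences that satisfy a design criterion can be sketched, in a deliberately simplified form, as a mutate-and-score loop. The sketch below is an assumption-laden illustration, not the method Rives described: `toy_score` is a hypothetical placeholder (fraction of hydrophobic residues) standing in for a model-derived score such as predicted binding, and `hill_climb` is a plain greedy search.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVW")

def toy_score(seq):
    """Hypothetical design criterion: fraction of hydrophobic residues."""
    return sum(residue in HYDROPHOBIC for residue in seq) / len(seq)

def hill_climb(seq, score, steps=200, rng=None):
    """Greedy search: try one point mutation per step, keep it if the score does not drop."""
    rng = rng or random.Random(0)
    best, best_score = seq, score(seq)
    for _ in range(steps):
        pos = rng.randrange(len(best))
        candidate = best[:pos] + rng.choice(AMINO_ACIDS) + best[pos + 1:]
        candidate_score = score(candidate)
        if candidate_score >= best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

start = "MKTAYDEKQRSTGN"  # made-up starting sequence
designed, designed_score = hill_climb(start, toy_score, steps=300, rng=random.Random(42))
```

In practice the scoring function would come from a learned model or a structure predictor, and the search would typically be more sophisticated than greedy point mutation, but the loop shows the basic shape: propose a sequence, score it against the criterion, and keep improvements.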
