Biological systems operate under intricate, coupled constraints spanning sequence, structure, regulation, evolution, and cellular context. Existing foundation models in biology, however, often work in silos, each handling a single modality or a fixed forward task. This fragmentation limits their ability to capture biological function as a whole.
Bridging Modalities with MIMIC
The researchers introduce MIMIC, a generative multimodal foundation model designed to overcome these limitations. Trained on the newly curated and aligned LORE dataset, MIMIC integrates nucleic acid, protein, evolutionary, structural, regulatory, and semantic/contextual data. Its split-track encoder-decoder architecture is central to this: it lets the model condition on any subset of observed modalities and reconstruct or generate the missing components of a molecular state across the genome, transcriptome, and proteome.
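To make "conditioning on arbitrary subsets of observed modalities" concrete, here is a minimal Python sketch of the bookkeeping involved. It is not MIMIC's implementation. The modality names follow the list above, while the dictionary layout and the helpers `split_observed` and `sample_observed_subset` are hypothetical names introduced only for illustration. The idea is that a sample is partitioned into the modalities supplied as conditioning input and the ones left for the model to reconstruct, and that a training loop could draw a different observed subset each time.

```python
import random

# Modality names follow the article; this data layout is an assumption,
# not the LORE dataset's actual schema.
MODALITIES = (
    "nucleic_acid",
    "protein",
    "evolutionary",
    "structural",
    "regulatory",
    "semantic_contextual",
)


def split_observed(sample, observed):
    """Partition a sample into conditioning inputs and reconstruction targets.

    `sample` maps modality name -> data; `observed` is the set of modality
    names the (hypothetical) model is allowed to see.
    """
    unknown = set(observed) - set(MODALITIES)
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    # Keep only modalities that are both requested and present in the sample.
    inputs = {m: sample[m] for m in MODALITIES if m in observed and m in sample}
    # Everything not supplied as input is a target to reconstruct or generate.
    targets = [m for m in MODALITIES if m not in inputs]
    return inputs, targets


def sample_observed_subset(rng):
    """Draw a random non-empty subset of modalities to treat as observed."""
    k = rng.randint(1, len(MODALITIES))
    return set(rng.sample(MODALITIES, k))


if __name__ == "__main__":
    # Placeholder strings stand in for real sequence or structure data.
    sample = {m: f"<{m} data>" for m in MODALITIES}
    inputs, targets = split_observed(sample, {"protein", "structural"})
    print(sorted(inputs))  # ['protein', 'structural']
    print(targets)  # ['nucleic_acid', 'evolutionary', 'regulatory', 'semantic_contextual']
```

In this sketch the same sample can serve many tasks: observing only `protein` and asking for `structural` resembles structure prediction, while the reverse resembles inverse design. How MIMIC actually selects and encodes observed modalities is not described here.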