For decades, geneticists have focused on the roughly two percent of the human genome that codes for proteins, often dismissing the remaining 98 percent or so as "junk DNA." That era is ending.
ARC Innovation at Sheba Medical Center and the Icahn School of Medicine at Mount Sinai have announced a landmark three-year collaboration with NVIDIA to build what they are calling a Genomic Large Language Model (Genomic LLM), or Genomic Foundation Model (gFM). The goal is ambitious: to use advanced AI to finally decipher the vast, poorly understood regulatory sequences that govern human health and disease.
This isn't just another research grant; it's a massive computational undertaking that treats DNA sequences like text. Just as GPT models learn grammar and context from billions of words, this Genomic LLM will learn the biological "language" of the genome from extensive clinical and genomic datasets provided by the two medical institutions.
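To make "treating DNA like text" concrete, here is a minimal Python sketch of one common way to turn a sequence into model-ready tokens: overlapping k-mers. The announcement does not say how this Genomic LLM will tokenize its input, so the k-mer scheme, the function names, and the toy sequence below are all illustrative assumptions, not the collaboration's method.

```python
from collections import Counter


def kmer_tokenize(sequence, k=3, stride=1):
    """Split a DNA string into overlapping k-mers (hypothetical helper)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens):
    """Map each distinct k-mer to an integer id.

    Ids are assigned by descending frequency, ties broken alphabetically,
    so the mapping is deterministic.
    """
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(ordered)}


def encode(tokens, vocab):
    """Replace each k-mer with its integer id, as a language model would expect."""
    return [vocab[tok] for tok in tokens]


# Toy sequence chosen for illustration only.
dna = "ATGCGATACG"
tokens = kmer_tokenize(dna, k=3)
vocab = build_vocab(tokens)
ids = encode(tokens, vocab)

print(tokens)
print(ids)
```

The resulting list of integer ids plays the same role that word or subword ids play in a text model: the network sees a sequence of symbols and learns which ones tend to appear together. Real genomic models may use different schemes, such as single-nucleotide tokens or learned subword vocabularies.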
