"The science of deep learning," according to Jack Merullo, Research Scientist at Goodfire, is centered on making models not just powerful, but understandable, robust, and safe enough to deploy in high-stakes industries. This fundamental shift, from treating large neural networks as impenetrable black boxes to viewing them as complex systems ripe for reverse engineering, formed the core of the discussion between Merullo, Mark Bissell (Applied Research at Goodfire), and Swyx (Editor of Latent Space) at NeurIPS. The conversation provided a sharp analysis of the state of mechanistic interpretability (MechInterp) heading into 2026, focusing heavily on how foundational research is now translating into immediate, practical applications across diverse domains, from creative tooling to life sciences and finance.
The live discussion focused on Goodfire’s mission: building an interpretability platform capable of cracking open these black boxes across modalities. Merullo, who came to the company from a PhD on language model grounding, laid out the foundational research path, while Bissell, whose background is in healthcare engineering at Palantir, grounded the talk in applied use cases. The immediate utility of MechInterp is perhaps best illustrated by the company’s viral research preview, `paint.goodfire.ai`, which allows users to interact directly with the latent space of diffusion models.
