The decade-long pattern of shallow networks in reinforcement learning has finally been broken. A team from Princeton challenged the conventional wisdom that deep networks inherently fail in RL by successfully scaling models to 1,000 layers, a feat previously thought impossible. The work, which earned the team the Best Paper award at NeurIPS 2025, marks a shift that promises to accelerate the capabilities of autonomous systems.
Kevin Wang, Ishaan Javali, Michał Bortkiewicz, and their advisor Benjamin Eysenbach spoke with Swyx of Latent Space at NeurIPS 2025 about their award-winning work, "1000 Layer Networks for Self-Supervised RL," and described how they achieved this scaling by rethinking the core objective function. For years, deep RL models had stagnated at two or three layers, an outlier next to the deep networks that drove the revolution in vision and language models. Eysenbach noted that he was "kind of skeptical it was going to work" when the students proposed pushing depth.
