The finite context window of a Large Language Model (LLM), the limit on how much input the model can process at once, has long been considered a fundamental bottleneck for truly long-horizon tasks. That bottleneck may no longer hold. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have introduced Recursive Language Models (RLMs), a general inference strategy that scales effective input length to over 10 million tokens, handling inputs up to two orders of magnitude beyond the context windows of current frontier models.
The paper, authored by Alex L. Zhang, Tim Kraska, and Omar Khattab, addresses the critical issue of "context rot," the phenomenon where LLM output quality degrades quickly as context gets longer. This degradation is particularly evident in complex tasks that require deep reasoning or comparison across disparate parts of a massive input, such as analyzing large codebases or deep research documents. Traditional attempts to manage long context often rely on "context condensation or compaction," which repeatedly summarizes the input once it exceeds a certain length threshold. This approach is inherently lossy: it sacrifices critical detail in favor of brevity, which leads to catastrophic failure on multi-hop reasoning tasks. The same weakness shows up in the sharp drop-off in performance that base models like GPT-5 exhibit as token length increases.
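To make the lossiness of compaction concrete, here is a minimal Python sketch of a threshold-triggered compaction loop. It is an illustration only, not code from the paper: the names `count_tokens`, `toy_summarize`, and `compact_context` are hypothetical, the "tokenizer" just counts whitespace-separated words, and the "summarizer" is a stand-in that keeps the first few words rather than calling a real model.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude token count: whitespace-separated words (assumption, not a real tokenizer)."""
    return len(text.split())


def toy_summarize(text: str, keep: int) -> str:
    """Hypothetical stand-in for an LLM summarizer: keeps only the first `keep` words."""
    return " ".join(text.split()[:keep])


def compact_context(
    chunks: List[str],
    limit: int,
    summarize: Callable[[str, int], str] = toy_summarize,
) -> str:
    """Append chunks one by one; whenever the running context exceeds
    `limit` tokens, replace it with a summary of half that size."""
    context = ""
    for chunk in chunks:
        context = (context + " " + chunk).strip()
        if count_tokens(context) > limit:
            # Lossy step: everything the summarizer drops is gone for good.
            context = summarize(context, limit // 2)
    return context


chunks = [
    "alpha beta gamma delta",
    "epsilon zeta eta theta",
    "iota kappa KEY=42 lambda",  # the detail a later question depends on
]
result = compact_context(chunks, limit=6)
print(result)  # alpha beta gamma
```

In this toy run the detail `KEY=42` arrives in the last chunk and is discarded by the very next compaction pass, so no later reasoning step can recover it. A real summarizer is smarter about what it keeps, but it faces the same constraint: it must decide what to drop before it knows which details a later question will need.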
