The constraints inherent to Large Language Model (LLM) context windows, the finite memory that dictates how much input an AI can process at once, have long been considered a fundamental bottleneck for truly long-horizon tasks. That bottleneck has just been decisively shattered. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have introduced Recursive Language Models (RLMs), a general inference strategy that scales effective input length to over 10 million tokens. That is roughly two orders of magnitude beyond the context windows of current frontier models, and even on shorter prompts RLMs outperform base LLMs and common long-context scaffolds.
The paper, authored by Alex L. Zhang, Tim Kraska, and Omar Khattab, addresses the critical issue of "context rot," the phenomenon in which LLM performance degrades quickly as context grows longer. The effect is visible in the sharp drop-off in performance that base models like GPT-5 exhibit as token length increases, and it is most pronounced in complex tasks that require deep reasoning or comparison across distant parts of a massive input, such as analyzing large codebases or synthesizing deep research documents. Traditional attempts to manage long context often rely on "context condensation or compaction," repeatedly summarizing the input once it exceeds a certain length threshold. This approach is inherently lossy: it trades detail for brevity, so multi-hop reasoning tasks, whose answers often hinge on small details scattered across the input, can fail outright once a summary drops one of those details.
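To make the lossiness of compaction concrete, here is a minimal Python sketch of threshold-based compaction. The names `compact_history` and `toy_summarize` are hypothetical, and the truncating summarizer is a deliberately crude stand-in for an LLM summarization call. Real compaction schemes summarize far more intelligently, but they share the same failure mode: once a detail is left out of a summary, later steps cannot get it back.

```python
from typing import Callable, List


def compact_history(
    chunks: List[str],
    summarize: Callable[[str], str],
    max_chars: int,
) -> str:
    """Threshold-based context compaction (illustrative sketch).

    Each chunk is appended to a running context; whenever the context grows
    past max_chars, it is replaced by summarize(context).
    """
    context = ""
    for chunk in chunks:
        context += chunk
        if len(context) > max_chars:
            # Lossy step: anything the summary drops can never be recovered.
            context = summarize(context)
    return context


# Hypothetical stand-in for an LLM summarizer: it keeps only the first
# 15 characters, an extreme case of favoring brevity over detail.
def toy_summarize(text: str) -> str:
    return text[:15]


chunks = ["header; ", "filler " * 5, "secret=42; ", "filler " * 5, "end"]
summary = compact_history(chunks, toy_summarize, max_chars=30)
print(summary)  # → header; filler end  (the "secret=42" fact was discarded)
```

The fact "secret=42" survives the first compaction but is discarded by the second one, which is exactly the kind of detail a multi-hop question might depend on.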
The core insight driving RLMs is a paradigm shift in how the input prompt is processed. Rather than feeding the entire, massive input directly into the neural network, a resource-intensive and often fruitless endeavor, the input is "treated as part of the environment that the LLM can symbolically interact with." Concretely, the RLM loads the long prompt as a variable inside a Python Read-Eval-Print Loop (REPL) environment. The LLM therefore never needs to hold the entire text in its context window at once; instead, it writes and executes code that inspects the variable, extracts relevant snippets, and issues recursive calls to itself on those snippets. This capability allows the model to "peek into, decompose, and invoke itself recursively over programmatic snippets of the variable." The strategy sidesteps the fixed context-window limit of the underlying transformer, turning the core model into a search and reasoning engine capable of deep, iterative analysis across inputs far longer than its own context window.
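The toy Python sketch below illustrates the shape of this loop under several loud assumptions. `make_repl_env`, `run_in_repl`, `llm_query`, `recursive_query`, and `toy_base_llm` are hypothetical names, not the authors' API. The root model is replaced by two hard-coded code strings of the kind a model might write, and the base model is a keyword-matching stub. What it does show is the core pattern: the long prompt lives as a variable in a persistent namespace, code runs against that variable, and only small snippets are ever passed to a (recursive) sub-call.

```python
from typing import Dict


# Hypothetical stand-in for a base LLM call on a short prompt. A real system
# would call a model API; this stub just returns lines that mention "passcode".
def toy_base_llm(prompt: str) -> str:
    hits = [line for line in prompt.splitlines() if "passcode" in line]
    return "\n".join(hits)


def recursive_query(prompt: str, max_chars: int = 2_000) -> str:
    """Hypothetical sub-call: answer short prompts directly, otherwise split
    the prompt in half by lines and recurse on each half."""
    lines = prompt.splitlines()
    if len(prompt) <= max_chars or len(lines) <= 1:
        return toy_base_llm(prompt)
    mid = len(lines) // 2
    left = recursive_query("\n".join(lines[:mid]), max_chars)
    right = recursive_query("\n".join(lines[mid:]), max_chars)
    return "\n".join(part for part in (left, right) if part)


def make_repl_env(long_prompt: str) -> Dict[str, object]:
    # The long input is stored in a variable; it is never placed in a model's context window.
    return {"context": long_prompt, "llm_query": recursive_query}


def run_in_repl(env: Dict[str, object], code: str) -> object:
    # The namespace persists across calls, like a REPL session.
    # WARNING: running model-written code with exec needs a real sandbox in practice.
    exec(code, env)
    return env.get("result")


# Build a long input with one relevant line buried in the middle.
filler = "\n".join(f"log line {i}: nothing to see" for i in range(20_000))
long_prompt = filler + "\nthe passcode is 7391\n" + filler
env = make_repl_env(long_prompt)

# Step 1 (code a root model might write): peek at the size and the first line.
peek = run_in_repl(env, "result = (len(context), context.splitlines()[0])")

# Step 2: decompose into line chunks and run a sub-call on each chunk.
step2 = r'''
lines = context.splitlines()
found = []
for i in range(0, len(lines), 10_000):
    answer = llm_query("\n".join(lines[i:i + 10_000]))
    if answer:
        found.append(answer)
result = found
'''
found = run_in_repl(env, step2)
print(peek)
print(found)
```

In this toy, the recursion follows a fixed halving rule and the decomposition strategy is hard-coded; in an RLM, the model itself decides how to peek at `context`, how to split it, and when a snippet warrants a recursive call.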
