Composer AI Masters Long-Horizon Tasks

Composer AI uses a novel self-summarization technique to handle coding tasks beyond its context window, significantly improving performance on complex challenges.

Illustration showing Composer AI's self-summarization process for long-horizon tasks.

Cursor has developed Composer AI, a specialized model designed to tackle complex coding tasks that require reasoning over extended periods. By integrating self-summarization into its reinforcement learning training, Composer can effectively manage tasks far beyond its standard context window limits. This breakthrough allows the AI to learn and execute challenging coding projects involving hundreds of actions, pushing the boundaries of what AI agents can achieve. Read more about this advancement on the Cursor Blog.

The Limits of Traditional Compaction

Existing AI agent frameworks often struggle with long-running tasks due to limitations in model context windows. When an agent's interaction history exceeds this limit, these frameworks employ 'compaction' techniques to shorten the context. This typically involves either prompted summarization or sliding context windows, both of which risk losing crucial information.
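The information-loss failure mode of sliding context windows is easy to see in a minimal sketch (hypothetical code, not any framework's actual implementation): once the history exceeds the token budget, the oldest messages are simply dropped, and any fact stated only there is gone for good.

```python
def sliding_window_compact(messages, token_counts, budget):
    """Keep only the most recent messages that fit within `budget` tokens."""
    kept = []
    used = 0
    # Walk backwards from the newest message toward the oldest.
    for msg, tokens in zip(reversed(messages), reversed(token_counts)):
        if used + tokens > budget:
            break  # everything older than this point is discarded
        kept.append(msg)
        used += tokens
    return list(reversed(kept))

history = ["set DB_URL=...", "ran tests", "fixed bug A", "fixed bug B"]
costs = [40, 10, 30, 30]
# With a 65-token budget, the first two messages -- including the
# DB_URL configuration the agent may still need -- are silently dropped.
print(sliding_window_compact(history, costs, budget=65))
# → ['fixed bug A', 'fixed bug B']
```

Prompted summarization avoids wholesale truncation, but whether the summary keeps the right details depends entirely on the quality of the summarization prompt.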

Even advanced latent-space compaction methods, while promising, are currently slower and can still lose critical information, degrading agent performance over long tasks.

Self-Summarization: A Trained Behavior

Composer AI addresses this by treating self-summarization as a core learned behavior. Trained within the Cursor agent harness, Composer learns to identify and preserve the most vital information as it progresses through a task. When approaching its context limit, Composer pauses to generate a concise summary of its current state, including plans and remaining tasks.

This self-generated summary is then integrated with the conversation history, allowing Composer to continue its work seamlessly. Crucially, this summarization process is incorporated directly into the model's training. Self-summaries that preserve critical information are rewarded, while those that lose important details are penalized, refining Composer's ability to manage long context.
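The control flow described above can be sketched as a simple agent loop. This is a hedged illustration of the behavior, not Cursor's implementation: `generate`, `is_done`, and `execute` are hypothetical callables standing in for the model and the agent harness, and the token thresholds are made up.

```python
SUMMARY_PROMPT = "Please summarize the conversation"

def count_tokens(history):
    # Crude proxy for a real tokenizer: whitespace-delimited word count.
    return sum(len(msg.split()) for msg in history)

def run_agent(task, generate, is_done, execute,
              context_limit=128_000, reserve=8_000):
    history = [f"TASK: {task}"]
    while not is_done(history):
        if count_tokens(history) > context_limit - reserve:
            # Pause near the context limit and ask the model to summarize
            # its own state: current plan, progress, and remaining work.
            summary = generate(history + [SUMMARY_PROMPT])
            # Resume from the self-summary in place of the full history.
            history = [f"TASK: {task}", f"SUMMARY: {summary}"]
        action = generate(history)
        history.append(execute(action))
    return history
```

During training, the quality of the summary produced at the pause point is itself subject to the reward signal, which is what distinguishes this from ordinary prompted compaction.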

Token-Efficient Compaction Achieved

Compared to baseline compaction methods that require extensive prompts and produce lengthy summaries, Composer's self-summarization is remarkably efficient. It uses a simple prompt, "Please summarize the conversation," and generates summaries of roughly 1,000 tokens that retain the highest-value contextual information.

In testing, Composer cut compaction-induced error rates by up to 50% relative to tuned baselines, while using one-fifth of the tokens and benefiting from KV cache reuse.

Solving Complex Coding Challenges

The true promise of this approach lies in enabling AI to tackle complex, multi-step problems. A case study from Terminal-Bench 2.0, "make-doom-for-mips," exemplifies this capability.

This challenging problem, which stumped several powerful models, was solved by an early Composer checkpoint. The agent worked through 170 turns of coding, testing, and self-summarization, condensing over 100,000 tokens of context down to the roughly 1,000 tokens essential to reaching the solution.

Toward a Long-Horizon Future

By embedding compaction within its training loop, Composer AI develops an explicit mechanism for efficiently carrying critical information forward. This advancement is a significant step towards training AI for even more complex, multi-agent coordination tasks and long-horizon AI tasks.

Cursor is continuing to expand the scope and intelligence of its agentic systems, with more updates on Composer expected soon. This development follows earlier announcements, such as "Cursor Joins JetBrains IDEs," and highlights the platform's evolving capabilities.