Together AI Slashes RL Training Time

Together AI's new distribution-aware speculative decoding cuts RL rollout time by up to 50%, tackling a major bottleneck in LLM post-training.

Image: Together AI's new technique speeds up LLM training processes. (Together AI)

Large language model training is getting a speed boost. Together AI has unveiled distribution-aware speculative decoding (DAS), a new framework designed to drastically cut the time spent on reinforcement learning (RL) post-training.

RL fine-tuning has become critical for enhancing LLM reasoning, but the rollout phase—where models generate responses for training—presents a significant bottleneck. This process can consume up to 70% of total training time, primarily due to the long-tail nature of response generation, where a few slow generations can delay the entire batch and leave expensive GPUs idle.
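The cost of that long tail comes from synchronous batching: the whole batch waits for its slowest generation. A minimal sketch (with hypothetical, Pareto-distributed generation times) makes the idle-GPU effect concrete:

```python
import random

def batch_rollout_time(gen_times):
    """Synchronous batched rollout: the batch finishes only when the
    slowest generation does, so fast finishers leave GPUs idle."""
    wall = max(gen_times)                     # wall-clock time for the batch
    idle = sum(wall - t for t in gen_times)   # total GPU-seconds spent waiting
    return wall, idle

# Hypothetical long-tailed generation times (seconds) for one batch.
random.seed(0)
times = [random.paretovariate(2.0) for _ in range(8)]
wall, idle = batch_rollout_time(times)
```

With a heavy-tailed distribution, a single outlier sets the wall-clock time while every other worker sits idle, which is exactly the waste DAS targets.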


Tackling the Rollout Bottleneck

DAS directly addresses this by optimizing the rollout process. It achieves up to a 50% speedup in RL rollouts, a crucial improvement for large-scale AI training.

The framework cleverly exploits two key properties of RL rollouts: the reuse of prompts across training epochs and the long-tail distribution of generation times. Unlike standard inference, RL training revisits the same prompts repeatedly, providing a rich history that DAS can leverage.

DAS employs an adaptive suffix tree drafter that learns from recent rollouts, allowing it to stay synchronized with the evolving model weights without requiring constant retraining. This training-free approach continuously adapts to the changing policy.
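To see how a training-free drafter can track a moving policy, here is a minimal sketch: an n-gram suffix table (an illustrative stand-in for DAS's adaptive suffix tree, not Together AI's implementation) is updated online from finished rollouts and proposes draft tokens by longest-suffix matching:

```python
from collections import defaultdict, Counter

class SuffixDrafter:
    """Training-free drafter sketch: a suffix table built from recent
    rollouts proposes draft tokens for speculative decoding."""

    def __init__(self, max_order=4):
        self.max_order = max_order
        self.table = defaultdict(Counter)  # suffix tuple -> next-token counts

    def observe(self, tokens):
        # Update counts from a finished rollout, so the drafter tracks
        # the evolving policy without any gradient-based retraining.
        for n in range(1, self.max_order + 1):
            for i in range(len(tokens) - n):
                self.table[tuple(tokens[i:i + n])][tokens[i + n]] += 1

    def draft(self, context, k=4):
        # Propose up to k tokens by matching the longest known suffix of
        # the context and taking its most frequent successor each step.
        out, ctx = [], list(context)
        for _ in range(k):
            nxt = None
            for n in range(min(self.max_order, len(ctx)), 0, -1):
                counts = self.table.get(tuple(ctx[-n:]))
                if counts:
                    nxt = counts.most_common(1)[0][0]
                    break
            if nxt is None:
                break
            out.append(nxt)
            ctx.append(nxt)
        return out

drafter = SuffixDrafter()
drafter.observe(["the", "answer", "is", "42", "."])
proposal = drafter.draft(["the", "answer"], k=3)  # -> ["is", "42", "."]
```

The target model then verifies the drafted tokens in a single forward pass, accepting the matching prefix; because the table is rebuilt from recent rollouts, repeated prompts across epochs keep it well aligned with the current weights.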

Intelligent Scheduling for Efficiency

Complementing the drafter is a length-aware scheduling strategy. This system balances the workload across GPUs, preventing long generations from monopolizing resources. It also dynamically allocates speculation budgets within each GPU, granting longer requests larger budgets early in generation to avoid costly late-stage stragglers.
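The cross-GPU balancing idea can be sketched with a classic greedy heuristic (longest-first onto the least-loaded GPU); the request names and lengths below are hypothetical, and this is an illustration of the balancing principle rather than Together AI's actual scheduler:

```python
import heapq

def assign_rollouts(predicted_lengths, n_gpus):
    """Greedy length-aware load balancing: place each request on the
    currently least-loaded GPU, longest requests first, so no single
    GPU ends up holding all of the long generations."""
    heap = [(0.0, g) for g in range(n_gpus)]  # (current load, gpu id)
    heapq.heapify(heap)
    placement = {}
    for req, length in sorted(predicted_lengths.items(), key=lambda kv: -kv[1]):
        load, gpu = heapq.heappop(heap)
        placement[req] = gpu
        heapq.heappush(heap, (load + length, gpu))
    makespan = max(load for load, _ in heap)  # slowest GPU's total work
    return placement, makespan

# Hypothetical predicted generation lengths (tokens) for six rollouts.
lengths = {"r0": 900, "r1": 850, "r2": 120, "r3": 100, "r4": 90, "r5": 60}
placement, makespan = assign_rollouts(lengths, 2)
```

Splitting the two long requests across GPUs yields a makespan of 1060 tokens of work per GPU here, versus 1750 if both landed on one device; naive round-robin placement can easily produce the latter.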

Experimental results on math reasoning and code generation tasks demonstrate that DAS achieves its speedup without any degradation in reward quality, preserving the training signal entirely.

This technique offers a compelling solution for cutting compute costs in RL post-training. As models grow larger and tasks more complex, the rollout bottleneck will only intensify, making DAS a valuable tool for practitioners seeking efficiency without sacrificing performance.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.