Large language model training is getting a speed boost. Together AI has unveiled distribution-aware speculative decoding (DAS), a new framework designed to drastically cut the time spent on reinforcement learning (RL) post-training.
RL fine-tuning has become critical for enhancing LLM reasoning, but the rollout phase—where models generate responses for training—presents a significant bottleneck. This process can consume up to 70% of total training time, primarily due to the long-tail nature of response generation, where a few slow generations can delay the entire batch and leave expensive GPUs idle.
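The long-tail effect described above can be illustrated with a toy calculation (this is an illustrative sketch, not Together AI's code; the timing numbers are hypothetical). With synchronous batched rollouts, every sequence waits for the slowest generation to finish, so a single straggler dominates the batch's wall-clock time:

```python
# Toy illustration of the long-tail rollout bottleneck: with synchronous
# batching, the batch finishes only when its slowest sequence does.

# Hypothetical per-sequence generation times (seconds) for one rollout batch;
# most finish quickly, one straggles (the "long tail").
gen_times = [4, 5, 5, 6, 5, 4, 30, 5]

batch_time = max(gen_times)             # batch completes with the slowest rollout
busy_time = sum(gen_times)              # total useful generation work performed
capacity = batch_time * len(gen_times)  # GPU-seconds the batch actually occupies

idle_fraction = 1 - busy_time / capacity
print(f"batch takes {batch_time}s; GPUs sit idle {idle_fraction:.0%} of the time")
```

Even with only one straggler out of eight sequences, most of the batch's GPU-seconds are spent waiting, which is why speeding up the slowest generations has an outsized effect on total rollout time.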
