#Speculative Decoding
5 articles with this tag
Bridging Diffusion LLMs and Speculative Decoding
SimSD, a new speculative decoding method, lets diffusion LLMs reach up to 7.46x higher throughput without sacrificing generation quality.

Together AI Supercharges LLM Inference
Together AI unveils ATLAS, accelerating LLM inference up to 4x with adaptive speculative decoding, tackling the growing cost challenge for AI-native companies.

Together AI Slashes RL Training Time
Together AI's new distribution-aware speculative decoding cuts RL training time by up to 50%, addressing a major bottleneck in LLM post-training.

Cloudflare's LLM Infrastructure Deep Dive
Cloudflare details the infrastructure optimizations it uses to run large language models on its Workers AI platform, focusing on performance and cost efficiency.

Together AI's Aurora Learns on the Fly
Together AI's Aurora framework uses RL to continuously adapt speculative decoding for faster LLM inference, outperforming static models.