#Speculative Decoding

6 articles with this tag

Modal CTO on the 100,000 Sandbox Problem

Modal CTO Akshat Bubna discusses the "100,000 Sandbox Problem" and Modal's approach to scalable, flexible LLM inference infrastructure.

22 days ago

AI Research

Bridging Diffusion LLMs and Speculative Decoding

SimSD, a new speculative decoding method, enables diffusion LLMs to achieve up to 7.46x higher throughput without sacrificing generation quality.

about 2 months ago

Technology

Together AI Supercharges LLM Inference

Together AI unveils ATLAS, accelerating LLM inference up to 4x with adaptive speculative decoding, tackling the growing cost challenge for AI-native companies.

3 months ago

Technology

Together AI Slashes RL Training Time

Together AI's new distribution-aware speculative decoding slashes RL training time by up to 50%, tackling a major bottleneck in LLM post-training.

3 months ago

Technology

Cloudflare's LLM Infrastructure Deep Dive

Cloudflare details the infrastructure optimizations it uses to run large language models on its Workers AI platform, with a focus on performance and cost efficiency.

4 months ago

Technology

Together AI's Aurora Learns on the Fly

Together AI's Aurora framework uses RL to continuously adapt speculative decoding for faster LLM inference, outperforming static models.

4 months ago