Bridging Diffusion LLMs and Speculative Decoding

A novel SimSD speculative decoding method enables diffusion LLMs to achieve up to 7.46x higher throughput without sacrificing generation quality.

The SimSD framework enables speculative decoding for diffusion LLMs.

Diffusion large language models (dLLMs) offer a compelling alternative to autoregressive (AR) models, with the potential for faster inference. However, their masked language modeling paradigm has so far kept them from benefiting from speculative decoding, a key acceleration technique for AR models. The SimSD paper proposes a way to close this gap.

Visual TL;DR

  1. dLLMs vs AR Models: diffusion LLMs have the potential for faster inference than autoregressive models
  2. Speculative Decoding Barrier: the masked modeling used by dLLMs prevents standard token-level speculative verification
  3. SimSD Method: a plug-and-play masking strategy built on reference tokens and a designed attention mask
  4. Temporally Valid Contexts: the masking lets dLLMs compute valid contexts for token verification
  5. Throughput Gains: up to 7.46x higher decoding throughput
  6. Quality Preservation: the speedup comes without sacrificing generation quality

Unlocking Speculative Decoding for dLLMs

The core challenge lies in the dLLM's masked language modeling formulation, which relies on bidirectional attention and mask tokens. In AR models, causal masking ensures temporally valid contexts for token verification: each position sees only the tokens before it. In a dLLM, by contrast, the context shifts across denoising steps, which prevents standard token-level speculative verification. The proposed solution, SimSD, introduces a plug-and-play masking strategy. By incorporating reference tokens from a draft model and carefully designing an attention mask, SimSD equips dLLMs with temporally valid contexts. This lets them compute valid logits for multiple drafted tokens in a single forward pass, restoring the verification capability that speculative decoding depends on while retaining the parallel decoding advantages of dLLMs.
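To make "token-level speculative verification" concrete, here is a minimal sketch of the generic greedy verification step used in AR speculative decoding. It is not SimSD's algorithm or masking scheme, which the article does not spell out. The function name `verify_draft` and the toy logits are hypothetical, and the sketch assumes the verifier has already produced one row of logits per drafted position (plus one extra row) in a single forward pass.

```python
import numpy as np


def verify_draft(draft_tokens, verifier_logits):
    """Greedy speculative verification (generic AR-style sketch, not SimSD itself).

    draft_tokens: k token ids proposed by the draft model.
    verifier_logits: array of shape (k + 1, vocab_size). Row i holds the
        verifier's logits for position i, conditioned on the prefix plus
        draft_tokens[:i]. This is the "temporally valid context" requirement.
    Returns the tokens to commit: the longest agreeing prefix of the draft,
    followed by one token chosen by the verifier.
    """
    draft_tokens = [int(t) for t in draft_tokens]
    logits = np.asarray(verifier_logits)
    if logits.ndim != 2 or logits.shape[0] != len(draft_tokens) + 1:
        raise ValueError("expected logits of shape (len(draft_tokens) + 1, vocab_size)")

    # The verifier's own greedy choice at every position.
    verifier_choice = logits.argmax(axis=-1)

    committed = []
    for i, token in enumerate(draft_tokens):
        if int(verifier_choice[i]) != token:
            # First disagreement: keep the verifier's token and discard the rest.
            committed.append(int(verifier_choice[i]))
            return committed
        committed.append(token)

    # Every drafted token was accepted, so the extra row yields a bonus token.
    committed.append(int(verifier_choice[-1]))
    return committed


# Toy example with a vocabulary of 4 tokens (made-up logits).
# One-hot rows make the verifier's greedy choices 2, 1, 0, 3.
toy_logits = np.eye(4)[[2, 1, 0, 3]]
print(verify_draft([2, 1, 3], toy_logits))  # prints [2, 1, 0]
```

The draft agrees with the verifier at the first two positions and is corrected at the third, so three tokens are committed from one verifier pass. The step only works if each row of logits was computed from a context that contains exactly the tokens preceding that position, which is the property causal masking gives AR models for free.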

Significant Throughput Gains with Quality Preservation

The SimSD speculative decoding algorithm is training-free and integrates with other acceleration methods such as KV caching and blockwise decoding. Experiments on the SDAR family of dLLMs across four benchmarks show substantial performance improvements: the researchers report up to 7.46x higher decoding throughput. Critically, this acceleration was achieved while maintaining, and in some cases even improving, average generation quality. This suggests that SimSD offers a robust path to more efficient dLLM inference without compromising output quality.
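For readers who want to see what a throughput multiple means in practice, the short sketch below computes decoding throughput and the resulting speedup ratio. The helper names and the token counts are hypothetical, chosen only so the ratio lands on the reported upper bound of 7.46x. They are not measurements from the paper.

```python
def throughput(tokens_generated: int, seconds: float) -> float:
    """Decoding throughput in tokens per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens_generated / seconds


def speedup(baseline_tps: float, accelerated_tps: float) -> float:
    """How many times faster the accelerated decoder is than the baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return accelerated_tps / baseline_tps


# Hypothetical numbers for illustration only: tokens decoded in the same
# 10-second window with and without speculative decoding.
baseline_tps = throughput(1_000, 10.0)
accelerated_tps = throughput(7_460, 10.0)
print(f"{speedup(baseline_tps, accelerated_tps):.2f}x")  # prints 7.46x
```

Note that "up to" describes the best case observed, so the gain on any particular model and benchmark may be lower.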

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.