A new paper, "Lost in Transmission: When and Why LLMs Fail to Reason Globally," argues that large language models share a fundamental limitation. Despite their scale, transformer-based LLMs struggle on tasks that require complex, global reasoning over long inputs. The authors attribute this to a severely restricted 'effective bandwidth' for transmitting information through the models' residual streams.
The core bottleneck lies in how transformers move information between token positions. For an LLM to produce a correct output that depends on the entire input, information from early tokens must pass through the model's causal attention mechanism to reach the final token's residual stream. Because attention is causal, early tokens cannot 'see' later ones, so their representations are computed without knowing what the rest of the input will require. Any intermediate processing at those positions must therefore be useful no matter what follows, and that restricts how much task-relevant information reaches the final token.
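The causal constraint described above can be made concrete with a small sketch. This is an illustration only, not code from the paper: the helper names `causal_mask` and `visible_positions` are hypothetical, and the mask is the standard lower-triangular pattern used in decoder-only transformers.

```python
def causal_mask(n):
    """Return an n x n table where mask[i][j] is True when position i
    may attend to position j. Under causal attention that means j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def visible_positions(mask, i):
    """List the positions that token i is allowed to read from."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


mask = causal_mask(4)
first_token_view = visible_positions(mask, 0)  # the first token sees only itself
last_token_view = visible_positions(mask, 3)   # the last token sees the whole input
```

Only the final position can read from every token, which is why whatever the output depends on has to be carried forward to that position's residual stream.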
The BAPO Model: A Bandwidth Bottleneck
To quantify this, the researchers introduce the Bounded Attention Prefix Oracle (BAPO) model, a mathematical framework that abstracts transformer mechanics into a model of information flow. BAPO splits an input into a Prefix and a Suffix and lets information cross from one to the other only through two limited channels, governed by the 'bandwidth' parameters a and b: the Prefix Oracle passes along a bits of compressed information about the Prefix, and the Attention Function directly retrieves b tokens from the Prefix.
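A minimal sketch of this information flow, under simplifying assumptions, might look like the following. It is not the paper's formal definition: `run_bapo`, `parity`, and `lookup` are hypothetical names, the two toy tasks are chosen here for illustration, and the sketch checks a single fixed Prefix/Suffix split.

```python
from typing import Callable, List, Sequence, Tuple

Token = int
Bits = Tuple[int, ...]
Picked = List[Tuple[int, Token]]


def run_bapo(
    prefix: Sequence[Token],
    suffix: Sequence[Token],
    prefix_oracle: Callable[[Sequence[Token]], Bits],
    attention_fn: Callable[[Sequence[Token], int, Token], bool],
    suffix_oracle: Callable[[Bits, Picked, Sequence[Token]], Token],
    a: int,
    b: int,
) -> Token:
    """Toy BAPO-style computation for one fixed split (an assumption of
    this sketch). Only a bits and b prefix tokens reach the suffix side."""
    bits = tuple(prefix_oracle(prefix))
    if len(bits) > a or any(bit not in (0, 1) for bit in bits):
        raise ValueError("prefix oracle exceeded its a-bit budget")
    # The attention function judges each prefix token from the suffix,
    # the token's index, and the token itself.
    picked = [(i, t) for i, t in enumerate(prefix) if attention_fn(suffix, i, t)]
    if len(picked) > b:
        raise ValueError("attention function exceeded its b-token budget")
    return suffix_oracle(bits, picked, suffix)


def parity(prefix: Sequence[Token], suffix: Sequence[Token]) -> Token:
    """Parity of all 1s in the input: one summary bit, no attended tokens."""
    return run_bapo(
        prefix, suffix,
        prefix_oracle=lambda p: (sum(p) % 2,),
        attention_fn=lambda s, i, t: False,
        suffix_oracle=lambda bits, picked, s: (bits[0] + sum(s)) % 2,
        a=1, b=0,
    )


def lookup(prefix: Sequence[Token], suffix: Sequence[Token]) -> Token:
    """Return the prefix token whose index is the last suffix token:
    no summary bits, one attended token."""
    return run_bapo(
        prefix, suffix,
        prefix_oracle=lambda p: (),
        attention_fn=lambda s, i, t: i == s[-1],
        suffix_oracle=lambda bits, picked, s: picked[0][1],
        a=0, b=1,
    )
```

For example, `parity([1, 0, 1, 1], [1, 0])` sends a single bit across the split, while `lookup([7, 8, 9], [2])` sends none and instead retrieves one token. Both toy tasks fit within small, fixed values of a and b. Tasks that would need these budgets to grow with input length are the ones the bandwidth argument suggests are hard.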
