Beyond Token Count: Semantic Compression for LLMs

Researchers recast LLM reasoning as lossy compression using the Conditional Information Bottleneck (CIB), employing semantic surprisal for efficient token pruning.

Mar 10 at 8:00 PM · 2 min read
*Figure: Abstract diagram illustrating the Information Bottleneck principle applied to LLM reasoning.*

Chain-of-Thought (CoT) prompting demonstrably boosts LLM performance on complex reasoning tasks, but at a steep cost in token usage and inference expense. Existing methods like 'Budget Forcing' curb these costs through heuristic length penalties, which cut crucial reasoning steps and redundant verbiage indiscriminately. This work by Fabio Valerio Massoli, Andrey Kuzmin, and Arash Behboodi, published on arXiv, offers a fundamentally new perspective by recasting efficient reasoning as a lossy compression problem governed by the Information Bottleneck (IB) principle.

Bridging the Markov Gap with CIB

A core theoretical challenge is that naive application of the IB to transformers falters: attention mechanisms violate the Markov property between the prompt, the reasoning trace, and the response. To surmount this, the researchers model CoT generation through the Conditional Information Bottleneck (CIB) principle. This framework treats the reasoning trace (Z) as a computational bridge, retaining only the information about the response (Y) that is not directly inferable from the prompt (X). This forms the basis of a general Reinforcement Learning objective: maximizing task reward while compressing intermediate completions under a prior distribution over reasoning traces. The approach subsumes simpler heuristics, such as length penalties, as special cases corresponding to uniform priors.
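Schematically, an objective of this shape can be sketched as follows (the notation here is ours, not necessarily the paper's exact formulation; $\beta$ is a compression-strength hyperparameter and $q$ a prior over reasoning traces):

```latex
\max_{\pi}\;
\mathbb{E}_{z \sim \pi(\cdot \mid x)}\!\left[ R(x, y) \right]
\;-\;
\beta\, \mathbb{E}_{z \sim \pi(\cdot \mid x)}\!\left[ \log \pi(z \mid x) - \log q(z) \right]
```

Under a uniform prior $q$ over tokens, the $-\log q(z)$ term grows linearly in the trace length $|z|$, so the compression penalty collapses to a plain length penalty, illustrating why length-based heuristics appear as a special case.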

Semantic Surprisal: A Smarter Cost Metric

Moving beyond simple token-counting metrics, the proposed CIB objective introduces a more sophisticated semantic prior, which quantifies each token's cost by its surprisal under a language model's own predictions. Empirically, this semantic prior allows the CIB framework to prune 'cognitive bloat' – extraneous tokens – while preserving the fluency and logical coherence of the reasoning process. The results indicate improved accuracy at moderate compression levels and substantially reduced accuracy degradation even under aggressive compression, marking a significant advancement in efficient LLM reasoning.
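A minimal sketch of a surprisal-based trace cost (function names and log-probabilities below are ours, purely illustrative): the cost of a reasoning trace is the sum of per-token surprisals, -log p(token | context), under a reference language model. Predictable, fluent tokens are cheap; improbable ones are expensive, unlike a length penalty, which charges every token equally.

```python
import math

def surprisal_cost(token_logprobs):
    """Cost of a trace as the sum of per-token surprisals.

    token_logprobs: log p(token | context) for each token in the trace,
    as scored by a reference language model (values here are made up).
    """
    return sum(-lp for lp in token_logprobs)

# Two hypothetical traces of equal length: one fluent and predictable,
# one padded with tokens the reference LM finds unlikely.
fluent = [-0.1, -0.3, -0.2, -0.1]
bloated = [-2.0, -1.5, -0.1, -2.5]

# A pure length penalty cannot tell these apart; the semantic prior can.
assert surprisal_cost(fluent) < surprisal_cost(bloated)
print(f"fluent: {surprisal_cost(fluent):.2f}, bloated: {surprisal_cost(bloated):.2f}")
```

In practice the log-probabilities would come from the model's own output scores during generation, rather than being fixed constants as in this toy example.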