Chain-of-Thought (CoT) prompting demonstrably boosts LLM performance on complex reasoning tasks, but at a steep cost in token usage and inference compute. Existing methods such as 'Budget Forcing' attempt to curb these costs with heuristic length penalties, but such penalties cannot distinguish crucial reasoning steps from redundant verbiage and end up suppressing both. This work from Fabio Valerio Massoli, Andrey Kuzmin, and Arash Behboodi, published on arXiv, offers a fundamentally new perspective by recasting efficient reasoning as a lossy compression problem governed by the Information Bottleneck (IB) principle.
Bridging the Markov Gap with CIB
A core theoretical challenge the authors identify is that naively applying the IB to transformers falters: attention mechanisms violate the Markov property between the prompt, the reasoning trace, and the response. To surmount this, the researchers propose modeling CoT generation through the Conditional Information Bottleneck (CIB) principle. This framework treats the reasoning trace (Z) as a computational bridge, retaining only the information about the response (Y) that is not directly inferable from the prompt (X). This forms the basis of a general Reinforcement Learning objective: maximize task reward while compressing intermediate completions under a prior distribution over reasoning traces. This approach subsumes simpler heuristics, such as length penalties, as special cases corresponding to uniform priors.
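To make the "length penalty as a special case" claim concrete, here is a minimal sketch of such an objective. It is not the paper's implementation; the function name, the sampled-KL estimate, and the beta coefficient are illustrative assumptions. The objective trades task reward against a compression penalty, estimated as the gap between the policy's per-token log-probabilities on the reasoning trace and those of a prior over traces.

```python
import math

def cib_style_objective(task_reward, trace_logps, prior_logps, beta=0.01):
    """Hypothetical sketch of a reward-minus-compression objective.

    task_reward: scalar reward for the final response.
    trace_logps: per-token log-probs the policy assigned to its trace.
    prior_logps: per-token log-probs of a prior over reasoning traces.
    The penalty is a single-sample estimate of the KL between policy
    and prior over the trace.
    """
    kl_estimate = sum(lp - pp for lp, pp in zip(trace_logps, prior_logps))
    return task_reward - beta * kl_estimate

# Under a uniform prior over a vocabulary of size V, every prior
# log-prob is the constant -log(V), so the penalty grows linearly
# with trace length -- recovering a plain length penalty.
V = 32_000  # illustrative vocabulary size
short_trace = [-1.0]
long_trace = [-1.0, -1.0, -1.0]
uniform = lambda n: [-math.log(V)] * n
r_short = cib_style_objective(1.0, short_trace, uniform(1))
r_long = cib_style_objective(1.0, long_trace, uniform(3))
```

With a uniform prior, the longer trace is penalized purely for its length; a non-uniform prior would instead penalize only tokens that are surprising under the prior, which is the sense in which length penalties are a special case.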