The serial, discrete nature of textual chain-of-thought (CoT) in large language models limits the computational bandwidth available for reasoning. The model must verbalize every intermediate step as tokens before it can proceed, even when that step is only a partial or purely semantic computation, and this creates a bottleneck.
Bridging Continuous States and Autoregressive Generation
To address this, the researchers propose NF-CoT, a latent reasoning framework that uses normalizing flows to model continuous thoughts, offering a higher-bandwidth alternative to explicit textual CoT. NF-CoT preserves the key advantages of standard autoregressive language models: native left-to-right generation, probabilistic sampling, compatibility with KV-cache decoding, and tractable likelihood estimation. It does so by integrating a TARFlow-style normalizing flow directly into the LLM backbone, so that continuous thought positions are generated by an NF head while ordinary text tokens are still generated by the LM head.
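To make the "tractable likelihood plus left-to-right sampling" property concrete, here is a minimal NumPy sketch of a single autoregressive affine flow step over a sequence of continuous vectors. It is an illustration of the general technique, not the NF-CoT implementation: the running-mean context is a toy stand-in for a causal Transformer, and every name (`causal_context`, `W_mu`, `W_s`, `flow_forward`, `flow_sample`) is hypothetical.

```python
import numpy as np

def causal_context(x):
    """Toy stand-in (assumption) for a causal backbone: position t sees mean(x[:t])."""
    ctx = np.zeros_like(x)
    for t in range(1, x.shape[0]):
        ctx[t] = x[:t].mean(axis=0)
    return ctx

def affine_params(ctx, W_mu, W_s):
    """Predict a shift and a bounded log-scale from the causal context."""
    mu = ctx @ W_mu
    log_s = np.tanh(ctx @ W_s)  # tanh keeps the scale in a stable range
    return mu, log_s

def flow_forward(x, W_mu, W_s):
    """Map thoughts x -> noise u; all positions can be computed at once."""
    mu, log_s = affine_params(causal_context(x), W_mu, W_s)
    u = (x - mu) * np.exp(-log_s)
    # The Jacobian du/dx is lower triangular, so log|det| is the sum of -log_s.
    return u, -log_s.sum()

def log_likelihood(x, W_mu, W_s):
    """Exact log p(x) via change of variables with a standard normal base."""
    u, log_det = flow_forward(x, W_mu, W_s)
    log_base = -0.5 * (u ** 2).sum() - 0.5 * u.size * np.log(2.0 * np.pi)
    return log_base + log_det

def flow_sample(u, W_mu, W_s):
    """Invert left to right: x[t] depends only on already generated x[:t]."""
    T, D = u.shape
    x = np.zeros_like(u)
    for t in range(T):
        ctx_t = x[:t].mean(axis=0) if t > 0 else np.zeros(D)
        mu = ctx_t @ W_mu
        log_s = np.tanh(ctx_t @ W_s)
        x[t] = u[t] * np.exp(log_s) + mu
    return x

rng = np.random.default_rng(0)
T, D = 5, 4  # 5 thought positions, 4 dimensions each (arbitrary toy sizes)
W_mu = 0.1 * rng.standard_normal((D, D))
W_s = 0.1 * rng.standard_normal((D, D))

thoughts = flow_sample(rng.standard_normal((T, D)), W_mu, W_s)
print(log_likelihood(thoughts, W_mu, W_s))
```

The forward direction scores a whole sequence in one pass, which is what gives an exact likelihood, while sampling proceeds one position at a time in the same order as token decoding. In a real model the per-step context would come from cached attention states rather than being recomputed as it is in this toy loop.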