The prevailing paradigm for Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) relies on textual or interleaved textual-visual reasoning. This work challenges that paradigm and proposes using images as the sole medium for AI reasoning.
Optical Reasoning: Visualizing Thought Processes
The core idea, optical reasoning, is that images alone can carry a model's reasoning, with no accompanying text channel. The approach is instantiated in two forms: typographic-based optical reasoning, which arranges visual elements to display a rationale compactly, and graphical-based optical reasoning, which integrates text and graphics into structured visual rationales. Together, the two forms are intended to move AI reasoning beyond text-centric approaches.