The rapid advancement of Large Language Models (LLMs) in reasoning has largely been predicated on vast quantities of internet-derived question-answer data. However, this approach faces a significant bottleneck: such datasets are finite in scale and unevenly concentrated across domains, particularly in scientific fields like physics. This research proposes a paradigm shift, demonstrating that physics simulators can serve as a powerful and scalable source of supervision for training LLMs in physical reasoning. The study, detailed on arXiv, generated synthetic question-answer pairs from randomly sampled scenes in physics engines and trained LLMs on them with reinforcement learning.
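The generation pipeline described above can be illustrated with a minimal sketch. The paper's actual approach uses a full physics engine; here, for simplicity, the "simulator" is closed-form projectile kinematics, and the function name and scene parameters are hypothetical choices for illustration:

```python
import random

def generate_projectile_qa(seed=None):
    """Generate one synthetic physics QA pair from a randomly
    sampled scene.

    Illustrative sketch only: the study's pipeline draws scenes from
    a physics engine, whereas this toy 'simulator' solves horizontal
    projectile motion in closed form.
    """
    rng = random.Random(seed)
    v0 = rng.uniform(5.0, 50.0)   # launch speed, m/s
    h0 = rng.uniform(1.0, 20.0)   # launch height, m
    g = 9.81                      # gravitational acceleration, m/s^2

    # "Simulate" the scene: time to fall from h0 (horizontal launch,
    # so initial vertical velocity is zero), then horizontal range.
    t = (2 * h0 / g) ** 0.5
    x = v0 * t

    question = (
        f"A ball is launched horizontally at {v0:.1f} m/s from a "
        f"height of {h0:.1f} m. How far does it travel horizontally "
        f"before hitting the ground? (g = 9.81 m/s^2)"
    )
    return {"question": question, "answer": round(x, 2)}

pair = generate_projectile_qa(seed=123)
```

Because scene parameters are sampled at random, the supply of distinct question-answer pairs is effectively unbounded, which is precisely the scalability advantage the paper attributes to simulator-derived supervision.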
Bridging the Synthetic-to-Real Divide
A key finding is the remarkable zero-shot sim-to-real transfer capability of models trained on this synthetic data. These LLMs, despite never seeing a real-world physics problem during training, showed significant gains on established benchmarks. For instance, training solely on simulated data boosted performance on International Physics Olympiad (IPhO) problems by 5-10 percentage points across various model sizes. This supports the efficacy of physics simulators as a robust data generation engine for advancing LLM physics reasoning.