The race to build AI systems that can predict world events is heating up. Mantic, a company using the Tinker platform, has demonstrated that fine-tuning large language models (LLMs) for forecasting tasks can raise their performance to a level comparable to that of top-tier, general-purpose models.
Fine-Tuning for Foresight
The prevailing strategy for AI forecasting has relied on off-the-shelf LLMs like Gemini 3 or GPT-5, augmented with specialized context-gathering techniques. These models, while powerful, were not inherently designed for prediction.
Mantic’s research focused on "judgmental forecasting" – predictions requiring human-like research and reasoning, crucial for domains like geopolitics and economics where traditional statistical methods fall short. Drawing inspiration from the book Superforecasting, they explored whether models explicitly trained for forecasting could outperform their generalist counterparts.
Using reinforcement learning on approximately 10,000 binary questions (e.g., "Will event X occur before date Y?"), Mantic fine-tuned the gpt-oss-120b model. During training, the model was rewarded for assigning higher probabilities to the outcomes that actually occurred.
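To make the reward idea concrete, here is a minimal sketch of how a binary forecast could be scored once the question resolves. The article does not say which scoring rule Mantic used, so the two functions below (a Brier-style reward and a log-score reward, both standard proper scoring rules) are assumptions for illustration, and the names brier_reward and log_reward are hypothetical.

```python
import math


def brier_reward(p: float, outcome: bool) -> float:
    """Assumed reward: 1 minus the squared error of the forecast.

    p is the probability the model assigned to "yes".
    A perfect forecast scores 1.0; a maximally wrong one scores 0.0.
    """
    y = 1.0 if outcome else 0.0
    return 1.0 - (p - y) ** 2


def log_reward(p: float, outcome: bool, eps: float = 1e-6) -> float:
    """Assumed alternative: log of the probability given to the true outcome.

    p is clamped to [eps, 1 - eps] so the reward stays finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p if outcome else 1.0 - p)


if __name__ == "__main__":
    # A confident forecast earns more when it turns out to be right
    # and less when it turns out to be wrong.
    for p, outcome in [(0.9, True), (0.9, False), (0.5, True)]:
        print(p, outcome, brier_reward(p, outcome), log_reward(p, outcome))
```

Either rule gives a higher reward the more probability the model placed on what actually happened, which matches the training signal described above. The log score penalizes confident misses much more heavily than the Brier-style score does.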
