Thinking Machines Lab

Scott Jeen, Matthew Aitchison & Mantic

[Blog Post]


Summary

The top AI forecasting systems are approaching superforecaster-level accuracy on geopolitics and current affairs, but to date they have relied on off-the-shelf LLMs not explicitly trained for forecasting. This post shows that RL fine-tuning gpt-oss-120b on ~10,000 binary event questions lifts its forecasting score from below all frontier models to marginally above them. The fine-tuned model also produces predictions that are decorrelated from those of frontier LLMs, making it the second-most-important contributor to the optimal forecast ensemble after Grok 4. Together, these results demonstrate that on-task training can advance the state of the art in AI forecasting.
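To make the evaluation setup concrete: binary event forecasts are typically scored with the Brier score (the mean squared error between a predicted probability and the 0/1 outcome, lower is better), and decorrelated forecasters are valuable because averaging their probabilities can outperform any single one. The sketch below is a hedged illustration of that idea with hypothetical forecasts, not the authors' evaluation code or ensemble-weighting method:

```python
def brier_score(probs, outcomes):
    """Mean squared error between probabilistic forecasts and 0/1 outcomes."""
    assert len(probs) == len(outcomes)
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Toy example (made-up numbers): two forecasters whose errors are
# decorrelated. A simple unweighted average of their probabilities
# can score better than either forecaster alone.
outcomes = [1, 0, 1, 1, 0, 0]
model_a  = [0.9, 0.4, 0.6, 0.8, 0.3, 0.5]
model_b  = [0.6, 0.1, 0.9, 0.5, 0.2, 0.1]
ensemble = [(a + b) / 2 for a, b in zip(model_a, model_b)]

print(brier_score(model_a, outcomes))
print(brier_score(model_b, outcomes))
print(brier_score(ensemble, outcomes))
```

On these toy numbers the ensemble's Brier score beats both individual models, which is the mechanism by which a decorrelated fine-tuned model can improve an ensemble even without being the single best forecaster.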


Check out the full blog post for more details. If you find this work informative, please consider citing:

@article{jeen2026forecasting,
  author = {Jeen, Scott and Aitchison, Matthew and {Mantic}},
  title = {Training LLMs to Predict World Events},
  title = {Training LLMs to Predict World Events},
  journal = {Thinking Machines Lab: News},
  year = {2026},
  note = {https://thinkingmachines.ai/news/training-llms-to-predict-world-events/}
}