The promise of foundation models has largely been confined to NLP and vision, leaving time series forecasting in a fragmented state. The Toto 2.0 report shows that time series models, much like their counterparts in other domains, scale well: a single training recipe yields consistent forecast quality gains from millions to billions of parameters. The researchers behind Toto 2.0 have turned this insight into a practical framework.
Unified Scaling Recipe for Forecast Accuracy
The core innovation is a training methodology that remains effective across a wide range of model sizes, from 4 million to 2.5 billion parameters. This scaling behavior suggests a path toward accurate and reliable time series forecasting without bespoke tuning for each model size. The five Toto 2.0 forecasting models, released under Apache 2.0, are the product of this unified approach.
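To make the scaling trend described above concrete, the sketch below shows one common way to quantify how forecast error falls with model size: fitting a power law in log-log space. It is an illustration only, not the method used by the Toto 2.0 authors. Only the endpoints of the size range (4 million and 2.5 billion parameters) come from the text; the three intermediate sizes, the error values, and the constants `TRUE_A` and `TRUE_B` are synthetic placeholders, and `fit_power_law` is a hypothetical helper.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of error ~ a * size**(-b), done in log-log space.

    Hypothetical helper for illustration; returns the pair (a, b).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Ordinary least squares on the log-transformed points.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Parameter counts: the endpoints match the range quoted in the text,
# the three intermediate sizes are assumed for illustration.
sizes = [4e6, 2e7, 1e8, 5e8, 2.5e9]

# Synthetic errors generated from a known power law (NOT reported results).
TRUE_A, TRUE_B = 10.0, 0.1
errors = [TRUE_A * n ** (-TRUE_B) for n in sizes]

a, b = fit_power_law(sizes, errors)
print(f"fitted error ~ {a:.3f} * N^(-{b:.3f})")
```

With real benchmark scores in place of the synthetic values, a roughly straight line in log-log space would indicate that one recipe keeps paying off as the model grows, which is the claim the article summarizes.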
State-of-the-Art Performance Across Benchmarks
The Toto 2.0 forecasting models set new state-of-the-art results on three distinct forecasting benchmarks: BOOM (observability), GIFT-Eval (general-purpose), and the contamination-resistant TIME benchmark. Success across all three points to the generalizability of the architecture and training recipe, a long-standing challenge in the time series domain. The report details not only the experimental outcomes but also the architectural design, the training data strategy, and the u-muP hyperparameter transfer pipeline that underpins these results.