Foundation Models Unlock Time Series Scaling

Toto 2.0 foundation models demonstrate remarkable scaling, achieving state-of-the-art forecasting performance across multiple benchmarks with a unified training approach.

May 20 at 8:01 PM · 6 min read

The promise of foundation models has so far been realized mainly in NLP and vision, leaving time series forecasting, a critical domain, fragmented. This work shows that time series models scale much like their counterparts in those domains: a single training recipe yields consistent gains in forecast quality from millions to billions of parameters. The researchers behind Toto 2.0 have codified this insight into a practical framework.

Unified Scaling Recipe for Forecast Accuracy

The core innovation is a training methodology that works across a wide range of model sizes, from 4 million to 2.5 billion parameters. This consistent scaling behavior suggests a path toward highly performant, reliable time series forecasting without bespoke hyperparameter tuning at each model size. The five Toto 2.0 forecasting models released under the Apache 2.0 license are products of this unified approach and set new state-of-the-art results in forecast quality.
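The claim that forecast quality improves reliably with size is commonly quantified by fitting a power law, loss ≈ a * N^(-b), to validation loss against parameter count N. The sketch below does this with an ordinary least-squares fit in log-log space. It is an illustration under assumptions: the article gives neither Toto 2.0's loss curve nor the functional form the authors fit, `fit_power_law` and `predict_loss` are hypothetical helpers, and the example losses are synthetic.

```python
import math


def fit_power_law(params: list[float], losses: list[float]) -> tuple[float, float]:
    """Fit loss ~= a * N**(-b) by ordinary least squares on log(N), log(loss).

    Hypothetical helper: assumes a pure power law (no irreducible-loss term)
    and strictly positive inputs. Returns (a, b); b > 0 means loss falls as
    the parameter count N grows.
    """
    if len(params) != len(losses) or len(params) < 2:
        raise ValueError("need at least two matching (params, loss) pairs")
    xs = [math.log(n) for n in params]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("parameter counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # slope of the log-log line equals -b
    intercept = mean_y - slope * mean_x  # intercept equals log(a)
    return math.exp(intercept), -slope


def predict_loss(a: float, b: float, n_params: float) -> float:
    """Extrapolate the fitted power law to a model with n_params parameters."""
    return a * n_params ** (-b)


# Synthetic check: losses generated from a known law (a=10, b=0.1), not real data.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [10.0 * n ** (-0.1) for n in sizes]
a, b = fit_power_law(sizes, losses)
print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e10 params: {predict_loss(a, b, 1e10):.3f}")
```

Because the synthetic losses lie exactly on a power law, the fit recovers both constants; real measurements would scatter around the line, and a stable positive exponent b across sizes is what "consistent quality gains" looks like in this form.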

State-of-the-Art Performance Across Benchmarks

The Toto 2.0 forecasting models set new state-of-the-art results on three distinct forecasting benchmarks: BOOM (observability), GIFT-Eval (general-purpose), and the contamination-resistant TIME benchmark. Success across observability, general-purpose, and contamination-resistant evaluations suggests that the architecture and training recipe generalize, which speaks directly to the fragmentation that has held back the time series domain. The report details not only the experimental results but also the architectural design, the training data strategy, and the u-μP hyperparameter transfer pipeline that underpins these results.
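Hyperparameter transfer in the μP family rests on one idea: tune hyperparameters once on a narrow proxy model, then rescale them by width so they remain near-optimal for much wider models. The sketch below shows only the Adam learning-rate rule of plain μP from the original Tensor Programs V work. It is not u-μP, which adds unit scaling and uses different per-tensor rules, and the article does not describe Toto 2.0's actual pipeline; `mup_adam_lr`, `lr_table`, the layer names, and the widths are all hypothetical.

```python
def mup_adam_lr(kind: str, base_lr: float, base_width: int, width: int) -> float:
    """Scale a learning rate tuned at base_width to a model of the given width.

    Plain muP rule for Adam (hypothetical helper, not u-muP and not Toto 2.0's
    pipeline): weight matrices whose fan_in grows with width ("hidden" and
    "output") get base_lr * base_width / width; "input" weights and "bias"
    vectors keep base_lr unchanged.
    """
    if base_width <= 0 or width <= 0:
        raise ValueError("widths must be positive")
    if kind in ("hidden", "output"):
        return base_lr * base_width / width
    if kind in ("input", "bias"):
        return base_lr
    raise ValueError(f"unknown parameter kind: {kind!r}")


def lr_table(kinds: dict[str, str], base_lr: float, base_width: int, width: int) -> dict[str, float]:
    """Map each named parameter to its width-adjusted Adam learning rate."""
    return {name: mup_adam_lr(kind, base_lr, base_width, width) for name, kind in kinds.items()}


# Example: a learning rate tuned once on a width-256 proxy, reused at width 4096.
layers = {"embed": "input", "block.attn": "hidden", "head": "output", "block.bias": "bias"}
for name, lr in lr_table(layers, base_lr=1e-2, base_width=256, width=4096).items():
    print(f"{name}: {lr:.2e}")
```

The design point is that only the base learning rate is searched, and only at the small width; every larger model derives its per-tensor rates from that single value, which is what makes one recipe usable from millions to billions of parameters without re-tuning at each size.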

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.

#AI Research #Time Series #Foundation Models #Forecasting #Machine Learning