ReClaim: Unlocking Healthcare Insights from Claims Data

ReClaim, a 1.7B-parameter foundation model trained on 43.8B medical events, leverages administrative claims data to achieve state-of-the-art performance in disease prediction and real-world evidence (RWE) analysis.

Figure: Conceptual overview of the ReClaim foundation model's pipeline for processing longitudinal healthcare data.

Administrative claims data holds vast, underutilized potential for healthcare AI. Although rich in longitudinal detail, it has been largely unexplored as a foundation for advanced modeling. ReClaim, a new generative transformer trained from scratch on 43.8 billion medical events, demonstrates the power of this data source.

Administrative Claims as a Scalable Healthcare Foundation Model Substrate

ReClaim, a generative transformer trained on data from more than 200 million enrollees spanning 2008–2022, models longitudinal trajectories across diagnoses, procedures, medications, and expenditures. Scaled to 1.7 billion parameters, the model demonstrates that administrative claims are not merely billing records but a potent substrate for building powerful healthcare foundation models. Its ability to capture financial outcomes and improve RWE analyses underscores this potential.
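The article does not reproduce ReClaim's exact event representation, but modeling a trajectory as an ordered stream of medical events maps naturally onto standard autoregressive tokenization. Here is a minimal sketch under that assumption; the `ClaimEvent` type, codes, and cost buckets below are illustrative, not ReClaim's actual scheme:

```python
# Sketch: flattening an enrollee's claims history into a token sequence
# for autoregressive modeling. Event kinds and codes are hypothetical.
from dataclasses import dataclass

@dataclass
class ClaimEvent:
    date: str   # ISO service date, e.g. "2019-03-07" (sorts lexicographically)
    kind: str   # "dx" | "px" | "rx" | "cost"
    code: str   # e.g. ICD-10 "E11.9", a drug name, or a cost bucket

def to_tokens(events: list[ClaimEvent]) -> list[str]:
    """Order events chronologically and emit one discrete token per event."""
    return [f"{ev.kind}:{ev.code}" for ev in sorted(events, key=lambda e: e.date)]

history = [
    ClaimEvent("2019-03-07", "dx", "E11.9"),      # type 2 diabetes diagnosis
    ClaimEvent("2019-03-07", "rx", "metformin"),  # prescription fill
    ClaimEvent("2019-06-01", "cost", "bucket_3"), # expenditure bucket
]
print(to_tokens(history))
# ['dx:E11.9', 'rx:metformin', 'cost:bucket_3']
```

Once a trajectory is a single token sequence, a transformer can be trained with the usual next-token objective, which is what makes claims histories amenable to the same scaling recipes used for language models.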


Unprecedented Performance in Disease Prediction and RWE

Across more than 1,000 disease-onset prediction tasks, the ReClaim foundation model achieved a mean AUC of 75.6%, substantially outperforming disease-specific LightGBM models (66.3%) and the transformer-based Delphi model (69.4%). Notably, ReClaim showed the largest gains for rare diseases, a critical area for clinical advancement. These advantages held across retrospective and prospective evaluations as well as in external validation on independent datasets. For healthcare expenditure forecasting, ReClaim increased explained variance from 0.28 to 0.37 relative to LightGBM, and in target trial emulation it reduced systematic bias by 72% on average compared with Delphi, underscoring its real-world utility.
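For readers unfamiliar with the two headline metrics, here is a brief sketch of how mean AUC over many tasks and explained variance for expenditure forecasts are conventionally computed. The arrays are toy placeholders, not the study's data:

```python
# Sketch of the evaluation metrics: mean AUC across many binary
# disease-onset tasks, and explained variance for spend forecasts.
import numpy as np
from sklearn.metrics import roc_auc_score, explained_variance_score

rng = np.random.default_rng(0)

# Mean AUC: one label/score pair per task, averaged over all tasks.
task_aucs = []
for _ in range(1000):                          # ~1,000 onset-prediction tasks
    y_true = rng.integers(0, 2, size=500)
    y_score = y_true * 0.5 + rng.random(500)   # weakly informative scores
    task_aucs.append(roc_auc_score(y_true, y_score))
print(f"mean AUC: {np.mean(task_aucs):.3f}")

# Explained variance (the 0.28 -> 0.37 comparison):
# 1 - Var(y - y_hat) / Var(y), on a skewed spend distribution.
y = rng.lognormal(mean=7.0, sigma=1.2, size=500)      # annual spend, dollars
y_hat = y * 0.6 + rng.normal(0, y.std() * 0.5, 500)   # imperfect forecast
print(f"explained variance: {explained_variance_score(y, y_hat):.2f}")
```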

Monotonic Scaling and Post-Training Gains Drive Efficacy

Performance improvements in the ReClaim foundation model scaled monotonically with model size, highlighting the benefits of continued scaling. Crucially, post-training added 13.8 percentage points to performance over pre-training alone, indicating the significant value of fine-tuning on specific downstream tasks. This synergy between scale and targeted training enables learned representations that generalize effectively across time periods and diverse data sources, supporting critical applications like disease surveillance, expenditure forecasting, and robust RWE generation.
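The article does not detail ReClaim's post-training recipe. One common form of post-training is supervised fine-tuning of a pretrained encoder on a downstream label, and the PyTorch sketch below illustrates a single such update step under that assumption; `PretrainedBackbone` is a stand-in, not ReClaim's architecture:

```python
# Sketch: one supervised fine-tuning step on a disease-onset label,
# a plausible (assumed) form of the post-training the article describes.
import torch
import torch.nn as nn

class PretrainedBackbone(nn.Module):
    """Placeholder for a pretrained claims transformer."""
    def __init__(self, vocab=50_000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens):                  # (batch, seq) -> (batch, dim)
        h = self.encoder(self.embed(tokens))
        return h.mean(dim=1)                    # pooled sequence representation

backbone = PretrainedBackbone()                 # weights would come from pre-training
head = nn.Linear(256, 1)                        # one disease-onset logit
opt = torch.optim.AdamW(
    list(backbone.parameters()) + list(head.parameters()), lr=1e-5
)

tokens = torch.randint(0, 50_000, (8, 128))     # toy batch of claim-token sequences
labels = torch.randint(0, 2, (8, 1)).float()    # onset within horizon: yes/no

opt.zero_grad()
logits = head(backbone(tokens))
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()
opt.step()
```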
