The widespread adoption of machine-learned interatomic potentials (MLIPs) for high-throughput materials screening is hampered by a lack of formal reliability guarantees. Without such guarantees, screening can be badly inefficient. On a large benchmark, a single MLIP missed 93% of the materials that density functional theory (DFT) predicts to be stable, which is a recall of only 0.07.
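The link between "recall of 0.07" and "missed 93%" is a single formula: recall is the fraction of DFT-stable materials that the screen also flags as stable, so the miss rate is one minus recall. The counts below are hypothetical and chosen only to reproduce the reported ratio.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of DFT-stable materials that the MLIP screen flags as stable."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts: the MLIP recovers 7 of 100 DFT-stable materials.
r = recall(true_positives=7, false_negatives=93)
print(f"recall = {r:.2f}, missed = {1 - r:.0%}")  # prints: recall = 0.07, missed = 93%
```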
Bridging the Reliability Gap with Proof-Carrying Materials
The researchers introduce Proof-Carrying Materials (PCM), a framework designed to close this gap. PCM has three stages: (1) adversarial falsification, which searches compositional space for materials on which an MLIP fails; (2) bootstrap envelope refinement, which uses 95% confidence intervals to quantify the uncertainty in where the MLIP can be trusted; and (3) formal certification in the Lean 4 proof assistant, which checks the resulting reliability claims with machine-verified proofs. The approach targets the unreliability of MLIPs as they are currently deployed.
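Stage 2 can be illustrated with a percentile bootstrap. This is a minimal sketch under assumptions, not the paper's procedure: it supposes the envelope is judged region by region from per-material errors, and that a region stays inside the envelope when the upper end of the 95% confidence interval on its mean error falls below a tolerance. `within_envelope`, the tolerance, and the error values are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    With alpha=0.05 this is a 95% interval.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


def within_envelope(errors, tolerance, **kwargs):
    """Hypothetical acceptance rule: keep a region only if the upper 95%
    bound on its mean error is below `tolerance`."""
    _, upper = bootstrap_ci(errors, **kwargs)
    return upper < tolerance


# Hypothetical per-material errors (eV/atom) for one compositional region.
region_errors = [0.02, 0.03, 0.05, 0.04, 0.01]
print(bootstrap_ci(region_errors))
print(within_envelope(region_errors, tolerance=0.1))
```

Using the upper confidence bound rather than the sample mean makes the rule conservative: a region with few or scattered error samples gets a wide interval and is less likely to be accepted.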
Uncovering Architecture-Specific Blind Spots
Auditing prominent MLIPs such as CHGNet, TensorNet, and MACE revealed substantial architecture-specific blind spots. Pairwise correlations between the models' errors are low (r ≤ 0.13), so each model tends to fail on different materials; independent Quantum ESPRESSO DFT calculations confirmed this finding. The median ratio of DFT to CHGNet forces was 12x, underscoring the need to audit MLIPs before deploying them in materials discovery pipelines.
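The two statistics in this paragraph can be computed as below. This is a sketch under stated assumptions: each model's per-material errors are measured against the same DFT references, and the force ratio is taken as the per-structure ratio of DFT to MLIP force magnitudes. The audit's exact definitions are not given here, and all numbers are hypothetical placeholders.

```python
import itertools
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_error_correlations(errors):
    """Map each pair of model names to the correlation of their per-material errors."""
    return {
        (a, b): pearson_r(errors[a], errors[b])
        for a, b in itertools.combinations(sorted(errors), 2)
    }


def median_force_ratio(dft_forces, mlip_forces):
    """Median of per-structure |F_DFT| / |F_MLIP| force-magnitude ratios (assumed definition)."""
    return statistics.median(d / m for d, m in zip(dft_forces, mlip_forces))


# Hypothetical absolute errors (eV/atom) of three models on the same five materials.
errors = {
    "CHGNet": [0.30, 0.02, 0.01, 0.04, 0.03],
    "MACE": [0.01, 0.25, 0.03, 0.02, 0.05],
    "TensorNet": [0.02, 0.03, 0.28, 0.01, 0.04],
}
for (a, b), r in pairwise_error_correlations(errors).items():
    print(f"{a} vs {b}: r = {r:+.2f}")
print(median_force_ratio([1.2, 0.6, 2.4], [0.1, 0.1, 0.1]))
```

Low pairwise r means one model's large errors say little about where another model will fail, which is why a single MLIP's blind spots are not covered by simply trusting a different one.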
Predictive Risk Modeling for Robust MLIP Deployment
A key component of the PCM framework is a risk model trained on the failure features discovered during the audit. On unseen materials, it predicts MLIP failures with an AUC-ROC of 0.938 ± 0.004. The model also transfers across MLIP architectures, though with reduced accuracy: cross-MLIP AUC-ROC is around 0.70, and feature importances correlate across MLIPs at r = 0.877. This points toward reliability assessment that generalizes across machine-learned interatomic potentials.
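The AUC-ROC figures can be read through the metric's rank interpretation: the probability that a randomly chosen failure receives a higher risk score than a randomly chosen non-failure. The sketch below computes it directly from that definition. Framing the cross-MLIP case as scoring the same risk predictions against a different MLIP's failure labels is an assumption about the protocol, and the scores and labels are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC-ROC as P(score of a failure > score of a non-failure), ties counted as 1/2.

    labels: 1 for an observed MLIP failure, 0 otherwise.
    scores: predicted failure risk for the same materials.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one failure and one non-failure")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores from a model trained on one MLIP's failures.
risk_scores = [0.9, 0.8, 0.4, 0.1]
failures_same_mlip = [1, 1, 0, 0]   # failure labels for the MLIP it was trained on
failures_other_mlip = [1, 0, 1, 0]  # failure labels for a different MLIP
print(auc_roc(failures_same_mlip, risk_scores))   # prints 1.0
print(auc_roc(failures_other_mlip, risk_scores))  # prints 0.75
```

The pairwise loop is quadratic and meant only to make the definition explicit; for large datasets a rank-based or library implementation is the usual choice.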
In a thermoelectric screening case study, PCM-audited protocols recovered 62 stable materials that a single-MLIP screen missed, a 25% increase in discovery yield. The result shows the practical cost of relying on an uncertified MLIP.


