MLIPs: From Blind Screening to Certified Discovery

A new framework, Proof-Carrying Materials (PCM), audits and certifies the reliability of machine-learned interatomic potentials (MLIPs) and improved discovery yield in a reported screening case study.

Mar 13 at 8:01 PM
Figure: The three stages of the Proof-Carrying Materials (PCM) framework: adversarial falsification, bootstrap envelope refinement, and Lean 4 formal certification.

Machine-learned interatomic potentials (MLIPs) are widely used for high-throughput materials screening, but they come without formal reliability guarantees. The cost is measurable: on a large benchmark, a single MLIP achieved a recall of only 0.07, meaning it missed 93% of the materials that density functional theory (DFT) identifies as stable.
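
The link between the two figures is simple arithmetic: recall is the fraction of DFT-stable materials that the screen recovers, so the missed fraction is one minus recall. A minimal sketch follows; the counts `tp` and `fn` are hypothetical, chosen only to reproduce the reported recall, and are not the benchmark's actual numbers.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of truly stable materials that the screen recovers."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no DFT-stable materials in the benchmark")
    return true_positives / total


# Hypothetical counts (assumption): 7 stable materials found, 93 missed.
tp, fn = 7, 93
r = recall(tp, fn)
miss_rate = 1.0 - r
print(f"recall = {r:.2f}, missed fraction = {miss_rate:.2f}")
# prints "recall = 0.07, missed fraction = 0.93"
```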

Bridging the Reliability Gap with Proof-Carrying Materials

The researchers introduce Proof-Carrying Materials (PCM) to address this limitation. PCM operates in three stages: (1) adversarial falsification, which searches compositional space for cases where the MLIP fails; (2) bootstrap envelope refinement, which uses 95% confidence intervals to quantify uncertainty; and (3) Lean 4 formal certification, which expresses the resulting claims as machine-checked proofs.
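
To make the second stage concrete, here is a minimal sketch of a percentile bootstrap 95% confidence interval. It is a generic illustration, not PCM's implementation: the article does not say how the envelope is defined, and the function name `bootstrap_ci`, the synthetic error sample, and the choice of the mean as the statistic are all assumptions.

```python
import numpy as np


def bootstrap_ci(values, stat=np.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `stat` of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = values.size
    # Each row holds the indices of one resample drawn with replacement.
    idx = rng.integers(0, n, size=(n_resamples, n))
    stats = np.array([stat(values[row]) for row in idx])
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(stats, [alpha, 1.0 - alpha])
    return float(lo), float(hi)


# Synthetic per-material errors (assumption: illustrative values only).
rng = np.random.default_rng(42)
errors = np.abs(rng.normal(loc=0.05, scale=0.02, size=200))
lo, hi = bootstrap_ci(errors)
print(f"mean error = {errors.mean():.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

The percentile method is the simplest bootstrap variant; other variants, such as bias-corrected intervals, could be substituted without changing the surrounding logic.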

Uncovering Architecture-Specific Blind Spots

Auditing three MLIPs (CHGNet, TensorNet, and MACE) revealed architecture-specific blind spots. The models' errors are nearly uncorrelated pairwise (r <= 0.13), which indicates that each model tends to fail on different materials. Independent Quantum ESPRESSO calculations supported this finding. The median DFT/CHGNet force ratio was 12x, which underscores the need to audit a model before deploying it in a materials discovery pipeline.
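
A pairwise error correlation of this kind can be computed as the Pearson coefficient between the per-material error vectors of two models. The sketch below is an assumption about the general procedure, not the authors' code; the error arrays are synthetic, independent random draws, so the printed values say nothing about the real models.

```python
from itertools import combinations

import numpy as np


def pairwise_error_correlations(errors: dict) -> dict:
    """Pearson r between the error vectors of every pair of models."""
    out = {}
    for a, b in combinations(sorted(errors), 2):
        out[(a, b)] = float(np.corrcoef(errors[a], errors[b])[0, 1])
    return out


# Synthetic errors on the same 500 materials (assumption: random stand-ins).
rng = np.random.default_rng(0)
synthetic = {name: rng.normal(size=500) for name in ("CHGNet", "MACE", "TensorNet")}
corr = pairwise_error_correlations(synthetic)
for pair, r in corr.items():
    print(pair, f"r = {r:+.3f}")
```

Low correlations are what make multi-model auditing useful: if the models failed on the same materials, a second model would add little information.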

Predictive Risk Modeling for Robust MLIP Deployment

The PCM framework also includes a risk model trained on features discovered during the audit. It predicts MLIP failures on unseen materials with an AUC-ROC of 0.938 +/- 0.004. The model transfers only partially across architectures: cross-MLIP AUC-ROC drops to around 0.70, although feature importances remain correlated at r = 0.877. This suggests a path toward reliability assessment that generalizes across MLIPs.
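
For readers unfamiliar with the metric, AUC-ROC is the probability that a randomly chosen failure case receives a higher risk score than a randomly chosen non-failure case, so 0.5 is chance and 1.0 is perfect ranking. A minimal sketch of that definition follows; the labels and scores are made-up toy values, not outputs of the PCM risk model.

```python
import numpy as np


def auc_roc(labels, scores) -> float:
    """AUC-ROC as the fraction of (failure, non-failure) pairs ranked correctly."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one failure and one non-failure")
    # Compare every failure score with every non-failure score; ties count half.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


# Toy data (assumption): 1 = MLIP failed on this material, 0 = it did not.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = auc_roc(labels, scores)
print(f"AUC-ROC = {auc:.3f}")  # prints "AUC-ROC = 0.889" (8 of 9 pairs ranked correctly)
```

Read this way, a cross-MLIP score of about 0.70 means the transferred model ranks a failure above a non-failure roughly seven times out of ten.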

In a thermoelectric screening case study, PCM-audited protocols found 62 stable materials that a single-MLIP screen missed, a 25% improvement in discovery yield. The result shows the practical cost of relying on an uncertified MLIP.