The quest for universal foundation models in Scientific Machine Learning (SciML) faces a critical bottleneck: negative transfer. When a single dense model is trained across diverse physical regimes such as fluid dynamics and porous media flows, gradient conflicts and optimization instability arise, hampering the plasticity of dense neural operators. The incompatible spectral and geometric demands of these distinct physics are difficult to reconcile within a single, dense parameter path.
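To make the failure mode concrete, negative transfer can be diagnosed by measuring the cosine similarity between per-domain gradients on shared parameters. Below is a minimal PyTorch sketch of such a diagnostic; it is not from the Shodh-MoE work, and the model and loss names are placeholders. A strongly negative cosine indicates that the two physics regimes pull shared weights in opposing directions.

```python
# Hypothetical diagnostic for gradient conflict between two physics domains.
# Not from the Shodh-MoE paper; `model`, `loss_fluid`, `loss_porous` are placeholders.
import torch

def gradient_cosine(model, loss_fluid, loss_porous):
    """Cosine similarity between per-domain gradients on shared parameters.

    Values near -1 mean the domains drive shared weights in opposing
    directions (gradient conflict / negative transfer); values near +1
    mean the updates are compatible.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    g_fluid = torch.autograd.grad(loss_fluid, params, retain_graph=True, allow_unused=True)
    g_porous = torch.autograd.grad(loss_porous, params, allow_unused=True)
    flat_f, flat_p = [], []
    for gf, gp in zip(g_fluid, g_porous):
        # Skip parameters unused by either loss (their grad is None).
        if gf is not None and gp is not None:
            flat_f.append(gf.flatten())
            flat_p.append(gp.flatten())
    gf, gp = torch.cat(flat_f), torch.cat(flat_p)
    return torch.dot(gf, gp) / (gf.norm() * gp.norm() + 1e-12)
```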
Breaking Multi-Physics Interference with Sparse Activation
Ellwil and Arastu Sharma introduce the Shodh-MoE architecture, a sparse-activated latent transformer designed for multi-physics transport. The model operates on compressed 16^3 physical latents produced by a physics-informed autoencoder. A key innovation is the intra-tokenizer Helmholtz-style velocity parameterization, which constrains decoded states to a divergence-free velocity manifold and thereby enforces mass conservation by construction. Post-hoc validation in FP64 on 128^3 grids measures a residual velocity divergence of approximately 2.8 x 10^-10.
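One way to realize a Helmholtz-style divergence-free parameterization is to have the decoder emit a vector potential A and recover the velocity as u = curl(A), since div(curl A) = 0 identically. With shift-based central differences on a periodic grid, the divergence stencil commutes with the curl stencil, so the residual divergence is pure floating-point round-off, consistent with the round-off-level figure reported above. The sketch below is an illustrative reconstruction under those assumptions, not the authors' code; the stencil, grid spacing, and boundary handling are all assumed.

```python
# Illustrative sketch: divergence-free velocity via u = curl(A) on a periodic grid.
# Stencil and grid conventions are assumptions, not the Shodh-MoE implementation.
import torch

def central_diff(f, axis, dx=1.0):
    """Central difference along `axis` with periodic wrap-around."""
    return (torch.roll(f, -1, dims=axis) - torch.roll(f, 1, dims=axis)) / (2 * dx)

def curl_3d(A, dx=1.0):
    """u = curl(A) for A of shape (3, X, Y, Z); divergence-free by construction."""
    Ax, Ay, Az = A[0], A[1], A[2]
    ux = central_diff(Az, 1, dx) - central_diff(Ay, 2, dx)  # dAz/dy - dAy/dz
    uy = central_diff(Ax, 2, dx) - central_diff(Az, 0, dx)  # dAx/dz - dAz/dx
    uz = central_diff(Ay, 0, dx) - central_diff(Ax, 1, dx)  # dAy/dx - dAx/dy
    return torch.stack([ux, uy, uz])

def divergence(u, dx=1.0):
    """Discrete divergence using the same central-difference stencil."""
    return sum(central_diff(u[i], i, dx) for i in range(3))

# Post-hoc FP64 check on a 128^3 grid, mirroring the validation described above:
A = torch.randn(3, 128, 128, 128, dtype=torch.float64)
u = curl_3d(A)
print(divergence(u).abs().max())  # round-off-level residual
```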
Autonomous Domain Bifurcation via Expert Routing
The core of Shodh-MoE's efficacy lies in its Top-1 soft-semantic router, which assigns each localized latent patch to a specialized expert subnetwork. This routing carves out distinct parameter paths tailored to the physical mechanisms of each domain, while preserving shared experts for universal physical symmetries. During a large-scale distributed pretraining run, routing telemetry revealed an autonomous bifurcation: tokens from the open-channel fluid dynamics domain routed exclusively to Expert 0, while porous media flow tokens routed exclusively to Expert 1. This mechanism enabled simultaneous convergence across both regimes, with latent validation MSEs of 2.46 x 10^-5 and 9.76 x 10^-6 and decoded physical MSEs of 2.48 x 10^-6 and 1.76 x 10^-6 for the fluid and porous media domains, respectively.
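A minimal sketch of a Top-1 router with a shared expert helps illustrate how such routing telemetry can be exposed. The layer sizes, names, and gating rule below are assumptions for illustration, not the published Shodh-MoE implementation.

```python
# Hypothetical Top-1 MoE layer over latent patch tokens with a shared expert.
# Dimensions and gating details are assumptions, not the Shodh-MoE code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    def __init__(self, d_model=256, d_ff=1024, n_experts=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)  # soft routing scores per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # Shared expert applied to every token, intended for physics common
        # to all domains (e.g. universal symmetries).
        self.shared = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        # x: (n_tokens, d_model) flattened latent patches
        probs = F.softmax(self.gate(x), dim=-1)
        weight, idx = probs.max(dim=-1)  # Top-1: one specialized expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # Scale by the gate probability so the router receives gradients.
                out[mask] = weight[mask, None] * expert(x[mask])
        return out + self.shared(x), idx  # idx exposes routing decisions

moe = Top1MoE()
tokens = torch.randn(64, 256)            # 64 latent patch tokens
y, idx = moe(tokens)
print(torch.bincount(idx, minlength=2))  # per-expert token counts
```

Logging these per-expert counts separately for each domain's batches is one way the kind of expert-domain bifurcation reported above could be observed during pretraining.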