The quest for universal foundation models in Scientific Machine Learning (SciML) faces a critical bottleneck: negative transfer. When a single dense model is trained across diverse physical regimes such as fluid dynamics and porous media flows, gradient conflicts and optimization instability arise, hampering the plasticity of dense neural operators. The incompatible spectral and geometric demands of these distinct physics are difficult to reconcile within a single, dense parameter path.
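To make the failure mode concrete, negative transfer can be diagnosed by measuring the cosine similarity between per-domain gradients on shared parameters. Below is a minimal PyTorch sketch of such a diagnostic; it is not from the Shodh-MoE work, and the model and loss names are placeholders. A strongly negative cosine indicates that the two physics regimes pull shared weights in opposing directions.

```python
# Hypothetical diagnostic for gradient conflict between two physics domains.
# Not from the Shodh-MoE paper; `model`, `loss_fluid`, `loss_porous` are placeholders.
import torch

def gradient_cosine(model, loss_fluid, loss_porous):
    """Cosine similarity between per-domain gradients on shared parameters.

    Values near -1 mean the domains drive shared weights in opposing
    directions (gradient conflict / negative transfer); values near +1
    mean the updates are compatible.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    g_fluid = torch.autograd.grad(loss_fluid, params, retain_graph=True, allow_unused=True)
    g_porous = torch.autograd.grad(loss_porous, params, allow_unused=True)
    flat_f, flat_p = [], []
    for gf, gp in zip(g_fluid, g_porous):
        # Skip parameters unused by either loss (their grad is None).
        if gf is not None and gp is not None:
            flat_f.append(gf.flatten())
            flat_p.append(gp.flatten())
    gf, gp = torch.cat(flat_f), torch.cat(flat_p)
    return torch.dot(gf, gp) / (gf.norm() * gp.norm() + 1e-12)
```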
Breaking Multi-Physics Interference with Sparse Activation
Ellwil and Arastu Sharma introduce the Shodh-MoE architecture, a sparse-activated latent transformer designed for multi-physics transport. The model operates on compressed 16^3 physical latents produced by a physics-informed autoencoder. A key innovation is the intra-tokenizer Helmholtz-style velocity parameterization, which constrains decoded states to a divergence-free velocity manifold and thereby enforces mass conservation by construction. Post-hoc validation in FP64 on 128^3 grids measures a residual velocity divergence of approximately 2.8 x 10^-10.
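One way to realize a Helmholtz-style divergence-free parameterization is to have the decoder emit a vector potential A and recover the velocity as u = curl(A), since div(curl A) = 0 identically. With shift-based central differences on a periodic grid, the divergence stencil commutes with the curl stencil, so the residual divergence is pure floating-point round-off, consistent with the round-off-level figure reported above. The sketch below is an illustrative reconstruction under those assumptions, not the authors' code; the stencil, grid spacing, and boundary handling are all assumed.

```python
# Illustrative sketch: divergence-free velocity via u = curl(A) on a periodic grid.
# Stencil and grid conventions are assumptions, not the Shodh-MoE implementation.
import torch

def central_diff(f, axis, dx=1.0):
    """Central difference along `axis` with periodic wrap-around."""
    return (torch.roll(f, -1, dims=axis) - torch.roll(f, 1, dims=axis)) / (2 * dx)

def curl_3d(A, dx=1.0):
    """u = curl(A) for A of shape (3, X, Y, Z); divergence-free by construction."""
    Ax, Ay, Az = A[0], A[1], A[2]
    ux = central_diff(Az, 1, dx) - central_diff(Ay, 2, dx)  # dAz/dy - dAy/dz
    uy = central_diff(Ax, 2, dx) - central_diff(Az, 0, dx)  # dAx/dz - dAz/dx
    uz = central_diff(Ay, 0, dx) - central_diff(Ax, 1, dx)  # dAy/dx - dAx/dy
    return torch.stack([ux, uy, uz])

def divergence(u, dx=1.0):
    """Discrete divergence using the same central-difference stencil."""
    return sum(central_diff(u[i], i, dx) for i in range(3))

# Post-hoc FP64 check on a 128^3 grid, mirroring the validation described above:
A = torch.randn(3, 128, 128, 128, dtype=torch.float64)
u = curl_3d(A)
print(divergence(u).abs().max())  # round-off-level residual
```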
Autonomous Domain Bifurcation via Expert Routing
The core of Shodh-MoE's efficacy lies in its Top-1 soft-semantic router, which assigns each localized latent patch to a specialized expert subnetwork. This routing carves out distinct parameter paths tailored to the physical mechanisms of each domain, while preserving shared experts for universal physical symmetries. During a large-scale distributed pretraining run, routing telemetry revealed an autonomous bifurcation: tokens from the open-channel fluid dynamics domain routed exclusively to Expert 0, while porous media flow tokens routed exclusively to Expert 1. This mechanism enabled simultaneous convergence across both regimes, with latent validation MSEs of 2.46 x 10^-5 and 9.76 x 10^-6 and decoded physical MSEs of 2.48 x 10^-6 and 1.76 x 10^-6 for the fluid and porous media domains, respectively.
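A minimal sketch of a Top-1 router with a shared expert helps illustrate how such routing telemetry can be exposed. The layer sizes, names, and gating rule below are assumptions for illustration, not the published Shodh-MoE implementation.

```python
# Hypothetical Top-1 MoE layer over latent patch tokens with a shared expert.
# Dimensions and gating details are assumptions, not the Shodh-MoE code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    def __init__(self, d_model=256, d_ff=1024, n_experts=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)  # soft routing scores per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # Shared expert applied to every token, intended for physics common
        # to all domains (e.g. universal symmetries).
        self.shared = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        # x: (n_tokens, d_model) flattened latent patches
        probs = F.softmax(self.gate(x), dim=-1)
        weight, idx = probs.max(dim=-1)  # Top-1: one specialized expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # Scale by the gate probability so the router receives gradients.
                out[mask] = weight[mask, None] * expert(x[mask])
        return out + self.shared(x), idx  # idx exposes routing decisions

moe = Top1MoE()
tokens = torch.randn(64, 256)            # 64 latent patch tokens
y, idx = moe(tokens)
print(torch.bincount(idx, minlength=2))  # per-expert token counts
```

Logging these per-expert counts separately for each domain's batches is one way the kind of expert-domain bifurcation reported above could be observed during pretraining.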