Understanding uncertainty in foundation models is critical, yet traditional Bayesian methods are computationally impractical at that scale. State-of-the-art models increasingly rely on Mixture-of-Experts (MoE) architectures, whose massive parameter counts make a full Bayesian treatment even less tractable. This work introduces Variational Mixture-of-Experts Routing (VMoER), a structured Bayesian approach that addresses the gap by confining inference to the expert selection stage.
Calibrating Uncertainty in Sparse Architectures
VMoER injects Bayesian principles into the otherwise deterministic routing networks of MoE layers, the component that makes sparse scaling possible. By restricting inference to this selection process, the approach avoids the prohibitive cost of full Bayesian inference over the model's entire parameter set. The researchers demonstrate VMoER's efficacy with two inference strategies: amortised variational inference over the routing logits, and variational inference over a temperature parameter that governs stochastic expert selection. This targeted application yields principled uncertainty quantification without sacrificing the efficiency gains of sparsity; a sketch of both strategies follows below.
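To make the two strategies concrete, the following PyTorch sketch illustrates one way they could be realised. The class names (`VariationalRouter`, `TemperatureRouter`), the Gaussian/log-normal posterior choices, and all hyperparameters are illustrative assumptions, not the paper's actual implementation. The first class places an amortised diagonal-Gaussian posterior over the routing logits; the second keeps the logits deterministic but infers a softmax temperature that controls how stochastic expert selection is.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalRouter(nn.Module):
    """Strategy 1 (sketch): amortised variational inference over routing logits.

    A gate network predicts a diagonal Gaussian q(z | x) over the routing
    logits z; experts are chosen from a reparameterised sample, and a KL term
    against a standard-normal prior regularises the posterior.
    """

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate_mu = nn.Linear(d_model, n_experts)      # posterior mean
        self.gate_logvar = nn.Linear(d_model, n_experts)  # posterior log-variance

    def forward(self, x: torch.Tensor):
        mu, logvar = self.gate_mu(x), self.gate_logvar(x)
        # Reparameterisation trick: z = mu + sigma * eps, eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # Sparse top-k selection on the sampled logits keeps the usual MoE path.
        weights, indices = torch.topk(z, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        # Closed-form KL(q(z|x) || N(0, I)), summed over experts, mean over tokens.
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return weights, indices, kl


class TemperatureRouter(nn.Module):
    """Strategy 2 (sketch): infer a temperature governing stochastic selection.

    The routing logits stay deterministic; a log-normal posterior over a
    global softmax temperature tau controls how random expert selection is.
    """

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)
        # Variational parameters of q(log tau) = N(tau_mu, exp(tau_logvar)).
        self.tau_mu = nn.Parameter(torch.zeros(1))
        self.tau_logvar = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor):
        logits = self.gate(x)
        # Sample tau by reparameterisation; the log-normal keeps it positive.
        log_tau = self.tau_mu + torch.exp(0.5 * self.tau_logvar) * torch.randn_like(self.tau_mu)
        tau = log_tau.exp()
        # Tempered routing distribution: larger tau -> more exploratory selection.
        probs = F.softmax(logits / tau, dim=-1)
        weights, indices = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)  # renormalise top-k mass
        # KL of q(log tau) against a standard-normal prior on log tau.
        kl = -0.5 * (1 + self.tau_logvar - self.tau_mu.pow(2) - self.tau_logvar.exp()).sum()
        return weights, indices, kl


# Usage: the KL term joins the task loss as an ELBO-style regulariser.
router = VariationalRouter(d_model=64, n_experts=8, top_k=2)
tokens = torch.randn(4, 64)            # batch of 4 token embeddings
weights, indices, kl = router(tokens)
# total_loss = task_loss + beta * kl   # beta scales the KL penalty
```

In either variant, routing uncertainty can be estimated at test time by drawing several posterior samples per token and measuring disagreement among the selected experts. Only the gate's variational parameters are added, so the expert networks and the sparse top-k compute path remain untouched, consistent with the efficiency argument above.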