Mixture-of-Experts (MoE) Large Language Models (LLMs) offer remarkable scalability through sparse expert activation, a paradigm well matched to Compute-In-Memory (CIM) architectures, which excel at memory bandwidth-intensive workloads. However, the practical deployment of these models on analog CIM systems is hampered by inherent hardware imperfections. This work presents the first systematic investigation into the impact of hardware noise on MoE LLMs, revealing that these imperfections critically disrupt expert load balance and render standard routing decisions suboptimal.
Noise-Induced Routing Suboptimality in MoE Architectures
Using noise models calibrated with real chip measurements, the researchers observed that analog CIM hardware imperfections significantly perturb stored weights. This perturbation critically disrupts the load balance among MoE experts: routing decisions made by models trained on clean weights become consistently suboptimal under hardware noise, directly degrading model performance and efficiency.
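To build intuition for this effect, the following is a minimal sketch (not the paper's experimental setup) that injects Gaussian noise into a toy top-k router's weights and compares expert loads and selections before and after. The dimensions, noise scale, and router structure are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 8, 2
n_tokens = 10_000
noise_std = 0.05  # assumed relative noise scale, not a measured chip value

# Toy router: logits = tokens @ W_router; each token picks its top-k experts.
W_clean = rng.normal(size=(d_model, n_experts)) / np.sqrt(d_model)
W_noisy = W_clean + noise_std * np.abs(W_clean).mean() * rng.normal(size=W_clean.shape)

tokens = rng.normal(size=(n_tokens, d_model))

def route(W):
    logits = tokens @ W
    return np.argsort(logits, axis=1)[:, -top_k:]  # indices of top-k experts

clean_choice, noisy_choice = route(W_clean), route(W_noisy)

# Expert load: fraction of routed slots each expert receives.
load = lambda c: np.bincount(c.ravel(), minlength=n_experts) / c.size
print("clean load:", np.round(load(clean_choice), 3))
print("noisy load:", np.round(load(noisy_choice), 3))

# Fraction of tokens whose selected expert set changed under noise.
changed = np.mean([set(a) != set(b) for a, b in zip(clean_choice, noisy_choice)])
print(f"tokens with altered routing: {changed:.1%}")
```

Even small weight perturbations flip routing decisions near the top-k boundary, which is why load imbalance emerges without any change to the inputs.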
ROMER: A Novel Calibration Framework for Noisy MoE Deployments
To address these challenges, the paper introduces ROMER, a post-training calibration framework. ROMER employs two key strategies: it replaces underactivated experts with high-frequency ones to restore expert load balance, and it recalibrates router logits via percentile-based normalization to stabilize routing under noisy conditions. This approach demonstrably mitigates the performance degradation caused by hardware noise in MoE LLM deployments.
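The sketch below illustrates the two ideas as described, under stated assumptions: the percentile choices, the activation threshold defining "underactivated," and the function names are hypothetical, not ROMER's exact procedure.

```python
import numpy as np

def percentile_normalize(logits, p_lo=5.0, p_hi=95.0):
    """Rescale each token's router logits by robust percentile statistics.

    The (p_lo, p_hi) choices are illustrative assumptions; ROMER's exact
    normalization may differ.
    """
    lo = np.percentile(logits, p_lo, axis=-1, keepdims=True)
    hi = np.percentile(logits, p_hi, axis=-1, keepdims=True)
    return (logits - lo) / np.maximum(hi - lo, 1e-6)

def rebalance_experts(expert_counts, min_frac=0.02):
    """Redirect underactivated experts to the most frequently used one.

    `min_frac` is an assumed activation threshold. Returns a remapping:
    remap[i] = expert that should serve tokens originally routed to expert i.
    """
    total = expert_counts.sum()
    high_freq = int(np.argmax(expert_counts))
    remap = np.arange(len(expert_counts))
    remap[expert_counts / total < min_frac] = high_freq
    return remap

# Example: 8 experts, two nearly dead under noise get redirected.
counts = np.array([310, 5, 290, 280, 3, 300, 295, 317])
print(rebalance_experts(counts))
```

Percentile-based statistics are a natural fit here because noise-induced outlier logits would skew a mean/variance normalization, whereas percentiles keep the rescaling stable.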