Mixture-of-Experts (MoE) Large Language Models (LLMs) offer remarkable scalability through sparse expert activation, a paradigm well matched to Compute-In-Memory (CIM) architectures, which excel at memory bandwidth-intensive workloads. However, the practical deployment of these models on analog CIM systems is hampered by inherent hardware imperfections. This work presents the first systematic investigation into the impact of hardware noise on MoE LLMs, revealing that these imperfections critically disrupt expert load balance and render standard routing decisions suboptimal.
Noise-Induced Routing Suboptimality in MoE Architectures
Using noise models calibrated with real chip measurements, the researchers observed that analog CIM hardware imperfections significantly perturb stored weights. This perturbation critically disrupts the load balance among MoE experts: routing decisions made by models trained on clean weights become consistently suboptimal under hardware noise, directly degrading model performance and efficiency.
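To build intuition for this effect, the following is a minimal sketch (not the paper's experimental setup) that injects Gaussian noise into a toy top-k router's weights and compares expert loads and selections before and after. The dimensions, noise scale, and router structure are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 8, 2
n_tokens = 10_000
noise_std = 0.05  # assumed relative noise scale, not a measured chip value

# Toy router: logits = tokens @ W_router; each token picks its top-k experts.
W_clean = rng.normal(size=(d_model, n_experts)) / np.sqrt(d_model)
W_noisy = W_clean + noise_std * np.abs(W_clean).mean() * rng.normal(size=W_clean.shape)

tokens = rng.normal(size=(n_tokens, d_model))

def route(W):
    logits = tokens @ W
    return np.argsort(logits, axis=1)[:, -top_k:]  # indices of top-k experts

clean_choice, noisy_choice = route(W_clean), route(W_noisy)

# Expert load: fraction of routed slots each expert receives.
load = lambda c: np.bincount(c.ravel(), minlength=n_experts) / c.size
print("clean load:", np.round(load(clean_choice), 3))
print("noisy load:", np.round(load(noisy_choice), 3))

# Fraction of tokens whose selected expert set changed under noise.
changed = np.mean([set(a) != set(b) for a, b in zip(clean_choice, noisy_choice)])
print(f"tokens with altered routing: {changed:.1%}")
```

Even small weight perturbations flip routing decisions near the top-k boundary, which is why load imbalance emerges without any change to the inputs.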
ROMER: A Novel Calibration Framework for Noisy MoE Deployments
To address these challenges, the paper introduces ROMER, a post-training calibration framework. ROMER employs two key strategies: it replaces underactivated experts with high-frequency ones to restore expert load balance, and it recalibrates router logits via percentile-based normalization to stabilize routing under noisy conditions. This approach demonstrably mitigates the performance degradation caused by hardware noise in MoE LLM deployments.
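The sketch below illustrates the two ideas as described, under stated assumptions: the percentile choices, the activation threshold defining "underactivated," and the function names are hypothetical, not ROMER's exact procedure.

```python
import numpy as np

def percentile_normalize(logits, p_lo=5.0, p_hi=95.0):
    """Rescale each token's router logits by robust percentile statistics.

    The (p_lo, p_hi) choices are illustrative assumptions; ROMER's exact
    normalization may differ.
    """
    lo = np.percentile(logits, p_lo, axis=-1, keepdims=True)
    hi = np.percentile(logits, p_hi, axis=-1, keepdims=True)
    return (logits - lo) / np.maximum(hi - lo, 1e-6)

def rebalance_experts(expert_counts, min_frac=0.02):
    """Redirect underactivated experts to the most frequently used one.

    `min_frac` is an assumed activation threshold. Returns a remapping:
    remap[i] = expert that should serve tokens originally routed to expert i.
    """
    total = expert_counts.sum()
    high_freq = int(np.argmax(expert_counts))
    remap = np.arange(len(expert_counts))
    remap[expert_counts / total < min_frac] = high_freq
    return remap

# Example: 8 experts, two nearly dead under noise get redirected.
counts = np.array([310, 5, 290, 280, 3, 300, 295, 317])
print(rebalance_experts(counts))
```

Percentile-based statistics are a natural fit here because noise-induced outlier logits would skew a mean/variance normalization, whereas percentiles keep the rescaling stable.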