MoE LLMs Confront Real-World Hardware Noise

Hardware noise in CIM systems degrades MoE LLM performance. ROMER, a new calibration framework, significantly improves accuracy by restoring load balance and stabilizing routing.

Figure: The ROMER framework for calibrating MoE LLMs under hardware noise.

Mixture-of-Experts (MoE) Large Language Models (LLMs) scale well because they activate only a sparse subset of experts per token, which makes them a promising fit for Compute-In-Memory (CIM) architectures aimed at memory-bandwidth-bound workloads. However, practical deployment of these models on analog CIM systems is hampered by inherent hardware imperfections. This work presents what the authors describe as the first systematic investigation of how hardware noise affects MoE LLMs, and finds that these imperfections critically disrupt expert load balance and render standard routing decisions suboptimal.

Visual TL;DR. Deploying MoE LLMs on CIM exposes them to hardware noise. Hardware noise makes routing suboptimal, and suboptimal routing degrades performance. The ROMER framework restores load balance and stabilizes routing, and both effects generalize broadly.

  1. MoE LLMs on CIM: sparse expert activation promising for memory bandwidth-intensive compute-in-memory architectures
  2. Hardware Noise: analog CIM hardware imperfections perturb stored weights, disrupting expert load balance
  3. Routing Suboptimality: standard routing decisions become consistently suboptimal in the presence of hardware noise
  4. Performance Degradation: critical disruption in load balance and suboptimal routing directly impacting model performance
  5. ROMER Framework: a novel calibration framework for noisy MoE deployments on analog CIM systems
  6. Restores Load Balance: replaces underactivated experts with high-frequency ones to restore expert load balance
  7. Stabilizes Routing: recalibrates router logits via percentile-based normalization to stabilize routing under noise
  8. Broad Generalizability: demonstrates broad generalizability across various MoE models and hardware noise levels

Noise-Induced Routing Suboptimality in MoE Architectures

The researchers model analog CIM imperfections using measurements from real chips and find that the resulting noise significantly perturbs stored weights. This perturbation critically disrupts the load balance across MoE experts. As a result, the routing decisions of models trained without noise become consistently suboptimal once the weights are noisy, which directly degrades model performance and efficiency.
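To make the load-balance effect concrete, the following is a minimal toy sketch, not the paper's experimental setup. It assumes a linear top-k router and a hypothetical multiplicative Gaussian noise model (the `perturb` function and its `sigma` value are illustrative assumptions, not the real-chip noise profile), and it simply counts how many tokens each expert receives before and after the router weights are perturbed.

```python
import random

def top_k_experts(logits, k):
    """Return the indices of the k largest logits."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

def route(tokens, router_weights, k):
    """Count how many tokens each expert receives under top-k routing."""
    counts = [0] * len(router_weights)
    for x in tokens:
        # One logit per expert: dot product of the expert's router row with the token.
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
        for e in top_k_experts(logits, k):
            counts[e] += 1
    return counts

def perturb(weights, sigma, rng):
    """Hypothetical noise model: multiplicative Gaussian noise on each stored weight."""
    return [[w * (1.0 + rng.gauss(0.0, sigma)) for w in row] for row in weights]

rng = random.Random(0)
dim, num_experts, k = 16, 8, 2
router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(500)]

clean_counts = route(tokens, router, k)
noisy_counts = route(tokens, perturb(router, 0.3, rng), k)
print("clean load per expert:", clean_counts)
print("noisy load per expert:", noisy_counts)
```

Comparing the two count vectors shows how per-expert load shifts when only the stored weights change. A randomly initialized router like this one is not load-balanced to begin with, so the sketch illustrates the measurement rather than the magnitude of the effect reported in the paper.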

ROMER: A Novel Calibration Framework for Noisy MoE Deployments

To address these challenges, the paper introduces ROMER, a post-training calibration framework. ROMER employs two key strategies. First, it replaces underactivated experts with high-frequency ones to restore expert load balance. Second, it recalibrates router logits via percentile-based normalization to stabilize routing under noisy conditions. According to the authors, this combination mitigates the performance degradation that hardware noise causes in MoE LLMs.
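The two strategies can be sketched roughly as follows. This is an illustrative interpretation under stated assumptions, not ROMER's actual implementation: the article does not specify how underactivated experts are identified or which percentiles are used, so the `min_share` threshold, the 5th/95th percentile defaults, and all function names here are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile of a list, with q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def build_replacement_map(counts, min_share):
    """Map each underactivated expert to a high-frequency donor expert.

    Assumption: an expert counts as underactivated when its share of all
    activations on calibration data falls below min_share.
    """
    total = sum(counts)
    by_freq = sorted(range(len(counts)), key=lambda e: counts[e], reverse=True)
    weak = [e for e in range(len(counts)) if counts[e] < min_share * total]
    strong = [e for e in by_freq if e not in weak]
    mapping = {e: e for e in range(len(counts))}
    if strong:
        # Hand out donors starting from the most frequently used expert.
        for i, e in enumerate(weak):
            mapping[e] = strong[i % len(strong)]
    return mapping

def fit_logit_scaler(calib_logits, q_low=5.0, q_high=95.0):
    """Per-expert (low, high) percentiles of router logits on calibration data."""
    num_experts = len(calib_logits[0])
    stats = []
    for e in range(num_experts):
        column = [row[e] for row in calib_logits]
        stats.append((percentile(column, q_low), percentile(column, q_high)))
    return stats

def normalize_logits(logits, stats, eps=1e-8):
    """Rescale each expert's logit by its own calibrated percentile range."""
    return [(x - lo) / (hi - lo + eps) for x, (lo, hi) in zip(logits, stats)]

# Usage on made-up calibration data.
counts = [400, 350, 5, 245]          # activations per expert
mapping = build_replacement_map(counts, min_share=0.05)
calib = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]]
stats = fit_logit_scaler(calib, q_low=0.0, q_high=100.0)
scaled = normalize_logits([1.0, 20.0], stats)
print(mapping, stats, scaled)
```

In this sketch the replacement map only records which donor stands in for each weak expert. Whether the real framework copies weights, redirects tokens, or does something else is not stated in the article. The percentile scaler puts every expert's logit on a comparable range before top-k selection, which is one plausible reading of "percentile-based normalization".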

Broad Generalizability Across MoE Models

Extensive experiments across multiple benchmarks showcase ROMER's effectiveness and generalizability. Under real-chip noise conditions, the framework achieves substantial perplexity reductions of up to 58.6%, 58.8%, and 59.8% for DeepSeek-MoE, Qwen-MoE, and OLMoE, respectively. These results highlight the need for robust calibration techniques when deploying MoE LLMs on noisy analog hardware.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.