Microsoft Azure has just pulled back the curtain on a monumental leap in AI infrastructure: the world’s first production-scale NVIDIA GB300 NVL72 supercomputing cluster. This isn't just another server farm; it's a purpose-built behemoth designed to tackle OpenAI’s most demanding AI inference workloads, signaling a new frontier for reasoning models and agentic AI systems.
According to the announcement, the supercomputer-scale cluster comprises more than 4,600 NVIDIA Blackwell Ultra GPUs, all interconnected via the NVIDIA Quantum-X800 InfiniBand networking platform. Microsoft and NVIDIA’s years-long partnership has culminated in a co-engineered system that optimizes memory and networking to deliver the compute scale needed for high-throughput inference and training. This isn't just about raw power; it's about making that power accessible and efficient for the next generation of AI.
**Inside the GB300 Engine**
At the core of Azure’s new NDv6 GB300 VM series lies the liquid-cooled, rack-scale NVIDIA GB300 NVL72 system. Each rack is an independent powerhouse, integrating 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs. This configuration creates a staggering 37 terabytes of fast memory and 1.44 exaflops of FP4 Tensor Core performance per VM, forming a massive, unified memory space crucial for complex multimodal generative AI and advanced reasoning tasks.
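To put those rack-level figures in perspective, here is a minimal back-of-envelope sketch using only the numbers quoted above. The per-GPU breakdown is an illustrative assumption (the 37 TB pool is shared across GPUs and Grace CPUs, not evenly partitioned), and the cluster aggregate simply multiplies the per-rack figure by the rack count implied by the 4,608-GPU total.

```python
# Back-of-envelope arithmetic for one GB300 NVL72 rack/VM, using only the
# figures quoted in the article. Illustrative, not an official spec breakdown.

GPUS_PER_RACK = 72       # NVIDIA Blackwell Ultra GPUs per rack
CPUS_PER_RACK = 36       # NVIDIA Grace CPUs per rack
FAST_MEMORY_TB = 37      # unified fast memory per rack/VM
FP4_EXAFLOPS = 1.44      # FP4 Tensor Core performance per rack/VM
TOTAL_GPUS = 4_608       # full cluster

racks = TOTAL_GPUS // GPUS_PER_RACK
print(f"Racks in cluster:        {racks}")                                            # 64
print(f"FP4 per GPU:             {FP4_EXAFLOPS * 1000 / GPUS_PER_RACK:.0f} PFLOPS")   # ~20
print(f"Fast memory per GPU*:    {FAST_MEMORY_TB * 1024 / GPUS_PER_RACK:.0f} GB")     # ~526
print(f"Cluster FP4 (aggregate): {FP4_EXAFLOPS * racks:.1f} exaflops")                # ~92.2
# * naive even split; in practice the pool is shared across GPUs and CPUs
```

The even split is only indicative: the point of the NVL72 design is that any GPU in the rack can reach the full 37 TB pool, which is why the rack behaves as one unified accelerator rather than 72 separate ones.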
The performance gains are already evident. In recent MLPerf Inference v5.1 benchmarks, NVIDIA GB300 NVL72 systems delivered record-setting results, with up to 5x higher throughput per GPU on the 671-billion-parameter DeepSeek-R1 reasoning model compared to the previous-generation NVIDIA Hopper architecture. This kind of performance is critical for the real-time, complex decision-making required by agentic AI.
Connecting these thousands of GPUs into a single, cohesive supercomputer required a sophisticated two-tiered networking architecture. Within each GB300 NVL72 rack, the fifth-generation NVIDIA NVLink Switch fabric provides 130 TB/s of direct bandwidth, effectively turning the entire rack into a unified accelerator. To scale beyond the rack, the cluster leverages the NVIDIA Quantum-X800 InfiniBand platform, offering 800 Gb/s of bandwidth per GPU across all 4,608 GPUs. This ensures seamless, low-latency communication essential for trillion-parameter-scale AI.
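A rough timing sketch makes the two tiers concrete. The assumptions here are mine, not from the announcement: the 671B-parameter model from the benchmark above stored at FP4 (roughly 0.5 bytes per parameter), and idealized peak bandwidths with no protocol overhead.

```python
# Idealized transfer-time comparison for the two network tiers, using the
# article's bandwidth figures. Assumes FP4 weights (~0.5 bytes/param) for a
# 671B-parameter model and ignores all protocol and collective overhead.

PARAMS = 671e9
BYTES_PER_PARAM = 0.5                   # FP4 assumption
model_bytes = PARAMS * BYTES_PER_PARAM  # ~336 GB of weights

NVLINK_RACK_BW = 130e12                 # 130 TB/s aggregate NVLink within a rack
IB_PER_GPU_BW = 800e9 / 8               # 800 Gb/s -> 100 GB/s per GPU

print(f"Model weights at FP4:      {model_bytes / 1e9:.0f} GB")                  # ~336 GB
print(f"Across NVLink (one rack):  {model_bytes / NVLINK_RACK_BW * 1e3:.1f} ms") # ~2.6 ms
print(f"Over one GPU's IB link:    {model_bytes / IB_PER_GPU_BW:.1f} s")         # ~3.4 s
```

The gap between milliseconds inside the rack and seconds over a single InfiniBand link is roughly why such systems keep traffic on NVLink wherever possible and reserve the Quantum-X800 fabric for cross-rack communication.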
This achievement underscores Microsoft Azure’s commitment to building the foundational infrastructure for future AI breakthroughs. By deploying the world’s first production NVIDIA GB300 NVL72 cluster at this scale, Azure is not just offering more compute; it's redefining what's possible for AI development, directly empowering partners like OpenAI to push the boundaries of what AI can do.
**Related Reading**
- [Blackwell AI Inference: NVIDIA's Extreme-Scale Bet](https://www.startuphub.ai/ai-news/ai-research/2025/blackwell-ai-inference-nvidias-extreme-scale-bet/)
- [NVIDIA's Open Play for Agentic AI with Nemotron](https://www.startuphub.ai/ai-news/ai-research/2025/nvidias-open-play-for-agentic-ai-with-nemotron/)
- [UK Sovereign AI Boosts Welsh Language, Public Services](https://www.startuphub.ai/ai-news/ai-research/2025/uk-sovereign-ai-boosts-welsh-language-public-services/)



