Michael Kagan, CTO of Nvidia and co-founder of Mellanox, recently engaged in a candid discussion with Sonya Huang and Pat Grady at Sequoia’s Europe100 event, offering profound insights into Nvidia's meteoric rise as the architect of AI infrastructure. His commentary illuminated the pivotal role of the Mellanox acquisition in transforming Nvidia from a mere chip company into a full-stack AI platform powerhouse, capable of scaling computing capabilities far beyond the traditional confines of Moore's Law. This evolution, Kagan posited, is fundamentally driven by an exponential surge in AI workloads and an unprecedented reliance on advanced networking.
Kagan emphasized that the global demand for computing is growing exponentially, at a rate far exceeding the historical doubling every two years predicted by Moore's Law. "AI kicked in when GPU from graphic processing unit became general processing unit," he stated, marking a critical inflection point around 2010-2011. Since then, the performance requirements for AI models have accelerated dramatically, now demanding a 10x to 16x increase in performance annually, or roughly doubling every three months. This relentless pace necessitates innovation that extends beyond merely squeezing more transistors onto a single chip.
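To put that gap in concrete terms, a quick back-of-the-envelope calculation helps; it is an illustration based on the figures quoted above, not Kagan's own arithmetic:

```python
# Back-of-the-envelope comparison of compute-demand growth rates.
# Numbers are illustrative, derived from the figures quoted above.

moores_law_doubling_months = 24   # transistor density doubles roughly every two years
ai_demand_doubling_months = 3     # AI performance demand doubles roughly every quarter

def annual_growth(doubling_period_months: float) -> float:
    """Growth factor over 12 months given a doubling period."""
    return 2 ** (12 / doubling_period_months)

print(f"Moore's Law pace: ~{annual_growth(moores_law_doubling_months):.2f}x per year")
print(f"AI demand pace:   ~{annual_growth(ai_demand_doubling_months):.0f}x per year")
# ~1.41x per year versus ~16x per year: the demand curve outruns
# single-chip scaling by roughly an order of magnitude every year.
```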
The core of Nvidia's strategy, amplified by Mellanox, lies in treating the entire AI data center as a single, cohesive unit of computing. This requires a two-pronged scaling approach: "scale up" and "scale out." Scale-up involves tightly integrating multiple GPUs into a larger, more powerful computing block, exemplified by Nvidia's NVLink technology. Kagan vividly described the modern GPU as a "rack-size machine" that requires a "forklift to lift it," underscoring that the basic building block for AI is no longer a tiny silicon chip but a complex system, complete with a software layer exposing CUDA as its API. This integration allows seamless scaling from a single GPU up to 72 units while maintaining a unified software interface.
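What a unified software interface across a variable number of GPUs means in practice can be sketched roughly as follows. This minimal example uses PyTorch purely for illustration; it is not Nvidia's NVLink or rack-scale CUDA stack, and it assumes PyTorch is installed:

```python
import torch

# A minimal sketch (PyTorch for illustration, not Nvidia's NVLink software
# stack): the application code is identical whether the node exposes 1 GPU
# or 72 -- it simply enumerates whatever devices the platform presents
# behind a single API.

if torch.cuda.is_available():
    devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
else:
    devices = [torch.device("cpu")]  # fallback so the sketch runs anywhere

# Shard one batch of work across however many devices exist.
batch = torch.randn(len(devices) * 1024, 2048)
shards = batch.chunk(len(devices))

partial_sums = []
for shard, dev in zip(shards, devices):
    x = shard.to(dev)
    partial_sums.append((x * x).sum().item())  # stand-in for a real GPU kernel

print(f"devices: {len(devices)}, combined result: {sum(partial_sums):.1f}")
```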
Beyond these formidable single-node systems, "scale out" becomes imperative for handling truly massive AI workloads. This involves connecting hundreds of thousands, and eventually millions, of these high-performance GPU blocks into a unified fabric. Here, networking transcends its traditional role, becoming as critical as compute power itself. Kagan highlighted that for such distributed systems, raw bandwidth—or "hero numbers"—is insufficient. The true determinant of efficiency is *consistent* low latency and a narrow distribution of communication times across the network. If communication becomes a bottleneck, even the most powerful GPUs will sit idle, wasting time and energy.
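A small simulation makes the point about latency distributions concrete. The numbers below are illustrative, not measured data: in a synchronized collective step, completion time is set by the slowest participant, so widening the jitter stretches every step even when the mean latency stays fixed:

```python
import random

# Illustrative simulation (not measured data): in a synchronized collective,
# every step finishes only when the SLOWEST of N links has delivered its data,
# so the tail of the latency distribution -- not the average -- sets the pace.

random.seed(0)
N_LINKS = 100_000          # links participating in one collective step
MEAN_US = 10.0             # mean per-link latency in microseconds

def step_time(jitter_us: float) -> float:
    """Completion time of one step = max latency across all links."""
    return max(random.gauss(MEAN_US, jitter_us) for _ in range(N_LINKS))

for jitter in (0.1, 1.0, 3.0):
    print(f"jitter {jitter} us -> step completes in ~{step_time(jitter):.1f} us "
          f"(mean link latency is {MEAN_US} us)")
# A wider latency distribution stretches every step, leaving GPUs idle even
# though average bandwidth and latency look identical on paper.
```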
To address these challenges, Nvidia has innovated deeply at multiple layers of the compute stack. The company's BlueField Data Processing Units (DPUs), which trace their lineage directly to Mellanox technology, are designed to offload operating-system and infrastructure tasks from the main CPUs and GPUs. This isolation not only frees up valuable compute resources for AI workloads but also significantly reduces the attack surface, a critical security consideration in multi-tenant data center environments. Kagan noted that the ratio of compute chips to network chips is shifting dramatically, with the future likely seeing "2 compute chips to 5 network chips" to manage the immense data flow.
The evolution of AI workloads, particularly the rise of generative AI, further underscores the importance of this integrated approach. While training was long the dominant compute driver, generative AI now demands massive inference capacity as well. A single prompt to a generative model kicks off an iterative, token-by-token generation process, with each new token requiring its own inference pass. This translates into an inference demand that is "not less than training, it's actually even more," Kagan asserted. He also touched upon the philosophical implications of AI, viewing it as humanity's "spaceship of the mind," capable of helping us discover new laws of physics that we currently cannot even imagine. This vision underscores Nvidia's commitment to building platforms that push the boundaries of scientific discovery and human potential.
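The fan-out from one prompt to many inference passes can be sketched with a toy autoregressive loop. The "model" below is a dummy stand-in and the token counts are hypothetical, but the per-token structure is what drives the demand Kagan describes:

```python
# Illustrative sketch of why generative AI multiplies inference demand:
# each generated token requires a full forward pass over the model, fed with
# the prompt plus everything generated so far. The "model" here is a dummy.

def dummy_forward(tokens: list[int]) -> int:
    """Stand-in for a model forward pass; returns the 'next token'."""
    return (sum(tokens) * 31 + len(tokens)) % 50_000

def generate(prompt: list[int], max_new_tokens: int) -> tuple[list[int], int]:
    tokens = list(prompt)
    forward_passes = 0
    for _ in range(max_new_tokens):
        next_token = dummy_forward(tokens)   # one inference per generated token
        tokens.append(next_token)
        forward_passes += 1
    return tokens, forward_passes

prompt = [101, 2023, 2003, 1037, 3231]       # a hypothetical 5-token prompt
_, passes = generate(prompt, max_new_tokens=500)
print(f"1 prompt of {len(prompt)} tokens -> {passes} inference passes")
# Training happens once; a widely used model serves prompts like this billions
# of times, which is why inference demand can rival or exceed training demand.
```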
The strategic fusion of Nvidia's accelerated computing with Mellanox's high-speed, low-latency networking capabilities has enabled Nvidia to architect the foundational infrastructure for this new era of AI. The company's ability to innovate across hardware, software, and networking, creating entire computing units at an unprecedented scale, positions it uniquely to address the exponential demands of AI, transforming the very definition of compute efficiency.

