NVIDIA H100s Redefine AI Graph Processing Performance

NVIDIA, in collaboration with CoreWeave, has set a new record on the 31st Graph500 breadth-first search (BFS) list: 410 trillion traversed edges per second (TEPS). The run used a commercially available cluster of 8,192 H100 GPUs and more than doubles previous top results, including those from national labs. The achievement signals a shift in how large-scale, irregular data workloads will be handled, moving beyond traditional CPU limitations and widening access to supercomputing capabilities.

The benchmark run processed a graph with 2.2 trillion vertices and 35 trillion edges. According to the announcement, this speed would allow searching through every friend relationship on Earth, estimated at 1.2 trillion edges, in approximately three milliseconds. The result is also about efficiency, not just raw speed: NVIDIA's solution reportedly delivers three times better performance per dollar than comparable top-tier entries. It achieved this using just over 1,000 nodes, a fraction of the roughly 9,000 nodes often seen in similar high-ranking systems.
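The three-millisecond figure can be checked with simple arithmetic. The sketch below uses only the numbers quoted above and assumes, as an idealization, that the benchmark rate would be sustained on the friend graph; the function and variable names are illustrative.

```python
# Figures quoted in the article.
TEPS = 410e12            # traversed edges per second
FRIEND_EDGES = 1.2e12    # estimated friend relationships on Earth
VERTICES = 2.2e12        # vertices in the benchmark graph
EDGES = 35e12            # edges in the benchmark graph


def traversal_time_ms(num_edges: float, teps: float) -> float:
    """Time to traverse num_edges at a sustained rate of teps, in milliseconds."""
    return num_edges / teps * 1e3


# Idealized: assumes the benchmark rate holds for a different graph.
friend_scan_ms = traversal_time_ms(FRIEND_EDGES, TEPS)
avg_edges_per_vertex = EDGES / VERTICES

print(f"friend graph scan: {friend_scan_ms:.2f} ms")
print(f"benchmark graph: {avg_edges_per_vertex:.1f} edges per vertex")
```

The result is a little under three milliseconds, which matches the announcement's rounding. The benchmark graph averages about 16 edges per vertex.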

Graphs represent the underlying information structure for countless modern applications, from social networks to banking apps, capturing complex relationships in massive webs of data. These structures are inherently sparse and irregular, meaning connections vary widely and unpredictably, unlike the dense, structured data common in AI training. Graph500 BFS specifically measures a system's ability to navigate this irregularity at scale, a challenge traditionally bottlenecked by CPU architectures that struggle with constant data movement across compute nodes as graphs grow to trillions of edges.
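To make the irregularity concrete, here is a minimal single-machine sketch of BFS over a compressed sparse row (CSR) graph that also reports a rough edges-per-second rate. It is not the Graph500 reference code: the tiny edge list is made up, the helper names are hypothetical, and counting scanned adjacency entries is a simplification of how the benchmark defines TEPS.

```python
import time
from collections import deque


def build_csr(num_vertices, edges):
    """Build a CSR adjacency structure for an undirected edge list."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    offsets = [0] * (num_vertices + 1)
    for i in range(num_vertices):
        offsets[i + 1] = offsets[i] + degree[i]
    neighbors = [0] * offsets[-1]
    cursor = offsets[:-1].copy()  # next free slot for each vertex
    for u, v in edges:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors


def bfs(offsets, neighbors, root):
    """Return (parent array, number of adjacency entries scanned)."""
    parent = [-1] * (len(offsets) - 1)
    parent[root] = root
    frontier = deque([root])
    scanned = 0
    while frontier:
        u = frontier.popleft()
        # Which vertices are touched next depends on the data itself,
        # so memory accesses are hard to predict.
        for i in range(offsets[u], offsets[u + 1]):
            v = neighbors[i]
            scanned += 1
            if parent[v] == -1:
                parent[v] = u
                frontier.append(v)
    return parent, scanned


# Made-up toy graph: one 5-vertex component plus a disconnected pair.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (5, 6)]
offsets, neighbors = build_csr(7, edges)

start = time.perf_counter()
parent, scanned = bfs(offsets, neighbors, root=0)
elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading

print(parent)
print(f"{scanned} adjacency entries scanned, {scanned / elapsed:.0f} per second")
```

At trillion-edge scale the neighbor array is spread across many machines, so most of those lookups become network messages rather than local memory reads.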

Reimagining Graph Processing for GPUs

NVIDIA's breakthrough lies in a full-stack, GPU-only solution that reengineers how data moves across the network, bypassing the CPU entirely for active messaging. This custom software framework uses InfiniBand GPUDirect Async (IBGDA) and the NVSHMEM parallel programming interface to enable direct GPU-to-GPU communication, with each GPU talking to the InfiniBand network interface card itself rather than going through the CPU. Message aggregation has been redesigned from the ground up to support hundreds of thousands of GPU threads sending active messages simultaneously, fully exploiting the H100's massive parallelism and memory bandwidth.
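The general idea behind message aggregation can be sketched in a few lines. The example below is a conceptual, CPU-side illustration in Python, not NVIDIA's framework and not the NVSHMEM API: the `MessageAggregator` class, the `send` callback, and the modulo vertex-to-rank mapping are all assumptions made for illustration.

```python
from collections import defaultdict


class MessageAggregator:
    """Batch small messages per destination rank and send one packet per batch."""

    def __init__(self, capacity, send):
        self.capacity = capacity          # messages per packet before flushing
        self.send = send                  # callable(dest_rank, list_of_messages)
        self.buffers = defaultdict(list)  # dest_rank -> pending messages

    def push(self, dest_rank, message):
        buf = self.buffers[dest_rank]
        buf.append(message)
        if len(buf) >= self.capacity:
            self.flush(dest_rank)

    def flush(self, dest_rank):
        buf = self.buffers.pop(dest_rank, [])
        if buf:
            self.send(dest_rank, buf)

    def flush_all(self):
        for dest_rank in list(self.buffers):
            self.flush(dest_rank)


# Stand-in for the network: record each packet instead of transmitting it.
packets = []
agg = MessageAggregator(capacity=4, send=lambda dest, batch: packets.append((dest, batch)))

# Hypothetical partitioning: vertex v is owned by rank v % NUM_RANKS.
NUM_RANKS = 2
for vertex in range(10):
    # An active message names a handler ("visit") plus its argument.
    agg.push(vertex % NUM_RANKS, ("visit", vertex))
agg.flush_all()

print(f"{sum(len(batch) for _, batch in packets)} messages in {len(packets)} packets")
```

Here ten small messages leave as four packets. On a GPU the same batching has to work with many thousands of threads appending to shared buffers at once, which is the part the article says was redesigned.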

This advancement extends beyond benchmark lists, with significant implications for high-performance computing (HPC) and the future of GPU-based graph processing. Fields like fluid dynamics, weather forecasting, and cybersecurity, which rely heavily on similar sparse data structures and communication patterns, have long been tethered to CPU-centric systems at the largest scales. With this GPU-accelerated approach validated on commercially available infrastructure, developers have a path to scaling their largest HPC applications efficiently, bringing supercomputing performance to a broader range of industries.

NVIDIA's Graph500 triumph, achieved through a cohesive orchestration of its full-stack compute, networking, and software technologies, marks a pivotal moment for large-scale data processing. It demonstrates that the NVIDIA computing platform is now ready to accelerate the world's largest sparse, irregular workloads with unprecedented efficiency and cost-effectiveness. This shift will empower developers across diverse industries to tackle previously intractable problems, fundamentally altering the landscape of scientific discovery and advanced analytics.

© 2025 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.
