NVIDIA, in collaboration with CoreWeave, has set a new graph-processing record, achieving 410 trillion traversed edges per second (TEPS) on the 31st Graph500 breadth-first search (BFS) list. The result, delivered on a commercially available cluster of 8,192 H100 GPUs, more than doubles previous top results, including those from national labs. The achievement signals a significant shift in how large-scale, irregular data workloads will be handled, moving beyond traditional CPU limitations and putting supercomputing-class capability within reach of commercial cloud customers.
The scale of the run is staggering: the benchmark graph had 2.2 trillion vertices and 35 trillion edges. According to the announcement, that rate is fast enough to search every friend relationship on Earth, estimated at 1.2 trillion edges, in approximately three milliseconds. The result isn't just about raw speed; it's also about efficiency, with NVIDIA's solution delivering three times the performance per dollar of comparable top-tier entries. Critically, it did so with just over 1,000 nodes, a fraction of the roughly 9,000 nodes often seen in similar high-ranking systems.
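
The headline figures can be sanity-checked with a few lines of arithmetic. The Python sketch below is illustrative only: the variable and function names are made up for this example, and the figure of 8 GPUs per node is an assumption that the announcement figures quoted here do not state. Under that assumption, 8,192 GPUs work out to 1,024 nodes, which would match "just over 1,000".

```python
# Figures quoted in the article.
TEPS = 410e12            # traversed edges per second
FRIEND_EDGES = 1.2e12    # estimated friend relationships on Earth
GPUS = 8_192             # H100 GPUs in the cluster
VERTICES = 2.2e12        # vertices in the benchmark graph
EDGES = 35e12            # edges in the benchmark graph

# Assumption, not stated in the article: 8 GPUs per node.
GPUS_PER_NODE = 8


def traversal_time_ms(edges: float, teps: float) -> float:
    """Time in milliseconds to traverse `edges` once at a rate of `teps`."""
    return edges / teps * 1e3


friend_scan_ms = traversal_time_ms(FRIEND_EDGES, TEPS)
nodes = GPUS // GPUS_PER_NODE
edges_per_vertex = EDGES / VERTICES

print(f"Friend-graph scan: {friend_scan_ms:.2f} ms")
print(f"Nodes at {GPUS_PER_NODE} GPUs per node: {nodes}")
print(f"Average edges per vertex: {edges_per_vertex:.1f}")
```

The friend-graph scan comes out just under three milliseconds, in line with the announcement's "approximately three milliseconds". The ratio of roughly 16 edges per vertex is consistent with the Graph500 benchmark's standard edge factor of 16.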
