According to NVIDIA, xAI’s Colossus supercomputer cluster, built from 100,000 NVIDIA Hopper GPUs, uses the NVIDIA Spectrum-X™ Ethernet networking platform to reach its enormous scale.
Located in Memphis, Tennessee, Colossus is the largest AI supercomputer in the world. It is used to train xAI’s Grok family of large language models, which power chatbot features offered to X Premium subscribers, and xAI plans to expand the system to 200,000 GPUs. Colossus was built in just 122 days, a fraction of the months or years such systems typically take, and began training its first models only 19 days after the first rack was installed.
Colossus’s ability to train the massive Grok models rests on its network performance: the fabric has shown no latency degradation or packet loss from flow collisions across its tiers, and Spectrum-X congestion control sustains 95% data throughput.
The Colossus network pairs NVIDIA Spectrum SN5600 Ethernet switches, which support port speeds of up to 800 Gb/s, with NVIDIA BlueField-3® SuperNICs, pushing AI training performance to an unprecedented scale.
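To put the 95% figure in concrete terms, here is a minimal back-of-the-envelope sketch in Python. The GPU count and throughput fraction come from the figures above; the one-SuperNIC-per-GPU mapping and the 400 Gb/s per-NIC line rate are illustrative assumptions rather than published Colossus specifications.

```python
# Back-of-the-envelope estimate of aggregate delivered bandwidth for a fabric
# like the one described above. The one-NIC-per-GPU mapping and the 400 Gb/s
# BlueField-3 SuperNIC line rate are illustrative assumptions, not published
# Colossus specifications.

NUM_GPUS = 100_000            # Hopper GPUs in the current Colossus build
NIC_LINE_RATE_GBPS = 400      # assumed per-GPU SuperNIC line speed, in Gb/s
DATA_THROUGHPUT = 0.95        # sustained throughput reported with Spectrum-X

aggregate_line_rate_tbps = NUM_GPUS * NIC_LINE_RATE_GBPS / 1_000
delivered_tbps = aggregate_line_rate_tbps * DATA_THROUGHPUT

print(f"Aggregate line rate: {aggregate_line_rate_tbps:,.0f} Tb/s")
print(f"Delivered at 95%:    {delivered_tbps:,.0f} Tb/s")
# -> Aggregate line rate: 40,000 Tb/s
# -> Delivered at 95%:    38,000 Tb/s
```

Even at this rough level, the gap between line rate and delivered bandwidth shows why sustaining 95% throughput, rather than the far lower figures typical of congested Ethernet fabrics, matters at this scale.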
Spectrum-X gives Colossus scalable bandwidth with low latency through features once exclusive to InfiniBand: adaptive routing, NVIDIA’s Direct Data Placement technology, and fabric-wide visibility and performance isolation, capabilities that are critical for generative AI clouds and large enterprises.
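Adaptive routing is named above but not described. The toy Python sketch below contrasts static ECMP-style hashing, which can pile several flows onto one congested uplink, with an adaptive policy that sends each packet to the currently least-loaded port. It is a generic illustration of the technique under simplified assumptions, not NVIDIA’s Spectrum-X implementation, and the helper names (pick_port_static, pick_port_adaptive) are hypothetical.

```python
import random

# Toy model: 4 uplink ports and 8 large flows. Static ECMP hashes each flow to
# a fixed port, so unlucky collisions can overload some ports while others sit
# idle. The adaptive policy (illustrative only, not Spectrum-X's actual
# algorithm) sends each packet to whichever port currently has the least load.

PORTS = 4
FLOWS = [f"flow-{i}" for i in range(8)]
PACKETS_PER_FLOW = 1_000

def pick_port_static(flow_id: str) -> int:
    """ECMP-style: the port is a fixed hash of the flow identity."""
    return hash(flow_id) % PORTS

def pick_port_adaptive(load: dict) -> int:
    """Adaptive: choose the port with the smallest queued load right now."""
    return min(range(PORTS), key=lambda p: load[p])

def simulate(adaptive: bool) -> dict:
    load = {p: 0 for p in range(PORTS)}
    packets = [flow for flow in FLOWS for _ in range(PACKETS_PER_FLOW)]
    random.shuffle(packets)                      # interleave the flows' packets
    for flow in packets:
        port = pick_port_adaptive(load) if adaptive else pick_port_static(flow)
        load[port] += 1                          # one packet's worth of load
    return load

for adaptive in (False, True):
    label = "adaptive" if adaptive else "static ECMP"
    loads = simulate(adaptive)
    print(f"{label:>12}: per-port load = {sorted(loads.values(), reverse=True)}")
```

On a real fabric the load signal would come from switch and NIC telemetry rather than a local counter, which is why end-to-end fabric visibility belongs to the same feature set.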

