Sunday, January 4, 2026

Nvidia GB200 NVL72 Shows a Significant Performance Advantage Over AMD


As the transition to the Mixture of Experts (MoE) architecture accelerates in the world of artificial intelligence, hardware competition in this field is also intensifying. A new analysis published by Signal65, based on SemiAnalysis's InferenceMAX benchmarks, revealed that Nvidia's Blackwell-based GB200 NVL72 rack systems offer a remarkable advantage over AMD's Instinct MI355X solutions. According to test results, Nvidia provides up to a 28x performance increase per GPU in MoE workloads.

AI models are rapidly evolving towards an MoE-centric structure because it makes resource utilization more efficient. In this approach, the model is divided into separate sub-networks called "experts," and only the relevant experts are executed for each query. However, as this structure scales, problems such as intense data transfer between nodes, latency, and bandwidth pressure increase. For this reason, "hyperscalers" like Amazon Web Services (AWS), Google Cloud, Microsoft Azure, and Oracle Cloud focus not only on raw performance but also on the performance-per-cost balance. According to Signal65's assessment, Nvidia GB200 NVL72 stands out as the solution that best provides this balance in the current landscape.
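The routing idea described above can be sketched in a few lines. This is a minimal, illustrative top-k gating example, not any specific model's implementation: the `moe_forward` function, the linear "experts," and all dimensions are assumptions chosen for clarity. The key point is that only `top_k` of the experts run per token, so compute per query stays roughly flat as the total expert count grows.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token through a toy Mixture-of-Experts layer.

    A router scores every expert, but only the top_k highest-scoring
    experts are actually executed; their outputs are mixed by
    softmax-normalized router weights.
    """
    logits = x @ gate_w                   # router scores, shape (num_experts,)
    top = np.argsort(logits)[-top_k:]     # indices of the selected experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the selected experts only
    # Weighted sum of just the selected experts' outputs.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
gate_w = rng.normal(size=(d, num_experts))
# Each "expert" here is just a linear map, standing in for a sub-network.
expert_mats = [rng.normal(size=(d, d)) for _ in range(num_experts)]
experts = [lambda x, m=m: x @ m for m in expert_mats]

y = moe_forward(rng.normal(size=d), gate_w, experts, top_k=2)
print(y.shape)  # output has the same shape as the input token embedding
```

At rack scale, the selected experts for different tokens typically live on different GPUs, which is exactly why the inter-node traffic and latency pressure mentioned above dominate as MoE models grow.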

72 Chips and 30 TB Shared Memory

One of the striking points of the analysis is the technical detail behind this performance difference. Nvidia adopts an approach it calls "Extreme Co-Design" to overcome MoE scaling bottlenecks. Under this strategy, the 72 Blackwell GPUs in a GB200 NVL72 rack operate as one integrated system with 30 TB of high-speed shared memory, an architecture the analysis says significantly reduces latency. As a result, according to InferenceMAX data, Nvidia's Blackwell-based AI servers process 75 tokens per second per GPU, leaving comparable cluster configurations of AMD MI355X systems far behind.

Nvidia also has a distinct advantage not only in performance but also in terms of total cost of ownership (TCO). Signal65, in its calculation referencing Oracle Cloud pricing, states that the relative cost per token for GB200 NVL72 racks drops to as low as 1/15th. This clearly demonstrates why Nvidia's hardware stack is so widely preferred by cloud providers and large-scale AI developers.
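The cost-per-token metric behind such TCO comparisons is simple to compute. Below is a hedged sketch of that arithmetic; the `$500/hour` rack rate is a made-up illustrative figure (not Oracle Cloud's actual pricing and not Signal65's input data), while the 75 tokens/s/GPU and 72 GPUs come from the benchmark figures cited above.

```python
def cost_per_mtok(hourly_rate_usd, tokens_per_sec_per_gpu, gpus):
    """USD per million generated tokens for a rack, given cloud pricing
    and sustained per-GPU throughput."""
    tokens_per_hour = tokens_per_sec_per_gpu * gpus * 3600
    return hourly_rate_usd / (tokens_per_hour / 1e6)

# Hypothetical rack rate for illustration only: $500/hour for a
# 72-GPU rack sustaining 75 tokens/s per GPU (the cited throughput).
rack_cost = cost_per_mtok(hourly_rate_usd=500,
                          tokens_per_sec_per_gpu=75,
                          gpus=72)
print(f"${rack_cost:.2f} per million tokens")
```

Because throughput appears in the denominator, a large per-GPU throughput gap translates almost directly into the relative cost-per-token gap the analysis reports, even when the higher-end rack costs more per hour.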

Of course, these figures do not capture the entirety of the competition between AMD and Nvidia. AMD's Instinct MI355X solutions remain an aggressively priced alternative, particularly in memory-dense deployments, thanks to their high HBM3E capacity. For MoE workloads in the current generation, however, Nvidia appears to hold the upper hand. Competition is expected to intensify further with upcoming rack-scale solutions such as AMD's Helios and Nvidia's Vera Rubin.
