
– GPU: 72 Rubin GPUs (288 GB of HBM4 memory each, with Transformer Engine support)
– Superchip unit: 1 Vera CPU + 2 Rubin GPUs combined
– Other chips: extreme co-design across six chips, including the NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet switch
– Inference: 5x improvement (3.6 EFLOPS based on NVFP4, 50 PFLOPS per GPU)
– Training: 3.5x improvement (2.5 EFLOPS based on NVFP4)
– Cost per token: roughly 1/10th for MoE-model inference (significantly reducing inference costs)
– MoE model training: cuts the number of GPUs required to roughly 1/4
– Memory: 20.7 TB HBM4 + 54 TB LPDDR5X (rack total)
– Bandwidth: 3.6 TB/s per GPU with NVLink 6, 260 TB/s across the rack (exceeding the entire Internet's bandwidth); see the arithmetic sketch after this list
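
The rack-level figures above are simple per-GPU multiples across the 72 GPUs. Here is a minimal sanity-check sketch; the constant names are mine, not NVIDIA nomenclature, and the per-GPU values are taken straight from the list:

```python
# Sanity check: rack totals as 72x the per-GPU figures from the spec list.
# Constant names are illustrative assumptions, not NVIDIA terminology.

GPUS_PER_RACK = 72
NVFP4_PFLOPS_PER_GPU = 50      # inference compute per GPU
HBM4_GB_PER_GPU = 288          # HBM4 capacity per GPU
NVLINK6_TBPS_PER_GPU = 3.6     # NVLink 6 bandwidth per GPU

inference_eflops = GPUS_PER_RACK * NVFP4_PFLOPS_PER_GPU / 1000   # PFLOPS -> EFLOPS
hbm4_total_tb = GPUS_PER_RACK * HBM4_GB_PER_GPU / 1000           # GB -> TB (decimal)
nvlink_total_tbps = GPUS_PER_RACK * NVLINK6_TBPS_PER_GPU

print(f"NVFP4 inference:  {inference_eflops:.1f} EFLOPS")   # 3.6
print(f"HBM4 capacity:    {hbm4_total_tb:.1f} TB")          # 20.7
print(f"NVLink aggregate: {nvlink_total_tbps:.0f} TB/s")    # 259.2, ~260 as quoted
```

The small gap between the computed 259.2 TB/s and the quoted 260 TB/s is just rounding; everything else matches exactly.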
HBM4 stacks multiple layers of DRAM, which is why rack-level capacity is quoted in terabytes: with 288 GB per GPU by design, 72 GPUs work out to roughly 20.7 TB of HBM4. With that much capacity on board, it would be more surprising if the system ran short of memory.