
– GPU: 72 Rubin GPUs (288 GB of HBM4 memory each, with Transformer Engine support)
– Superchip unit: 1 Vera CPU + 2 Rubin GPUs combined
– Other chips: extreme co-design across six chips, including the NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet switch
– Inference: 5x improvement (3.6 EFLOPS based on NVFP4, 50 PFLOPS per GPU)
– Training: 3.5x improvement (2.5 EFLOPS based on NVFP4)
– Cost per token: roughly 1/10th for MoE-model inference (significantly reducing inference costs)
– MoE model training: cuts the number of GPUs required to roughly 1/4
– Memory: 20.7 TB HBM4 + 54 TB LPDDR5X (rack total)
– Bandwidth: 3.6 TB/s per GPU with NVLink 6, 260 TB/s across the rack (exceeding the entire Internet's bandwidth); see the arithmetic sketch after this list
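
The rack-level figures above are simple per-GPU multiples across the 72 GPUs. Here is a minimal sanity-check sketch; the constant names are mine, not NVIDIA nomenclature, and the per-GPU values are taken straight from the list:

```python
# Sanity check: rack totals as 72x the per-GPU figures from the spec list.
# Constant names are illustrative assumptions, not NVIDIA terminology.

GPUS_PER_RACK = 72
NVFP4_PFLOPS_PER_GPU = 50      # inference compute per GPU
HBM4_GB_PER_GPU = 288          # HBM4 capacity per GPU
NVLINK6_TBPS_PER_GPU = 3.6     # NVLink 6 bandwidth per GPU

inference_eflops = GPUS_PER_RACK * NVFP4_PFLOPS_PER_GPU / 1000   # PFLOPS -> EFLOPS
hbm4_total_tb = GPUS_PER_RACK * HBM4_GB_PER_GPU / 1000           # GB -> TB (decimal)
nvlink_total_tbps = GPUS_PER_RACK * NVLINK6_TBPS_PER_GPU

print(f"NVFP4 inference:  {inference_eflops:.1f} EFLOPS")   # 3.6
print(f"HBM4 capacity:    {hbm4_total_tb:.1f} TB")          # 20.7
print(f"NVLink aggregate: {nvlink_total_tbps:.0f} TB/s")    # 259.2, ~260 as quoted
```

The small gap between the computed 259.2 TB/s and the quoted 260 TB/s is just rounding; everything else matches exactly.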
HBM4 stacks multiple layers of DRAM, which is why rack-level capacity is quoted in terabytes: with 288 GB per GPU by design, 72 GPUs work out to roughly 20.7 TB of HBM4. With that much capacity on board, it would be more surprising if the system ran short of memory.