AI Daily
Hardware • March 26, 2026

NVIDIA's Rubin Platform Arrives, and It Brings a Surprising Passenger: the CPU

By AI Daily Editorial • March 26, 2026

NVIDIA's GTC conference in San Jose this month was always going to be dominated by Rubin, the company's first fully co-designed six-chip AI platform, now in production. The numbers are big: up to a tenfold reduction in inference token cost and a fourfold reduction in the number of GPUs needed to train mixture-of-experts models, compared with the previous Blackwell generation. Those figures, if they hold up at scale, represent a meaningful shift in the economics of running large models. But the more interesting thing Jensen Huang said at GTC was about CPUs.
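To see what those ratios would mean in practice, here is a back-of-envelope sketch. The Blackwell baseline figures below are hypothetical placeholders chosen only for illustration, not NVIDIA's numbers; only the 10x and 4x ratios come from the announcement.

```python
# Back-of-envelope illustration of the claimed Rubin-vs-Blackwell ratios.
# Baseline figures are hypothetical placeholders, not vendor numbers.

BLACKWELL_COST_PER_M_TOKENS = 2.00   # assumed inference cost, USD per million tokens
BLACKWELL_GPUS_FOR_MOE_RUN = 4096    # assumed GPU count for one MoE training run

TOKEN_COST_REDUCTION = 10            # "up to 10x" inference token-cost claim
GPU_COUNT_REDUCTION = 4              # "4x fewer GPUs" MoE training claim

rubin_cost_per_m_tokens = BLACKWELL_COST_PER_M_TOKENS / TOKEN_COST_REDUCTION
rubin_gpus_for_moe_run = BLACKWELL_GPUS_FOR_MOE_RUN / GPU_COUNT_REDUCTION

print(f"Inference: ${BLACKWELL_COST_PER_M_TOKENS:.2f} -> ${rubin_cost_per_m_tokens:.2f} per million tokens")
print(f"Training:  {BLACKWELL_GPUS_FOR_MOE_RUN} -> {rubin_gpus_for_moe_run:.0f} GPUs for the same MoE run")
```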

CNBC reported ahead of the conference that Huang was preparing to make a case for the CPU's return to prominence in AI infrastructure. The headline framing was "renaissance," and it was not wrong. The argument is this: GPUs dominate training and bulk inference because they can parallelise matrix operations across thousands of cores. But agentic AI workflows look different from bulk inference. An agent coordinating multiple tasks, switching between tools, managing memory, and responding to events in real time has a workload profile that looks more like traditional sequential computing than like matrix multiplication. For that workload, the CPU's strength in single-threaded performance and low-latency branching logic matters again.
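A minimal sketch of such a control loop helps make the point. Everything here is a hypothetical stand-in rather than any NVIDIA or vendor API: the only matrix-heavy work hides inside the model call, while the surrounding loop is the sequential, branch-heavy bookkeeping the argument is about.

```python
import time

# Hypothetical tool registry; names and behaviour are placeholders.
TOOLS = {
    "search": lambda query: f"top results for {query!r}",
    "clock": lambda _: time.strftime("%H:%M:%S"),
}

def call_model(prompt: str, step: int) -> dict:
    """Stand-in for a GPU-backed inference call; not a real API."""
    time.sleep(0.05)  # represents batched-inference latency on the accelerator
    if step == 0:
        return {"action": "search", "argument": "Rubin platform"}
    return {"action": "finish", "argument": prompt[-60:]}

def run_agent(task: str, max_steps: int = 8) -> str:
    memory = [task]  # short-term working memory, kept on the CPU side
    for step in range(max_steps):
        decision = call_model("\n".join(memory), step)  # the GPU burst
        # Everything below is sequential, branchy bookkeeping: tool dispatch,
        # memory updates, and event handling, which reward fast single-thread CPUs.
        if decision["action"] == "finish":
            return decision["argument"]
        tool = TOOLS.get(decision["action"])
        if tool is None:
            memory.append(f"unknown tool: {decision['action']}")
            continue
        memory.append(tool(decision["argument"]))
    return "step budget exhausted"

print(run_agent("summarise the GTC keynote"))
```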

NVIDIA is not abandoning GPUs. It is arguing that the next generation of AI infrastructure needs both CPUs and GPUs, tightly integrated, and that Rubin's six-chip design reflects that. The platform combines GPU compute with what NVIDIA calls an "extreme co-design" approach, in which the chips are optimised to work together rather than bolted together from separate supply chains. The Jetson T4000 module, aimed at edge and robotics deployments, runs on the Blackwell architecture at 70 watts and delivers four times the performance of its predecessor, a gain that matters for energy-constrained applications such as autonomous vehicles and warehouse robots.

The CPU-renaissance framing also has a competitive dimension. Intel and AMD both make CPUs that dominate data-centre deployments outside GPU clusters. If agentic AI workloads grow as fast as the industry expects, and if those workloads run better on CPU-heavy architectures, that creates an opening for Intel and AMD to reclaim some of the AI infrastructure market they have lost over the past three years. NVIDIA's move to position itself as the provider of tightly integrated CPU-GPU systems is partly defensive: it is saying that even if CPUs matter more, you should buy NVIDIA's CPUs alongside NVIDIA's GPUs, not Intel's alongside AMD's.

The deeper point worth watching is what the shift toward agentic workloads actually means for how AI infrastructure is provisioned. Training a large model requires massive parallel GPU clusters running for weeks. Inference at scale requires similar clusters running continuously. But an agentic system coordinating a complex task might run on much more heterogeneous infrastructure: a mix of cheap CPU compute for orchestration, GPU bursts for the heavy inference steps, and fast storage for memory retrieval. The data-centre build-out that has defined AI infrastructure investment since 2022 may look quite different in a few years, and Rubin is the first hardware platform explicitly designed with that transition in mind.
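One way to picture that heterogeneity is as a scheduler routing steps to different resource pools. The sketch below is a deliberately simplified illustration under assumed names (cpu_pool, gpu_pool, memory_store), not a description of any real provisioning system: orchestration glue stays on cheap CPU workers, only inference steps touch the scarce GPU slots, and memory lookups hit fast storage rather than compute.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Step:
    kind: str      # "orchestrate", "infer", or "retrieve"
    payload: str

cpu_pool = ThreadPoolExecutor(max_workers=32)   # plentiful, cheap CPU workers
gpu_pool = ThreadPoolExecutor(max_workers=4)    # scarce, expensive GPU slots
memory_store = {"notes": "retrieved context"}   # stand-in for fast retrieval storage

def run_step(step: Step) -> str:
    if step.kind == "infer":
        # Only the heavy inference steps are routed to the GPU pool.
        return gpu_pool.submit(lambda: f"inference({step.payload})").result()
    if step.kind == "retrieve":
        # Memory lookups go to fast storage rather than compute.
        return memory_store.get(step.payload, "")
    # Planning, parsing, and tool glue stay on the CPU pool.
    return cpu_pool.submit(lambda: f"orchestrated({step.payload})").result()

pipeline = [Step("retrieve", "notes"), Step("orchestrate", "plan"), Step("infer", "draft answer")]
print([run_step(s) for s in pipeline])
```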
