Google has done something with its AI chips that seems like a technical footnote but actually signals a structural shift in the industry. At Google Cloud Next in Las Vegas this week, the company unveiled the eighth generation of its Tensor Processing Units (TPUs) as not one chip but two: the TPU 8t for training large frontier models, and the TPU 8i designed specifically for inference. It is the first time Google has formally split the line, and the reasoning tells you where the money is moving.
For most of the AI boom, the dominant computational challenge was training: teaching a model on enormous datasets before it can do anything useful. Nvidia built its dominance on being the best hardware for that job. But training is a one-time cost per model. Inference, actually running the model each time a user interacts with it, happens billions of times a day and scales with every new user, every new application, and every AI agent running in the background of a product. As the quality of frontier models has converged and the gap between leading labs has narrowed, the competitive pressure has shifted to who can run models most cheaply and quickly at scale.
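To make that shift concrete, here is a back-of-envelope sketch in Python. Every figure in it is an illustrative assumption rather than a real price from Google or Nvidia: a one-time training run pegged at $100 million and inference at $0.50 per thousand requests. The point is only the shape of the curves, a fixed training cost against an inference bill that grows without bound as usage grows.

```python
# Illustrative back-of-envelope only: all figures are assumptions, not actual
# Google or Nvidia costs, chosen to show how the two cost curves diverge.

TRAINING_COST = 100e6          # assume a one-time $100M training run for a frontier model
COST_PER_1K_QUERIES = 0.50     # assume $0.50 of compute per 1,000 inference requests

def total_cost(daily_queries: float, days: int) -> float:
    """One-time training cost plus an inference cost that scales with usage."""
    inference = daily_queries * days * COST_PER_1K_QUERIES / 1_000
    return TRAINING_COST + inference

# At 1 billion queries a day, inference spending matches the training bill
# after roughly 200 days under these made-up numbers, then keeps growing.
for days in (30, 200, 365):
    print(days, f"${total_cost(1e9, days) / 1e6:,.0f}M")
```

Under those assumed numbers, a service handling a billion requests a day spends more on inference than on the original training run within about seven months, and the gap only widens from there.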
Google's TPU 8i directly targets this. The chip makes a significant jump in high-bandwidth memory, which Google describes as solving the "memory wall": the gap between how fast a processor can perform calculations and how fast it can pull the data it needs out of memory. For AI agents, which make rapid, repeated inference calls to take actions in the world rather than just answer questions, that bottleneck matters enormously. Google Cloud CEO Thomas Kurian framed the two-chip split as a "natural evolution," but Google's infrastructure chiefs were blunter: "AI is evolving from answering questions to reasoning and taking action."
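The memory wall can be made concrete with a rough roofline-style calculation. In small-batch decoding, generating each token requires streaming essentially all of the model's weights out of high-bandwidth memory, so throughput is bounded by memory bandwidth rather than raw compute. The sketch below assumes a hypothetical 70-billion-parameter model stored in bfloat16 and a 3 TB/s bandwidth figure; neither number is a published TPU 8i specification.

```python
# A minimal sketch of the "memory wall" for LLM inference. The numbers are
# illustrative assumptions, not published TPU 8i specs.

MODEL_PARAMS = 70e9            # assume a 70B-parameter model
BYTES_PER_PARAM = 2            # weights stored in bfloat16
HBM_BANDWIDTH = 3.0e12         # assume 3 TB/s of high-bandwidth memory

def decode_tokens_per_second(bandwidth_bytes_per_s: float) -> float:
    """Small-batch decoding is memory-bound: every generated token requires
    streaming the full set of weights from HBM, so throughput is roughly
    bandwidth divided by model size, regardless of how fast the ALUs are."""
    model_bytes = MODEL_PARAMS * BYTES_PER_PARAM
    return bandwidth_bytes_per_s / model_bytes

print(f"{decode_tokens_per_second(HBM_BANDWIDTH):.0f} tokens/s per chip")      # ~21
print(f"{decode_tokens_per_second(2 * HBM_BANDWIDTH):.0f} tokens/s per chip")  # doubling bandwidth roughly doubles it
```

Under these assumptions, doubling memory bandwidth roughly doubles decode throughput even if the chip's arithmetic units are unchanged, which is why a jump in high-bandwidth memory matters more for agent-style inference than another leap in peak FLOPS.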
Nvidia has not been standing still. The company struck a $20 billion licensing deal with inference chipmaker Groq late last year, and debuted a new inference-focused chip in March. Its CUDA software ecosystem remains a significant structural advantage: most AI developers build on it, and switching costs are high. KeyBanc analyst John Vinh reaffirmed an Overweight rating on Nvidia, citing CUDA as a competitive moat that Google's TPUs have not managed to erode.
Still, Google's position in the TPU market has strengthened considerably. Meta committed to a multibillion-dollar TPU procurement agreement. Anthropic, which uses Google Cloud infrastructure, expanded its TPU access to as many as one million chips. The chips are now available to PyTorch developers, which removes one of the historical friction points for adoption. Google has also reportedly tested on-premises TPU installations for corporate clients, suggesting it is starting to compete in the enterprise hardware space rather than purely through cloud access.
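On the PyTorch point: TPUs are reached from PyTorch through the torch_xla backend, so targeting one looks much like targeting a GPU, with an XLA device standing in where a CUDA device would go. The sketch below uses a toy linear layer purely as a stand-in; the device plumbing is the part that matters.

```python
# A minimal sketch of what "available to PyTorch developers" means in practice:
# existing PyTorch code targets a TPU through the torch_xla backend rather than
# CUDA. The model here is a toy stand-in; only the device handling is the point.

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                           # the TPU, analogous to torch.device("cuda")

model = torch.nn.Linear(4096, 4096).to(device)     # hypothetical toy model
batch = torch.randn(8, 4096, device=device)

with torch.no_grad():
    out = model(batch)
xm.mark_step()                                     # flush the lazily built XLA graph to the TPU

print(out.shape)
```

The significance is less the syntax than the switching cost it removes: code written against PyTorch, rather than against CUDA directly, no longer has to be rewritten to run on Google's silicon.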
The interesting tension here is that Google both competes with Nvidia and depends on it. Google trains its Gemini models on its own TPUs, but it also sells access to Nvidia chips through Google Cloud and will offer Nvidia's next-generation Vera Rubin GPUs to customers. For Google, the chip strategy is partly about cost control and partly about positioning for the era of AI agents: the company that controls the most efficient inference silicon controls a key variable in how cheaply the next generation of AI applications can be built and run. The question is whether Google's integration advantage, building chips and models in the same organisation, compounds fast enough to matter in a market Nvidia has owned for a decade.