The race to make artificial intelligence truly useful just accelerated: Google Cloud has unveiled new hardware designed to power the next wave of AI applications. While much attention has focused on training massive models like Gemini, the real challenge, and opportunity, now lies in running them efficiently for everyday tasks. To meet this growing demand, Google is releasing Ironwood, its seventh-generation Tensor Processing Unit (TPU), which delivers a 10x peak performance leap over TPU v5p, alongside new, cost-effective Arm-based Axion virtual machines. These advancements aren’t just about speed; they represent a fundamental shift toward optimized infrastructure capable of handling complex “agentic” workflows and delivering responsive, real-world AI experiences.
Ironwood TPUs: Next-Generation Performance for AI Inference
Ironwood TPUs represent Google Cloud’s seventh generation of Tensor Processing Units, purpose-built to accelerate AI inference workloads. With a 10x peak performance improvement over TPU v5p and over 4x better performance per chip than TPU v6e, Ironwood unlocks significant gains for demanding applications such as large language models. That leap comes from advanced silicon design and sheer scale: a single superpod can connect up to 9,216 chips over 9.6 Tb/s Inter-Chip Interconnect (ICI) links, with access to 1.77 Petabytes of shared HBM.
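To make that architecture concrete, here is a minimal JAX sketch of the programming model such pods expose: each ICI-linked chip appears as a device, and a weight matrix can be sharded across them so every chip holds only a slice in its local HBM. The device count and array sizes are illustrative, and the snippet runs on any JAX backend.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Each ICI-linked TPU chip visible to this host appears as one JAX device;
# on CPU this collapses to a single device, so the sketch stays runnable.
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("model",))

# Shard a weight matrix column-wise across the "model" axis so each chip
# keeps only its slice in local HBM.
sharding = NamedSharding(mesh, P(None, "model"))
weights = jax.device_put(jnp.ones((8192, 8192), jnp.bfloat16), sharding)

@jax.jit  # XLA inserts whatever cross-chip collectives the sharded matmul needs
def forward(x, w):
    return x @ w

activations = jnp.ones((16, 8192), jnp.bfloat16)
print(forward(activations, weights).shape)  # (16, 8192)
```

At pod scale the same mesh abstraction simply spans more devices, with the ICI fabric carrying the collective traffic.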
The architecture prioritizes not just speed but also reliability and cost-efficiency. Ironwood leverages Optical Circuit Switching (OCS) technology, dynamically rerouting workloads around interruptions to keep services available. Scaling beyond individual superpods, clusters can encompass hundreds of thousands of TPUs. Google also points to a recent IDC report in which customers of AI Hypercomputer (the integrated platform that includes TPUs) achieved a 353% three-year ROI, 28% lower IT costs, and 55% more efficient IT teams, concrete evidence of business benefit.
Beyond hardware, Google’s co-designed software layer maximizes Ironwood’s potential. New features within Google Kubernetes Engine, such as Cluster Director, improve fleet efficiency, while enhancements to open-source frameworks like MaxText and vLLM, coupled with GKE Inference Gateway, further streamline AI workflows. The Gateway, for example, reduces time-to-first-token latency by up to 96% and lowers serving costs by 30%, underscoring that optimized software is crucial to unlocking Ironwood’s full capability.
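For a flavor of the open-source side of that stack, the short sketch below uses vLLM’s offline batch API, one of the entry points these serving optimizations target. The checkpoint name is a placeholder, and running on TPU additionally requires a TPU-enabled vLLM build.

```python
from vllm import LLM, SamplingParams

# Placeholder checkpoint; any model vLLM supports works here.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["Summarize why inference efficiency matters for agentic workflows."],
    params,
)
print(outputs[0].outputs[0].text)
```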
New Axion VMs: Cost-Effective Arm-Based Compute
Google Cloud recently launched new Axion VMs, a cost-effective shift toward Arm-based compute for inference workloads. The N4A instance, specifically, offers up to 2x better price-performance than current-generation x86 VMs. This matters as organizations increasingly focus on serving trained AI models to power responsive interactions, rather than solely on training them. Axion provides a compelling option for scaling inference without dramatically increasing infrastructure costs, addressing a key need in the rapidly evolving AI landscape.
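As an illustration of how such an instance might be provisioned programmatically, here is a hedged sketch using the google-cloud-compute Python client. The “n4a-standard-8” machine type, project, and zone are assumptions for the example; check Google Cloud’s published N4A shapes for the real names.

```python
from google.cloud import compute_v1

def create_axion_vm(project: str, zone: str, name: str) -> None:
    instance = compute_v1.Instance(
        name=name,
        # Assumed N4A shape; verify the actual machine-type names.
        machine_type=f"zones/{zone}/machineTypes/n4a-standard-8",
        disks=[
            compute_v1.AttachedDisk(
                boot=True,
                auto_delete=True,
                initialize_params=compute_v1.AttachedDiskInitializeParams(
                    # Axion is Arm-based, so an arm64 OS image is required.
                    source_image="projects/debian-cloud/global/images/family/debian-12-arm64",
                ),
            )
        ],
        network_interfaces=[
            compute_v1.NetworkInterface(network="global/networks/default")
        ],
    )
    op = compute_v1.InstancesClient().insert(
        project=project, zone=zone, instance_resource=instance
    )
    op.result()  # block until the create operation completes

create_axion_vm("my-project", "us-central1-a", "axion-inference-1")
```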
The launch of Axion instances complements Google’s introduction of seventh-generation Ironwood TPUs. While TPUs excel at demanding training and complex AI tasks, Axion VMs target broader inference needs and agentic workflows. The combination provides a tiered approach, letting customers optimize resource allocation for specific workload requirements. Google’s strategy emphasizes system-level co-design, integrating custom silicon with software for maximum performance and efficiency, a core tenet of its AI Hypercomputer.
Beyond virtual machines, Google is previewing C4A metal, its first Arm-based bare-metal instance, which offers even greater performance and control for applications demanding maximum resource access. The full Axion portfolio, coupled with advances in TPU interconnectivity (9.6 Tb/s ICI) and massive shared memory (1.77 PB of HBM), demonstrates Google’s commitment to a versatile, scalable infrastructure for the “age of inference” and the growing demands of AI-powered applications.
System-Level Design for Scalable AI Workloads
Google’s new Ironwood TPUs represent a significant leap in system-level design for AI inference workloads. Achieving a 10x peak performance improvement over TPU v5p and over 4x better performance per chip than TPU v6e, Ironwood isn’t just about faster chips. The design centers on massive interconnectivity: a superpod can scale to 9,216 chips linked by 9.6 Tb/s Inter-Chip Interconnect (ICI) and sharing 1.77 Petabytes of HBM. This addresses the data bottlenecks critical for large models and demanding agentic workflows, moving beyond isolated chip performance.
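A quick back-of-the-envelope calculation shows what those pod-level numbers imply per chip, assuming the 1.77 PB figure is the aggregate across a full 9,216-chip superpod:

```python
# If 1.77 PB of HBM is shared across a full 9,216-chip superpod,
# each Ironwood chip contributes roughly:
pod_hbm_bytes = 1.77e15
chips = 9216
print(f"~{pod_hbm_bytes / chips / 1e9:.0f} GB of HBM per chip")  # ~192 GB
```

If that reading holds, each chip carries on the order of 192 GB of HBM, enough to keep large model shards resident in fast memory rather than spilling to slower tiers.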
Beyond hardware, Google emphasizes co-design with software. The AI Hypercomputer integrates Ironwood with innovations like Optical Circuit Switching (OCS) for uninterrupted service and Cluster Director within Google Kubernetes Engine for resilient TPU clusters. Software tools such as MaxText (for training) and vLLM with GKE Inference Gateway (for inference) further optimize performance; the Gateway alone reduces time-to-first-token latency by up to 96% and serving costs by 30%. This holistic approach is vital for maximizing the value of powerful hardware.
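For a sense of how that TPU capacity is consumed from GKE, the hedged sketch below uses the Kubernetes Python client to request chips through the google.com/tpu resource and GKE’s TPU node selectors. The accelerator value for Ironwood, the topology, and the container image are placeholders; only the label keys and resource name are established GKE conventions.

```python
from kubernetes import client, config

config.load_kube_config()  # assumes kubectl already points at a GKE cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="tpu-inference"),
    spec=client.V1PodSpec(
        node_selector={
            # Label keys are standard GKE TPU selectors; these values are
            # placeholders, so consult GKE docs for Ironwood's published names.
            "cloud.google.com/gke-tpu-accelerator": "tpu-v7",
            "cloud.google.com/gke-tpu-topology": "2x2x1",
        },
        containers=[
            client.V1Container(
                name="server",
                image="my-registry/inference-server:latest",  # placeholder
                resources=client.V1ResourceRequirements(
                    requests={"google.com/tpu": "4"},
                    limits={"google.com/tpu": "4"},
                ),
            )
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```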
The impact is demonstrable, with Anthropic planning to deploy up to 1 million TPUs and Lightricks anticipating benefits for generative AI workflows. IDC reports indicate that AI Hypercomputer customers already see a 353% three-year ROI, 28% lower IT costs, and 55% more efficient IT teams. Ironwood, coupled with Google’s system-level approach, aims to push these gains even further, enabling scalable, cost-effective AI deployments at a planetary scale.
AI Hypercomputer: Integrated System for Peak Efficiency
Google’s newly available Ironwood TPUs represent a significant leap in AI compute, delivering a 10x peak performance improvement over TPU v5p. This seventh-generation chip is purpose-built for demanding tasks like large-scale model training and high-volume inference. Crucially, Ironwood offers over 4x better performance per chip compared to the previous TPU v6e, enhancing both training and inference efficiency. This isn’t just about raw speed; it’s about reducing costs and enabling more complex AI workflows, particularly those leveraging agentic systems.
The power of Ironwood is amplified within Google’s “AI Hypercomputer,” an integrated system uniting compute, networking, storage, and software. Scaling to 9,216 chips interconnected with 9.6 Tb/s networking and sharing 1.77 Petabytes of HBM, the system overcomes data bottlenecks. Optical Circuit Switching (OCS) ensures 99.999% uptime, dynamically rerouting around failures. IDC reports that customers using this integrated approach see a 353% three-year ROI, 28% lower IT costs, and 55% more efficient IT teams.
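For context, the arithmetic behind that five-nines figure is simple:

```python
# 99.999% availability allows roughly five minutes of downtime per year.
minutes_per_year = 365.25 * 24 * 60
print(f"~{(1 - 0.99999) * minutes_per_year:.1f} minutes of downtime/year")  # ~5.3
```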
Beyond hardware, Google emphasizes co-designed software. New features within Google Kubernetes Engine, such as Cluster Director, optimize TPU fleet management and resilience. Enhancements to MaxText (Google’s open-source LLM framework) and to vLLM with GKE Inference Gateway streamline training and inference. The Gateway, for example, reduces time-to-first-token latency by up to 96% and serving costs by 30%, showcasing how integrated hardware and software unlock peak AI performance.
Software Enhancements: Optimizing the AI Lifecycle
Google’s new Ironwood TPUs represent a significant leap in AI infrastructure, delivering a 10x peak performance improvement over TPU v5p. This seventh-generation chip is purpose-built for demanding tasks like large-scale model training and high-volume inference. Crucially, Ironwood boosts performance per chip by over 4x compared to the TPU v6e, alongside improved energy efficiency. This matters because it directly translates to faster model deployment, reduced operational costs, and the ability to handle increasingly complex AI workloads, particularly agentic systems.
Beyond the chips themselves, Google is optimizing the entire AI lifecycle with its AI Hypercomputer. Ironwood superpods can scale to 9,216 chips interconnected with 9.6 Tb/s networking, accessing 1.77 Petabytes of shared HBM. This massive scale is paired with innovations like Optical Circuit Switching (OCS) for near-uninterrupted service and Cluster Director within Google Kubernetes Engine, improving fleet efficiency. The result, according to IDC, is a potential 353% three-year ROI for Hypercomputer customers.
Software enhancements are equally critical. Google is expanding support for frameworks like vLLM and integrating TPUs with GKE Inference Gateway, reducing time-to-first-token latency by up to 96% and lowering serving costs by 30%. New MaxText features also streamline LLM training and optimization. This hardware-software co-design is central to Google’s strategy for delivering faster, more efficient AI outcomes across the entire lifecycle.
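Time-to-first-token is also easy to measure for yourself. The hedged sketch below times the first streamed chunk from an OpenAI-compatible completions endpoint, the style of API that vLLM serves and a gateway can front; the URL and model name are placeholders.

```python
import time
import requests

URL = "http://gateway.example.internal/v1/completions"  # placeholder endpoint

start = time.monotonic()
resp = requests.post(
    URL,
    json={"model": "my-model", "prompt": "Hello", "stream": True, "max_tokens": 64},
    stream=True,
    timeout=60,
)
for line in resp.iter_lines():
    if line:  # the first non-empty server-sent event carries the first token
        print(f"TTFT: {(time.monotonic() - start) * 1000:.1f} ms")
        break
```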
Customer Impact: Real-World Results with Ironwood
Ironwood TPUs represent a significant leap in AI infrastructure, delivering up to a 10x peak performance improvement over TPU v5p and over 4x better performance per chip than TPU v6e. This translates directly into real-world benefits for customers like Anthropic, which plans to use up to 1 million TPUs to accelerate both training and inference for its Claude models. The added performance lets Anthropic scale efficiently to meet exponentially growing demand while maintaining reliability, a crucial factor when serving millions of users.
Organizations are already seeing impactful results. Lightricks expects Ironwood to enhance the fidelity of its image and video generation for millions of customers, building on the TPU-powered training efficiency it achieved with LTX-2. Essential AI highlights the ease of onboarding and immediate access to Ironwood’s power, freeing engineers to focus on AI breakthroughs. These early adopters demonstrate Ironwood’s versatility across diverse applications and company sizes.
Beyond individual chip performance, Ironwood’s system-level design maximizes efficiency. Scaling to 9,216 chips, interconnected via 9.6 Tb/s ICI networking and sharing 1.77 Petabytes of HBM, overcomes data bottlenecks. Coupled with features like Optical Circuit Switching for uninterrupted availability and integration with Google Kubernetes Engine for resilient clusters, Ironwood, as part of the AI Hypercomputer, delivers a reported 353% three-year ROI and 28% lower IT costs for customers.
