Google has announced the launch of Cloud TPU v5p, its most powerful AI accelerator to date, and AI Hypercomputer, a supercomputer architecture designed to boost efficiency across AI training, tuning, and serving. The TPU v5p is designed to train large AI models faster and more efficiently, while the AI Hypercomputer integrates performance-optimized hardware, open software, and flexible consumption models. Companies like Salesforce and Lightricks are already using these technologies, reporting significant improvements in training speed and hardware utilization. The new technologies are the result of decades of research in AI and systems design.
“Leveraging the remarkable performance and ample memory capacity of Google Cloud TPU v5p, we successfully trained our generative text-to-video model without splitting it into separate processes. This optimal hardware utilization significantly accelerates each training cycle, allowing us to swiftly conduct a series of experiments. The ability to train our model quickly in each experiment facilitates rapid iteration, which is an invaluable advantage for our research team in this competitive field of generative AI.”
Yoav HaCohen, PhD, Core Generative AI Research Team Lead, Lightricks
Google’s New AI Accelerator and Supercomputer Architecture
Google has announced the launch of Cloud TPU v5p, its most powerful, scalable, and flexible AI accelerator to date. This development comes in response to the rapid evolution of Generative AI (gen AI) models, which have seen a tenfold increase in parameters annually over the past five years. These larger models, with hundreds of billions or even trillions of parameters, require extensive training periods, sometimes spanning months, even on the most specialized systems. Efficient AI workload management also necessitates a coherently integrated AI stack consisting of optimized compute, storage, networking, software, and development frameworks.
Cloud TPU v5p: A Powerful AI Accelerator
The Cloud TPU v5p is a significant upgrade from its predecessor, the Cloud TPU v5e, which delivered 2.3X price-performance improvements over TPU v4 as Google’s most cost-efficient TPU to date. Each TPU v5p pod composes together 8,960 chips over Google’s highest-bandwidth inter-chip interconnect (ICI) at 4,800 Gbps/chip in a 3D torus topology. Compared to TPU v4, TPU v5p features more than 2X greater FLOPS and 3X more high-bandwidth memory (HBM). It can train large language models (LLMs) 2.8X faster than the previous-generation TPU v4, and embedding-dense models 1.9X faster.
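For a rough sense of scale, the headline numbers above can be combined in a few lines of Python. This is a back-of-the-envelope sketch using only figures quoted in this article; the naive bandwidth aggregation (per-chip bandwidth times chip count) is illustrative, not an official pod metric:

```python
# Back-of-the-envelope figures for a full TPU v5p pod, derived from the
# per-chip numbers quoted above. The aggregation is a naive illustration.

CHIPS_PER_POD = 8_960        # chips in one v5p pod (3D torus)
ICI_GBPS_PER_CHIP = 4_800    # inter-chip interconnect bandwidth per chip

def aggregate_ici_tbps(chips: int = CHIPS_PER_POD,
                       per_chip_gbps: int = ICI_GBPS_PER_CHIP) -> float:
    """Naive aggregate ICI bandwidth in Tbps (per-chip bandwidth summed over the pod)."""
    return chips * per_chip_gbps / 1_000

def training_days(v4_days: float, speedup: float = 2.8) -> float:
    """Wall-clock training time on v5p, given a v4 baseline and the quoted 2.8X speedup."""
    return v4_days / speedup

print(f"{aggregate_ici_tbps():,.0f} Tbps aggregate ICI")   # 43,008 Tbps
print(f"{training_days(90):.1f} days vs a 90-day v4 run")  # 32.1 days
```

A months-long v4 training run dropping to roughly a third of its wall-clock time is the practical meaning of the 2.8X figure.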
AI Hypercomputer: A Groundbreaking Supercomputer Architecture
Google is also introducing AI Hypercomputer, a supercomputer architecture that employs an integrated system of performance-optimized hardware, open software, leading ML frameworks, and flexible consumption models. Unlike traditional methods that often tackle demanding AI workloads through piecemeal, component-level enhancements, AI Hypercomputer employs systems-level codesign to boost efficiency and productivity across AI training, tuning, and serving.
Performance-Optimized Hardware and Open Software
AI Hypercomputer features performance-optimized compute, storage, and networking built over an ultrascale data center infrastructure. It leverages a high-density footprint, liquid cooling, and Google’s Jupiter data center network technology. The system enables developers to access performance-optimized hardware through the use of open software to tune, manage, and dynamically orchestrate AI training and inference workloads. It offers extensive support for popular ML frameworks such as JAX, TensorFlow, and PyTorch.
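As one concrete example of driving such hardware through open software, a TPU slice can be requested from Google Kubernetes Engine with standard Kubernetes scheduling labels. The sketch below is hypothetical: the label keys and values follow GKE’s published TPU conventions, but the exact values, image path, and chip count are assumptions to be checked against current documentation:

```yaml
# Hypothetical GKE Job requesting a small TPU v5p slice.
# Label keys/values follow GKE's TPU docs; verify before use.
apiVersion: batch/v1
kind: Job
metadata:
  name: tpu-training-job
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-tpu-accelerator: tpu-v5p-slice   # assumed accelerator label
        cloud.google.com/gke-tpu-topology: 2x2x1              # assumed slice topology
      containers:
      - name: trainer
        image: us-docker.pkg.dev/my-project/my-repo/trainer:latest  # placeholder image
        resources:
          limits:
            google.com/tpu: 4   # TPU chips requested for this container
      restartPolicy: Never
```

The point of the example is that orchestration happens through ordinary, open Kubernetes primitives rather than a proprietary scheduler.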
Flexible Consumption and Positive Customer Feedback
AI Hypercomputer offers a wide range of flexible and dynamic consumption choices. Customers like Salesforce and Lightricks are already training and serving large AI models with Google Cloud’s TPU v5p and AI Hypercomputer, and have reported considerable improvements in training speed. Google DeepMind and Google Research have also observed 2X speedups for LLM training workloads using TPU v5p chips compared to the performance on the TPU v4 generation.
Google has long believed in the power of AI to help solve challenging problems. Until very recently, training large foundation models and serving them at scale was too complicated and expensive for many organizations. Today, with Cloud TPU v5p and AI Hypercomputer, Google is excited to extend the result of decades of research in AI and systems design with its customers, so they can innovate with AI faster, more efficiently, and more cost-effectively.
“We’ve been leveraging Google Cloud TPU v5p for pre-training Salesforce’s foundational models that will serve as the core engine for specialized production use cases, and we’re seeing considerable improvements in our training speed. In fact, Cloud TPU v5p compute outperforms the previous generation TPU v4 by as much as 2X. We also love how seamless and easy the transition has been from Cloud TPU v4 to v5p using JAX. We’re excited to take these speed gains even further by leveraging the native support for INT8 precision format via the Accurate Quantized Training (AQT) library to optimize our models.” – Erik Nijkamp, Senior Research Scientist, Salesforce
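The INT8 training that Salesforce mentions rests on a quantize/dequantize round trip: weights and activations are mapped to 8-bit integer codes with a shared scale, computed on, and mapped back. The sketch below shows that round trip in plain Python for illustration only; the real AQT library is a JAX library that applies this inside the training step:

```python
# Minimal sketch of symmetric per-tensor INT8 quantization, the round trip
# that quantized-training schemes like AQT build on (illustrative only).

def quantize_int8(values):
    """Map floats to int8 codes with a per-tensor symmetric scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # guard the all-zero case
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 codes."""
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25, 1.27]
q, scale = quantize_int8(weights)
print(q)                     # [50, -100, 25, 127]
roundtrip = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, roundtrip))
print(max_err <= scale / 2)  # True: error is bounded by half a quantization step
```

Halving the bytes per value roughly doubles effective memory and interconnect throughput, which is where the speed gains come from.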
“In our early-stage usage, Google DeepMind and Google Research have observed 2X speedups for LLM training workloads using TPU v5p chips compared to the performance on our TPU v4 generation. The robust support for ML Frameworks (JAX, PyTorch, TensorFlow) and orchestration tools enables us to scale even more efficiently on v5p. With the 2nd generation of SparseCores we also see significant improvement in the performance of embeddings-heavy workloads. TPUs are vital to enabling our largest-scale research and engineering efforts on cutting edge models like Gemini.”
Jeff Dean, Chief Scientist, Google DeepMind and Google Research
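The embeddings-heavy workloads that SparseCores accelerate are dominated by sparse gather-and-pool operations: fetch a handful of rows from a huge table, combine them, repeat. A toy, pure-Python illustration of that access pattern (the table here is a tiny stand-in; production tables hold millions of rows, which is why these lookups are memory-bound):

```python
import random

random.seed(0)              # deterministic toy example
EMBED_DIM, VOCAB = 4, 10

# Toy embedding table: one small dense vector per categorical id.
table = [[random.random() for _ in range(EMBED_DIM)] for _ in range(VOCAB)]

def embed(ids):
    """Gather the rows for a batch of ids, then mean-pool them."""
    rows = [table[i] for i in ids]
    return [sum(col) / len(rows) for col in zip(*rows)]

pooled = embed([1, 3, 3, 7])   # repeated ids are typical of sparse features
print(len(pooled))             # 4
```

Because almost no arithmetic happens per byte fetched, dedicated gather hardware helps these workloads far more than extra FLOPS would.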
Summary
Google has announced the launch of Cloud TPU v5p and AI Hypercomputer, designed to handle the increasing demands of generative AI models. These new technologies aim to boost efficiency and productivity across AI training, tuning, and serving, offering improved performance, flexibility, and scalability.
- Google has announced the launch of Cloud TPU v5p, its most powerful AI accelerator to date, and AI Hypercomputer, a supercomputer architecture designed for AI workloads.
- TPUs have long been used to train and serve AI-powered products like YouTube, Gmail, Google Maps, Google Play, and Android; they were also used to train and serve Gemini, Google’s most capable and general AI model.
- The AI Hypercomputer is an integrated system of performance-optimized hardware, open software, leading ML frameworks, and flexible consumption models. It is designed to boost efficiency and productivity across AI training, tuning, and serving.
- The TPU v5p can train large language models (LLMs) 2.8X faster than the previous-generation TPU v4 and is 4X more scalable than TPU v4 in terms of total available FLOPs per pod.
- The AI Hypercomputer features performance-optimized compute, storage, and networking built over an ultrascale data center infrastructure. It also supports popular ML frameworks such as JAX, TensorFlow, and PyTorch.
- Companies like Salesforce and Lightricks are already using Google Cloud’s TPU v5p and AI Hypercomputer, and have reported considerable improvements in their training speed.
- Google DeepMind and Google Research have observed 2X speedups for LLM training workloads using TPU v5p chips compared to the performance on their TPU v4 generation.
