Meta, formerly Facebook, is developing its AI infrastructure, including its first custom silicon chip for running AI models, a new AI-optimized data center design, and the second phase of its 16,000 GPU supercomputer for AI research. These advancements will enable the development and deployment of larger, more sophisticated AI models. Meta’s custom accelerator chip family, MTIA, will provide greater compute power and efficiency than CPUs. The next-generation data center will support liquid-cooled AI hardware and a high-performance AI network. Meta’s Research SuperCluster AI supercomputer features 16,000 GPUs and aims to power augmented reality tools, content understanding systems, and real-time translation technology.
Meta’s Ambitious Plan for AI Infrastructure
Meta, formerly known as Facebook, is executing an ambitious plan to build the next generation of its AI infrastructure. This includes developing a custom silicon chip for running AI models, a new AI-optimized data center design, and the second phase of its 16,000 GPU supercomputer for AI research. These efforts will enable the company to develop larger, more sophisticated AI models and deploy them efficiently at scale. AI is already at the core of Meta’s products, enabling better personalization, safer and fairer products, and richer experiences while also helping businesses reach their target audiences.
MTIA: Meta’s Custom Accelerator Chip
Meta is developing an in-house custom accelerator chip family called MTIA (Meta Training and Inference Accelerator) that targets inference workloads. MTIA provides greater compute power and efficiency than CPUs and is customized for Meta’s internal workloads. By deploying both MTIA chips and GPUs, the company aims to deliver better performance, decreased latency, and greater efficiency for each workload.
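The split described above — inference-focused custom silicon alongside general-purpose GPUs — can be pictured as a simple routing policy. This is purely an illustrative sketch; Meta has not published MTIA's actual scheduling logic, and the names here (`Workload`, `pick_backend`) are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    kind: str  # "inference" or "training"

def pick_backend(w: Workload) -> str:
    # Assumed policy for illustration: MTIA targets inference workloads,
    # so training jobs are routed to GPUs.
    return "MTIA" if w.kind == "inference" else "GPU"

print(pick_backend(Workload("recommendation-serving", "inference")))  # MTIA
print(pick_backend(Workload("model-pretraining", "training")))        # GPU
```

The point of such a heterogeneous deployment is that each workload runs on the hardware best matched to it, rather than forcing everything onto one accelerator type.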
Next-Generation AI-Optimized Data Center
Meta’s next-generation data center design will support its current products while enabling future generations of AI hardware for both training and inference. The new data center will be AI-optimized, supporting liquid-cooled AI hardware and a high-performance AI network connecting thousands of AI chips together for data center-scale AI training clusters. It will also be faster and more cost-effective to build, complementing other new hardware such as Meta’s first in-house-developed ASIC solution, MSVP, which is designed to power the constantly growing video workloads at the company.
Research SuperCluster (RSC) AI Supercomputer
Meta’s RSC, which the company believes is one of the fastest AI supercomputers in the world, was built to train the next generation of large AI models that will power new augmented reality tools, content understanding systems, real-time translation technology, and more. It features 16,000 GPUs, all accessible across a three-level Clos network fabric that provides full bandwidth to each of the 2,000 training systems.
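The figures above are consistent with each other: 2,000 training systems and 16,000 GPUs implies 8 GPUs per system (the standard configuration for GPU training nodes, assumed here rather than stated in the text). A quick back-of-the-envelope check:

```python
# Figures from the article; GPUs-per-system is inferred, not stated.
TRAINING_SYSTEMS = 2_000
GPUS_PER_SYSTEM = 8  # assumption: standard 8-GPU training node

total_gpus = TRAINING_SYSTEMS * GPUS_PER_SYSTEM
print(total_gpus)  # 16000, matching the 16,000 GPU figure
```

A full-bandwidth Clos fabric means any system can communicate with any other at its full link rate, which is what makes data center-scale training jobs across all 2,000 systems practical.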
The Benefits of an End-to-End Integrated Stack
Custom-designing much of its infrastructure enables Meta to optimize an end-to-end experience from the physical layer to the virtual layer to the software layer to the actual user experience. The company designs, builds, and operates everything from the data centers to the server hardware to the mechanical systems that keep everything running. By controlling the stack from top to bottom, Meta can customize it for its specific needs, such as co-locating GPUs, CPUs, network, and storage to better support workloads. This approach will be increasingly important in the years ahead as the company focuses on delivering long-term value and impact to guide its infrastructure vision.
“Our artificial intelligence (AI) compute needs will grow dramatically over the next decade as we break new ground in AI research, ship more cutting-edge AI applications and experiences across our family of apps, and build our long-term vision of the metaverse.”
Meta is developing its next-generation AI infrastructure, including a custom silicon chip for running AI models, an AI-optimized data center design, and a 16,000 GPU supercomputer for AI research. These advancements will enable the development and deployment of larger, more sophisticated AI models, powering emerging opportunities in areas like generative AI and the metaverse.
- Meta (formerly Facebook) is developing its artificial intelligence (AI) infrastructure to support its long-term vision of the metaverse.
- The company is developing its first custom silicon chip, MTIA (Meta Training and Inference Accelerator), for running AI models, offering greater compute power and efficiency than CPUs.
- Meta is also designing a next-generation, AI-optimized data center to support current products and future AI hardware for both training and inference. The data center will feature liquid-cooled AI hardware and a high-performance AI network.
- The company has built the Research SuperCluster (RSC) AI supercomputer, which has 16,000 GPUs and is believed to be one of the fastest AI supercomputers in the world. It will be used to train large AI models for augmented reality tools, content understanding systems, and real-time translation technology.
- Meta is using CodeCompose, a generative AI-based coding assistant, to make developers more productive throughout the software development lifecycle.
- By custom-designing its infrastructure, Meta aims to optimize the end-to-end experience and create a scalable foundation for emerging opportunities in generative AI and the metaverse.