Qualcomm Advances Data Center AI with New Chip Solutions
Qualcomm Technologies, Inc. recently announced its next-generation AI inference solutions for data centers, including the Qualcomm AI200 and AI250 chip-based accelerator cards and racks. These new offerings build upon the company’s leadership in neural processing unit (NPU) technology, aiming to deliver significant improvements in rack-scale performance and memory capacity for fast generative AI inference. The company anticipates these solutions will enable scalable, efficient, and flexible generative AI deployments across various industries, addressing a growing demand for powerful AI infrastructure.
The Qualcomm AI250 solution debuts with an innovative memory architecture based on near-memory computing, promising over 10x higher effective memory bandwidth and lower power consumption. This advancement enables disaggregated AI inferencing, allowing for efficient hardware utilization while meeting customer performance and cost requirements. Meanwhile, the Qualcomm AI200 introduces a purpose-built rack-level AI inference solution, supporting 768 GB of LPDDR per card to deliver both higher memory capacity and reduced cost. According to Durga Malladi, SVP & GM, Technology Planning, Edge Solutions & Data Center at Qualcomm Technologies, Inc., these solutions redefine possibilities for rack-scale AI inference.
“With Qualcomm AI200 and AI250, we’re redefining what’s possible for rack-scale AI inference. These innovative new AI infrastructure solutions empower customers to deploy generative AI at unprecedented TCO, while maintaining the flexibility and security modern data centers demand.”
Durga Malladi, SVP & GM, Qualcomm Technologies, Inc.
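To put the 768 GB per-card figure in perspective, here is a rough sizing sketch; the model sizes and precisions below are our assumptions for illustration, not Qualcomm data.

```python
# Back-of-the-envelope check (not Qualcomm's arithmetic): how much of a
# 768 GB card a large dense model's weights would occupy. Model sizes
# and precisions are assumptions chosen purely for illustration.

GIB = 1024**3

def model_footprint_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight memory for a dense model, ignoring the KV
    cache and activations."""
    return n_params * bytes_per_param / GIB

card_capacity = 768  # GB of LPDDR per AI200 card, per the announcement

for name, params, bpp in [
    ("70B @ FP16", 70e9, 2),
    ("70B @ INT8", 70e9, 1),
    ("405B @ INT8", 405e9, 1),
]:
    gib = model_footprint_gib(params, bpp)
    print(f"{name}: ~{gib:.0f} GiB weights "
          f"({gib / card_capacity:.0%} of one card)")
```

By this estimate even a 405B-parameter model quantized to 8-bit weights fits comfortably on a single card, which is the kind of headroom the capacity claim implies.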
Both rack solutions incorporate direct liquid cooling for thermal efficiency, PCIe for scale-up, and Ethernet for scale-out. They also feature confidential computing to secure AI workloads, all within a rack-level power envelope of 160 kW. Qualcomm’s hyperscaler-grade AI software stack, optimized for AI inference from the application layer down to the system layer, supports integration and management of trained AI models. Combined with compatibility for leading AI frameworks and one-click model deployment, the stack is designed for frictionless adoption and rapid innovation, according to the company.
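The 160 kW figure invites a simple budget check. A minimal sketch follows, assuming a hypothetical card count and overhead share; neither is disclosed in the announcement.

```python
# Rough power budgeting. The 160 kW rack figure comes from the
# announcement; the card count and overhead split below are invented
# for illustration only.

rack_power_kw = 160.0
cards_per_rack = 128      # assumed; Qualcomm has not disclosed this
overhead_fraction = 0.15  # assumed share for cooling, networking, hosts

card_budget_w = rack_power_kw * 1000 * (1 - overhead_fraction) / cards_per_rack
print(f"~{card_budget_w:.0f} W available per accelerator card")
```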
Next-Generation AI Inference: Qualcomm’s High-Performance, Efficient Chips
The Qualcomm AI250 solution represents a significant leap in AI inference efficiency, according to the company’s announcement. Its near-memory computing architecture delivers over 10x higher effective memory bandwidth than conventional systems, directly attacking a critical bottleneck in AI processing: moving weights and activations between memory and compute. The result, Qualcomm says, is faster, more responsive AI applications at lower power consumption. The design also allows disaggregated AI inferencing, optimizing hardware utilization against customer cost and performance requirements.
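Why bandwidth is the bottleneck: in single-stream decoding, each generated token must stream essentially all model weights from memory, so throughput is capped by effective bandwidth divided by model size. A roofline-style sketch with assumed numbers follows; only the greater-than-10x multiplier comes from the announcement.

```python
# Roofline-style estimate of why effective memory bandwidth dominates
# single-stream LLM decode. All absolute numbers are assumptions for
# illustration; Qualcomm has only claimed ">10x higher effective
# memory bandwidth".

def decode_tokens_per_s(weight_bytes: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s when each generated token must stream
    every weight from memory once (dense model, batch size 1)."""
    return bandwidth_gb_s * 1e9 / weight_bytes

weight_bytes = 70e9        # hypothetical 70B-parameter model at 8-bit
baseline_bw = 500.0        # assumed conventional effective GB/s
near_memory_bw = 10 * baseline_bw  # applying the >10x claim

print(f"baseline:    ~{decode_tokens_per_s(weight_bytes, baseline_bw):.0f} tok/s")
print(f"near-memory: ~{decode_tokens_per_s(weight_bytes, near_memory_bw):.0f} tok/s")
```

Under these assumptions, a 10x bandwidth gain translates almost directly into a 10x gain in single-stream decode throughput, which is why the memory architecture, not raw compute, is the headline of the AI250.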
Like the AI200, the AI250 incorporates direct liquid cooling and supports PCIe for scaling up and Ethernet for scaling out, giving operators flexibility across diverse data center configurations and workloads. Integrated confidential computing capabilities further protect sensitive AI workloads. Durga Malladi, SVP & GM, Technology Planning, Edge Solutions & Data Center at Qualcomm Technologies, Inc., highlighted how these solutions let customers deploy generative AI at an unprecedented total cost of ownership (TCO).
“Our rich software stack and open ecosystem support make it easier than ever for developers and enterprises to integrate, manage, and scale already trained AI models on our optimized AI inference solutions. With seamless compatibility for leading AI frameworks and one-click model deployment, Qualcomm AI200 and AI250 are designed for frictionless adoption and rapid innovation.”
Durga Malladi, SVP & GM, Qualcomm Technologies, Inc.
Qualcomm’s hyperscaler-grade AI software stack is optimized for inference end to end, from the application layer down to system software. The company emphasizes that this optimization, together with support for leading AI frameworks and one-click model deployment, lets developers and enterprises integrate, manage, and scale already trained models with minimal friction, fostering a more accessible and efficient AI ecosystem.
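Qualcomm has not published the API behind this workflow, but a purely hypothetical sketch of what one-click deployment of an already trained model could look like might be:

```python
# Purely hypothetical sketch: the announcement describes one-click
# deployment but publishes no API. Every name here (DeploymentSpec,
# deploy, the endpoint URL) is invented for illustration only.
from dataclasses import dataclass

@dataclass
class DeploymentSpec:
    model_id: str   # identifier of an already trained model, e.g. from a hub
    precision: str  # quantization target on the accelerator
    replicas: int   # number of cards serving the model

def deploy(spec: DeploymentSpec) -> str:
    """Stand-in for a one-click deploy: validate the request, (notionally)
    compile the model for the NPU, and return a serving endpoint."""
    if spec.replicas < 1:
        raise ValueError("at least one replica is required")
    return f"https://inference.example.com/{spec.model_id}@{spec.precision}"

endpoint = deploy(DeploymentSpec(model_id="my-org/llm-70b",
                                 precision="int8", replicas=4))
print(endpoint)
```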
Qualcomm Technologies’ AI200 and AI250 solutions represent a significant step towards more scalable and efficient AI inference. With innovations like near-memory computing and increased memory bandwidth, these rack solutions could enable disaggregated AI inferencing for a wider range of industries. For organizations managing large language and multimodal models, this translates to optimized performance and reduced total cost of ownership.
The advancements from Qualcomm Technologies extend beyond raw efficiency gains: direct liquid cooling and standard PCIe and Ethernet interconnects facilitate flexible deployment, while confidential computing secures AI workloads. Together, these could make generative AI faster and more accessible, ultimately shaping how businesses integrate artificial intelligence into their operations.
