NVIDIA Vera Rubin Delivers 10× Lower Cost Per Token for AI

Enterprise AI is rapidly moving beyond pilot programs as NVIDIA and Dell Technologies unveil advancements designed to lower the cost of running increasingly complex AI models. The Dell AI Factory, powered by the new NVIDIA Vera Rubin NVL72, is now delivering ten times lower cost per token for AI inference, a critical metric for managing expenses as AI workloads expand. Already, 5,000 enterprises including Lilly, Samsung, and Honeywell are utilizing this platform to deploy AI at scale. NVIDIA founder and CEO Jensen Huang described the current surge in demand as “going parabolic, utterly parabolic,” noting that tasks which once took months now take hours due to these advancements in computational power.

Dell AI Factory: NVIDIA Vera Rubin NVL72 for Agentic AI Inference

The demand for artificial intelligence processing power is increasing rapidly; NVIDIA CEO Jensen Huang characterized it as “going parabolic, utterly parabolic” during a presentation at Dell Technologies World, signaling a fundamental shift in the economics of AI inference. Lower cost per token is critical as token consumption is projected to grow by 3400 percent alongside the broader $3-4 trillion AI infrastructure investment anticipated in the coming years. Dell is actively enabling this growth; 5,000 enterprises, including Lilly, Samsung, and Honeywell, are already using Dell AI Factories powered by NVIDIA to move beyond pilot programs and into full-scale AI deployments. The Dell PowerEdge XE9812, built on the Vera Rubin NVL72, is a key component, offering up to ten times lower cost per token than the previous-generation NVIDIA Blackwell architecture for large-scale agentic AI inferencing.
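To see how the two headline figures interact, the arithmetic can be sketched as below. This is only an illustration: it assumes the projected 3400 percent growth in token consumption and the tenfold reduction in cost per token apply to the same workload over the same period, which the keynote figures do not state. The function names are hypothetical.

```python
def growth_multiplier(percent_growth: float) -> float:
    """Convert 'grows by X percent' into a multiplier on the starting value."""
    return 1.0 + percent_growth / 100.0


def relative_spend(token_growth_percent: float, cost_reduction_factor: float) -> float:
    """Total inference spend relative to today when token volume grows
    and cost per token falls by the given factor (assumed to be independent)."""
    return growth_multiplier(token_growth_percent) / cost_reduction_factor


# Growing BY 3400 percent means 35 times the original volume, not 34 times.
token_multiplier = growth_multiplier(3400)

# Spend relative to today with a 10x lower cost per token (illustrative only).
spend_with_10x = relative_spend(3400, 10)

# Spend relative to today if cost per token stayed flat.
spend_flat = relative_spend(3400, 1)

print(token_multiplier, spend_with_10x, spend_flat)
```

Under these assumptions, total spend would still rise to roughly 3.5 times today's level even with the tenfold cost reduction, compared with 35 times if cost per token did not change.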

Beyond the XE9812, Dell is also introducing the PowerEdge XE9880L, XE9885L, and XE9882L servers, the first Dell systems built on NVIDIA HGX Rubin NVL8, capable of supporting up to 144 GPUs per rack with full liquid cooling and a ten times performance increase over HGX B200. Networking infrastructure is also receiving an upgrade with the Dell PowerSwitch portfolio featuring NVIDIA Quantum-X800 InfiniBand, utilizing liquid-cooled, co-packaged optics and NVIDIA Spectrum-6 Ethernet. The focus extends beyond raw compute power to encompass the entire AI stack. Dell’s PowerEdge M9822 and R9822 servers now incorporate NVIDIA Vera CPUs, purpose-built for agentic AI workloads. Huang stated that “Vera CPU has the highest single-threaded performance of any CPU in the world.” The chip also offers 1.2 terabytes per second of memory bandwidth and completes agentic workloads 50 percent faster than traditional x86 processors.

This speed is particularly impactful for data-intensive tasks; Starburst, a new data engine within the Dell AI Data Platform with NVIDIA, achieves three times faster query throughput on the Vera CPU for large-scale SQL analytics. According to developers, “It has three times the memory bandwidth — as a result, Starburst, DuckDB, all these databases run incredibly fast, because the agents are pounding on the databases, so the CPU had better be super fast.” The Dell AI Factory is also prioritizing security and control, with 67 percent of AI workloads now running on-premises, at the edge, or in colocation facilities, and 88 percent of organizations running at least one AI workload outside the public cloud. This trend is being addressed through NVIDIA Confidential Computing, delivered in partnership with companies like Fortanix and Google, enabling secure deployment of frontier models without exposing intellectual property or sensitive data. The platform supports open models such as NVIDIA Nemotron, and Reflection’s open-source frontier AI models are coming on premises, allowing customers to connect agents to internal data and workflows on secure, scalable enterprise infrastructure.

Dell PowerEdge XE9812 & HGX Rubin NVL8 Deliver Scalable Performance

The current demand for artificial intelligence infrastructure is reshaping data center design and pushing the boundaries of computational performance. While early AI deployments often relied on cloud-based solutions, a significant shift toward on-premises infrastructure is now underway, driven by concerns over data security, governance, and cost optimization. Dell’s recent AI adoption survey found that 67 percent of AI workloads are now running outside the cloud, highlighting a clear preference for greater control and reduced latency. This trend necessitates hardware capable of delivering not only raw processing power but also efficient scaling and integration within existing enterprise environments. Dell Technologies is responding with a series of updated servers, notably the PowerEdge XE9812, built upon the NVIDIA Vera Rubin NVL72. This configuration is designed to dramatically reduce the expense of agentic AI inferencing, achieving up to ten times lower cost per token compared to NVIDIA Blackwell.

Dell emphasizes a holistic approach, extending beyond compute to encompass networking with the new PowerSwitch portfolio featuring NVIDIA Quantum-X800 InfiniBand, and fully integrated PowerRack systems designed for optimized thermal management and power delivery. This integrated design aims to eliminate the complexities of component assembly and accelerate deployment times. The performance gains are not limited to GPU acceleration; Dell is also incorporating NVIDIA Vera CPUs into its PowerEdge M9822 and R9822 servers. Purpose-built for agentic AI, the Vera CPU provides 1.2 terabytes per second of memory bandwidth and excels at handling data pipelines and code workloads where sequential processing is critical. The scale of the broader infrastructure build-out is considerable: Michael Dell highlighted that worldwide AI infrastructure spending could reach $3-4 trillion, with token consumption projected to grow by 3400 percent over the same period.

Agents and Models on Premises – Securely

Dell’s own AI adoption survey, cited from the keynote stage, found that 67% of AI workloads now run outside the cloud (on premises, on device, at the edge or in colocation) and that 88% of respondents are running at least one AI workload on premises.

NVIDIA Vera CPU Accelerates Data Pipelines and Enterprise Queries

Researchers at Honeywell are increasingly focused on bringing AI workloads closer to the point of operation, a shift facilitated by recent advancements in on-premises infrastructure. This move, driven by concerns over data security and latency, is being enabled by the introduction of the NVIDIA Vera CPU, which Dell Technologies is now integrating into its PowerEdge server lines. The Vera CPU isn’t simply a faster processor; it represents a fundamental rethinking of how data is processed for agentic AI, prioritizing speed and efficiency in data pipelines and enterprise queries. With 1.2 terabytes per second of memory bandwidth, the performance gains are particularly noticeable in data-intensive applications. The increased efficiency translates directly into cost savings, a critical factor as AI adoption scales. This focus on cost-effectiveness is occurring against a backdrop of explosive growth in the AI sector.

This surge in demand is driving the need for more efficient and scalable infrastructure solutions. The platform’s architecture is not limited to computational power; it encompasses the entire AI stack, including data management, security, and orchestration. Honeywell chief technology officer Suresh Venkatarayalu emphasized that the partnership with Dell and NVIDIA is about more than just infrastructure. “It’s the full AI stack,” he explained, highlighting the importance of a comprehensive solution for industrial AI use cases, digital twins, and automation. This layered approach, combining performance, efficiency, and security, positions the Dell AI Factory as a key enabler for the next wave of enterprise AI adoption, moving beyond pilot programs and into large-scale, production deployments.


On-Premises AI Security with NVIDIA Confidential Computing & Model Access

The escalating demand for on-premises AI solutions is driven by a fundamental need for data security and control, a shift increasingly prioritized by enterprises handling sensitive information. This move isn’t simply about location; it’s about mitigating risks associated with data breaches and intellectual property exposure in a rapidly evolving threat landscape. Central to this on-premises strategy is NVIDIA Confidential Computing, a suite of technologies designed to protect AI models and sensitive data while in use. This goes beyond traditional data-at-rest and data-in-transit encryption, addressing a critical vulnerability: the exposure of model weights and data during the inference process. This is particularly crucial as enterprises begin to deploy the most advanced and computationally intensive AI systems, which represent a significant investment and a valuable asset.

Google Distributed Cloud (GDC) with Gemini 3.0 is now available in preview on Dell PowerEdge XE9780 servers, secured by NVIDIA Blackwell, offering a private confidential computing environment for advanced AI. The benefits extend beyond security; NVIDIA’s advancements in hardware are also driving down costs and improving performance. “There is a massive AI investment boom that’s already underway, and a productivity boom is beginning, and in some companies, including ours,” said Michael Dell. The integration of tools like NVIDIA OpenShell, an open-source runtime for autonomous agents with security and privacy controls, further strengthens this ecosystem, enabling organizations to enforce corporate policies at the infrastructure layer. “We’ve now arrived at the era of useful AI, which is the reason why demand is going parabolic, utterly parabolic,” Huang concluded, highlighting the transformative potential of secure, on-premises AI deployments.

NVIDIA CEO Jensen Huang at Dell Technologies World: ‘Demand Is Going Parabolic, Utterly Parabolic’

Huang joined Dell CEO Michael Dell on stage Monday to unveil the latest updates to the Dell AI Factory with NVIDIA, a full-stack platform for autonomous agents, from deskside workstations to data center racks.

Rusty Flint
