Tom Lubowe, Benedikt Kloss, Danylo Lykov, Tyson Jones, and Daniel Lowell of NVIDIA have introduced cuQuantum SDK v25.11, a set of high-performance libraries and tools designed to accelerate both circuit- and device-level quantum computing simulations by orders of magnitude. This SDK features components that accelerate Pauli propagation and stabilizer simulations—critical methods for simulating large-scale quantum computers—and leverages NVIDIA GPUs to provide this acceleration. The development of cuQuantum addresses the growing need for validation of quantum processing unit (QPU) results and the generation of training data for AI models aimed at improving quantum processor operation, including AI-driven error correction and device design.
cuQuantum SDK Overview
The cuQuantum SDK, specifically version 25.11, introduces new components to accelerate Pauli propagation and stabilizer simulations—critical techniques for simulating large-scale quantum computers. This is increasingly important as the quality of quantum processing units (QPUs) improves and validation of results becomes key when devices exceed classical simulation capabilities. The SDK aims to provide accelerated training data for AI models used in quantum processor operation, including error correction and calibration.
Pauli propagation efficiently simulates quantum circuit observables by dynamically discarding insignificant terms, allowing for estimation of otherwise intractable quantities. The cuQuantum 25.11 release offers primitives to accelerate these methods on NVIDIA GPUs. Setup requires allocating GPU workspace memory (the provided example allocates 64 MiB) and defining observables as Pauli operators encoded in packed integers, each paired with a coefficient.
Simulation within the SDK involves creating Pauli expansions, applying operators (like Pauli rotations) in the Heisenberg picture, and ultimately computing expectation values. Results demonstrate significant speedups—multiple orders of magnitude over single-threaded Qiskit Pauli-Prop on dual-socket CPUs—particularly with small coefficient cutoffs, showcasing the performance benefits of NVIDIA DGX B200 GPUs.
Pauli Propagation Explained
Pauli propagation is a relatively new method for efficiently simulating large-scale quantum circuits, even those including noise. It works by expressing states and observables as weighted sums of Pauli tensor products and dynamically discarding terms whose contribution is insignificant, which allows estimation of quantities that are intractable for exact simulation. This technique is particularly useful for computing expectation values, which are central to applications like the Variational Quantum Eigensolver (VQE) and quantum simulation of physical dynamics.
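The truncation idea can be illustrated with a minimal sketch. It assumes a toy representation in which a Pauli sum is a dictionary from label strings to coefficients; the function name truncate, the labels, and the numeric values are all made up for illustration and are not part of cuQuantum.

```python
# Toy representation (an assumption, not the cuQuantum data layout):
# a weighted sum of Pauli tensor products as {label: coefficient}.

def truncate(pauli_sum, cutoff):
    """Keep only terms whose absolute coefficient is at least `cutoff`."""
    return {label: c for label, c in pauli_sum.items() if abs(c) >= cutoff}

# Illustrative coefficients only; they do not come from any benchmark.
observable = {"ZII": 0.92, "XYI": 0.38, "YZX": 3.0e-5, "IXZ": -7.5e-6}

kept = truncate(observable, cutoff=1e-4)
print(kept)  # the two small terms are discarded
```

A smaller cutoff keeps more terms, which improves accuracy at the cost of memory and time; this is the trade-off the coefficient cutoff controls.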
This method proves a useful addition to existing approximate circuit simulation techniques, especially for near-Clifford or very noisy circuits. Recent findings show impressive performance when simulating circuits which represent the evolution of quantum spin systems, and even in circuits used in IBM’s 127-qubit utility experiment. Characterizing which circuits benefit most from Pauli propagation remains an active area of research alongside refinement of the algorithmic details.
NVIDIA’s cuQuantum SDK v25.11 now accelerates Pauli propagation simulations using GPUs. Benchmarks demonstrate significant speedups – multiple orders of magnitude faster than single-threaded Qiskit Pauli-Prop on dual-socket CPUs – particularly with small coefficient cutoffs. This is achieved through library primitives for initializing the library, defining observables, applying operators, and computing expectation values, allowing researchers to advance classical circuit simulation.
Library Initialization and Setup
Initialization in cuQuantum SDK v25.11 requires creating a library handle and a workspace descriptor, allocating GPU memory (the example assigns 64 MiB), and associating that memory with the workspace. With this in place, the core functions let developers and researchers advance classical circuit simulation, which matters more as QPUs improve and scale beyond classical simulability.
To begin a simulation, device memory must be allocated for Pauli expansions, which are sums of Pauli operators stored as unsigned integers with associated coefficients. Each Pauli string is encoded into packed integers, the observable terms are defined, and a Pauli expansion is created with functions like cupauliprop.create_pauli_expansion. In the example, an observable (Z_62) is initialized and assigned to the first term in the expansion.
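To make the packed-integer idea concrete, here is a pure-Python sketch of one common encoding: a pair of bitmasks per Pauli string, with one bit per qubit marking an X component and one marking a Z component (Y sets both). The 64-bit word size, the ordering of the words, and the helper names num_words and encode are assumptions for illustration; the exact layout expected by cupauliprop should be taken from its documentation.

```python
WORD_BITS = 64  # assumed word size for the packed integers

def num_words(num_qubits):
    """Number of 64-bit words needed to hold one bit per qubit."""
    return (num_qubits + WORD_BITS - 1) // WORD_BITS

def encode(paulis, num_qubits):
    """Encode {qubit_index: 'X' | 'Y' | 'Z'} as (x_words, z_words).

    Qubits not listed are identity. Y sets both the X bit and the Z bit.
    """
    n = num_words(num_qubits)
    x_words = [0] * n
    z_words = [0] * n
    for qubit, pauli in paulis.items():
        if pauli not in ("X", "Y", "Z"):
            raise ValueError(f"unknown Pauli {pauli!r}")
        if not 0 <= qubit < num_qubits:
            raise ValueError(f"qubit {qubit} out of range")
        word, bit = divmod(qubit, WORD_BITS)
        if pauli in ("X", "Y"):
            x_words[word] |= 1 << bit
        if pauli in ("Z", "Y"):
            z_words[word] |= 1 << bit
    return x_words, z_words

# The observable Z_62 on a 127-qubit register: a single Z bit, no X bits.
x_words, z_words = encode({62: "Z"}, num_qubits=127)
coefficient = 1.0
```

With this layout a 127-qubit Pauli string occupies two words per mask, and products and commutation checks reduce to bitwise operations, which is what makes the representation attractive on GPUs.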
Applying operators, such as Pauli rotations, evolves the system. Most applications operate in the Heisenberg picture, where the gates act on the observable rather than the state, so they must be applied in reverse circuit order with the adjoint argument set to True. Functions like cupauliprop.pauli_expansion_view_compute_operator_application handle this step, and expectation values are ultimately computed via the trace with the zero state.
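The reverse-order rule can be checked on a small case. The sketch below is a CPU-only illustration in plain Python, not the cupauliprop API: every function name here is hypothetical, and Pauli strings are stored as (x_mask, z_mask) integer pairs, which is an assumed layout. For a rotation U = exp(-i*theta/2 * P), a term Q is unchanged if it commutes with P, and otherwise becomes cos(theta) Q + i sin(theta) P Q.

```python
import math

def popcount(v):
    return bin(v).count("1")

def anticommute(p, q):
    """True if Pauli strings p and q (as (x, z) masks) anticommute."""
    (px, pz), (qx, qz) = p, q
    return (popcount(px & qz) + popcount(pz & qx)) % 2 == 1

def product(p, q):
    """Return (k, r) with P*Q = i**k * R, using P = i^|x&z| X^x Z^z."""
    (px, pz), (qx, qz) = p, q
    rx, rz = px ^ qx, pz ^ qz
    k = (popcount(px & pz) + popcount(qx & qz)
         + 2 * popcount(pz & qx) - popcount(rx & rz))
    return k % 4, (rx, rz)

def apply_rotation_adjoint(expansion, p, theta, cutoff=0.0):
    """Map O -> U^dagger O U for U = exp(-i*theta/2 * P), then truncate."""
    out = {}
    for q, c in expansion.items():
        if not anticommute(p, q):
            out[q] = out.get(q, 0.0) + c          # commuting terms pass through
            continue
        k, r = product(p, q)                      # k is 1 or 3 here
        sign = -1.0 if k == 1 else 1.0            # i * i**k is -1 or +1
        out[q] = out.get(q, 0.0) + c * math.cos(theta)
        out[r] = out.get(r, 0.0) + c * sign * math.sin(theta)
    return {q: c for q, c in out.items() if abs(c) > cutoff}

def propagate(observable, circuit, cutoff=0.0):
    """Heisenberg picture: the last gate of the circuit acts first."""
    expansion = dict(observable)
    for p, theta in reversed(circuit):
        expansion = apply_rotation_adjoint(expansion, p, theta, cutoff)
    return expansion

# Single qubit: X = (1, 0), Z = (0, 1), Y = (1, 1).
X, Y, Z = (1, 0), (1, 1), (0, 1)
a, b = 0.3, 1.1
circuit = [(X, a), (Z, b)]        # X rotation by a, then Z rotation by b
evolved = propagate({X: 1.0}, circuit)
```

For this circuit the evolved observable is cos(b) X - sin(b) cos(a) Y + sin(a) sin(b) Z, which matches rotating the Bloch vector of the zero state and reading off its X component. Each anticommuting rotation can double the number of terms, which is why the cutoff matters in larger circuits.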
The simulation workloads that cuQuantum accelerates are critical for quantum error correction, verification and validation, and algorithm engineering for intermediate- to large-scale quantum devices.
Defining and Applying Operators
Pauli propagation simulates quantum circuit observables efficiently, even with noise, by expressing states and observables as weighted sums of Pauli tensor products and dynamically discarding insignificant terms. This allows estimation of experimental quantities that would otherwise be intractable. It is a useful addition to existing circuit simulation techniques, especially for near-Clifford and noisy circuits.
To initiate a Pauli propagation simulation, the SDK requires allocation of device memory for Pauli expansions – sets of unsigned integers and their coefficients. An initial observable is then defined by encoding Pauli strings into packed integers. The library can then create Pauli expansions and apply operators, such as Pauli rotations, to evolve the system. These steps enable the simulation of quantum circuits in the Heisenberg picture, where operators are applied to the observable rather than the state.
Finally, cuQuantum computes expectation values, such as the trace of the evolved observable with the zero state, to produce the simulation result. On NVIDIA DGX B200 GPUs, the library shows speedups of multiple orders of magnitude over single-threaded Qiskit Pauli-Prop when simulating with small coefficient cutoffs. This performance highlights the SDK’s potential for advancing classical circuit simulation and validating the outputs of large-scale quantum computers.
For error-correcting codes with large code distances, users can get a 1,060x speedup.
Observable and Expansion Creation
cuQuantum SDK v25.11 introduces tools to accelerate Pauli propagation and stabilizer simulations, both critical methods for simulating large-scale quantum computers. Pauli propagation efficiently simulates observables by expressing states and observables as weighted sums of Pauli tensor products and dynamically discarding insignificant terms. This allows estimation of experimental quantities that are intractable for exact simulation, and it is particularly useful for circuits encoding two- or three-dimensional physical systems, where it complements existing tensor network methods.
The cuQuantum library enables developers to accelerate Pauli propagation on NVIDIA GPUs. Initialization involves allocating GPU memory for Pauli expansions, which are sets of unsigned integers and their coefficients, and defining an observable. The process includes encoding Pauli strings into packed integers, allocating device buffers, and creating input and output Pauli expansions, which together enable approximate simulation of circuits that are beyond the reach of exact classical simulation.
Finally, cuQuantum computes expectation values via methods like the trace with the zero state. Combining these techniques, NVIDIA DGX B200 GPUs demonstrate speedups of multiple orders of magnitude over CPU-based codes, specifically single-threaded Qiskit Pauli-Prop on dual-socket CPUs, particularly with small coefficient cutoffs.
Expectation Value Computation
Pauli propagation is a relatively new method for efficiently simulating the observables of large-scale quantum circuits, even those including noise. By representing states and observables as weighted sums of Pauli tensor products, the technique dynamically discards insignificant terms, enabling the estimation of otherwise intractable experimental quantities. This is crucial for applications like the Variational Quantum Eigensolver (VQE) and simulating physical dynamics, and it offers a complementary approach to methods like Matrix Product States, especially for circuits encoding two- or three-dimensional physical systems.
The cuQuantum SDK v25.11 introduces tools to accelerate Pauli propagation simulations on NVIDIA GPUs. Developers can initialize libraries, define observables using packed integers representing Pauli operators, and create Pauli expansions. This involves allocating device memory for coefficients and Pauli strings, then populating the initial expansion with an observable. Operators, like Pauli rotations, can be defined and applied to the expansion, evolving the system in the Heisenberg picture by applying gates in reverse order.
Finally, expectation values are computed using the trace with the zero state. Simulations on NVIDIA DGX B200 GPUs demonstrate speedups of multiple orders of magnitude compared to single-threaded Qiskit Pauli-Prop on dual-socket CPUs, particularly with small coefficient cutoffs. This acceleration comes from running classical circuit simulation on GPUs, which extends the range of circuits that can be simulated classically.
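The trace with the zero state has a simple form in the Pauli basis: the all-zeros state gives expectation 1 for any Pauli string built only from I and Z factors, and 0 for any string containing an X or Y. A minimal sketch follows, assuming terms are stored as (x_mask, z_mask, coefficient) tuples; this layout and the function name are illustrative assumptions, not the cupauliprop interface.

```python
def expectation_zero_state(terms):
    """Sum the coefficients of terms with no X or Y factor (x_mask == 0)."""
    return sum(coeff for x_mask, _z_mask, coeff in terms if x_mask == 0)

# Illustrative 3-qubit expansion with made-up coefficients.
terms = [
    (0b000, 0b101, 0.5),    # Z on qubits 0 and 2: contributes
    (0b010, 0b000, 0.25),   # X on qubit 1: contributes nothing
    (0b000, 0b000, 0.125),  # identity: contributes
    (0b001, 0b001, -0.75),  # Y on qubit 0: contributes nothing
]

value = expectation_zero_state(terms)
print(value)
```

Because only the X mask needs to be inspected, this final step is a filtered reduction over the expansion, which is cheap compared with the propagation itself.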
