Researchers are addressing a computational bottleneck in quantum simulation by accelerating sample-based quantum diagonalization (SQD), a hybrid quantum-HPC method used to model molecular behavior. Robert Walkup (IBM Research, T.J. Watson Research Center), Juha Jäykkä (Advanced Micro Devices, Inc., AMD Silo AI), and Igor Pasichnyk (Advanced Micro Devices, Inc., Global Center of Excellence), among others, demonstrate how GPU-accelerated systems and OpenMP offload significantly speed up the classical diagonalization step. Their approach achieves a 100× performance increase per node, reducing computationally intensive diagonalization from hours to minutes and enabling more efficient calculations of ground and excited state energies in complex quantum systems.
Sampled Quantum Diagonalization for Molecular Simulations
This work centres on sample-based quantum diagonalization (SQD), a novel hybrid method that encodes molecular Hamiltonian information into quantum circuits for evaluation, paving the way for more efficient electronic structure calculations. Diagonalization, traditionally the most computationally demanding aspect of this process, has been radically improved through the implementation of GPU-accelerated systems. Previous studies leveraged the Fugaku supercomputer and CPU-based diagonalization codes, but this research demonstrates a shift towards utilising GPUs as primary compute engines, enabling efficient, scalable, and portable diagonalization on heterogeneous systems. The team focused specifically on calculating ground-state energies and wavefunctions using the Davidson algorithm, employing a carefully designed offload strategy, code transformations, and data-movement techniques to maximise performance.
The research establishes an iterative feedback loop between quantum sampling and classical post-processing, progressively refining the relevant computational subspace until convergence is achieved. By employing a selected configuration interaction (SCI) approach, the team represents multi-electron wavefunctions as linear combinations of anti-symmetrized products of spin-orbitals, which keeps the computational cost manageable. The method uses bit-strings to represent the occupation of spin-orbitals, enabling a targeted reduction of the exponentially growing configuration space inherent in full configuration interaction calculations. Furthermore, the technique extends to the description of ground-state properties of impurity models relevant to realistic materials calculations through embedding techniques. By making the classical step faster, the work should allow larger and more complex molecular systems to be explored than was previously practical in quantum chemistry and materials science.
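The bit-string picture described above can be made concrete with a short sketch. It is illustrative only and is not taken from the authors' code: the orbital count, electron count, and example configurations are assumptions, and spin sectors are not tracked separately as a production code would.

```python
from math import comb


def popcount(x: int) -> int:
    """Number of set bits, i.e. number of occupied spin-orbitals."""
    return bin(x).count("1")


def excitation_degree(a: int, b: int) -> int:
    """Number of electron substitutions separating two configurations
    that hold the same number of electrons."""
    return popcount(a ^ b) // 2


# Assumed toy system: 8 spin-orbitals, 4 electrons (bit k = spin-orbital k).
n_spin_orbitals = 8
n_electrons = 4

reference = 0b00001111  # spin-orbitals 0-3 occupied
single = 0b00010111     # one electron moved from orbital 3 to orbital 4
double = 0b00110011     # two electrons moved, from orbitals 2, 3 to 4, 5

# Size of the full configuration space, which grows combinatorially.
full_space = comb(n_spin_orbitals, n_electrons)

# A "selected" subspace is simply a chosen subset of bit-strings.
selected = [reference, single, double]

print(popcount(reference), excitation_degree(reference, single),
      excitation_degree(reference, double), full_space, len(selected))
```

For a Hamiltonian with at most two-body terms, configurations that differ by more than two substitutions have a vanishing matrix element, so the XOR-and-popcount test is a cheap way to skip most pairs.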
GPU-accelerated sample-based diagonalization of molecular Hamiltonians
This work details efforts to enable efficient, scalable, and portable diagonalization on heterogeneous systems, leveraging GPUs as primary compute engines for molecular Hamiltonian calculations. The study builds upon previous work utilising the Fugaku supercomputer and a highly scalable diagonalization code designed for CPUs, aiming to dramatically accelerate the computationally intensive diagonalization step. Researchers implemented sample-based quantum diagonalization (SQD), a hybrid quantum-HPC method in which information from a molecular Hamiltonian is encoded into a quantum circuit for evaluation. The process iteratively identifies configurations to carry over to the next step, with diagonalization being the most demanding task for the classical component.
Experiments employed the Davidson algorithm with a selected set of electron configurations to compute ground-state energies and wavefunctions. The approach harnesses massive on-device thread-level parallelism inherent in GPUs, aligning well with the algorithms used for diagonalization. This method falls under the category of selected configuration interaction, utilising a finite set of spin-orbitals to represent the multi-electron wavefunction and achieve significant computational gains.
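As a rough illustration of the Davidson iteration mentioned above, here is a minimal dense, pure-Python sketch for the lowest eigenpair of a small symmetric matrix. It is not the authors' implementation: the test matrix, tolerances, and helper names are assumptions, and the small projected problem is solved with Jacobi rotations only to keep the example free of external libraries.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def lowest_eigenpair_jacobi(S, sweeps=100):
    """Lowest eigenpair of a small symmetric matrix via Jacobi rotations."""
    m = len(S)
    S = [row[:] for row in S]
    V = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(sweeps):
        off = sum(S[p][q] ** 2 for p in range(m) for q in range(p + 1, m))
        if off < 1e-24:
            break
        for p in range(m):
            for q in range(p + 1, m):
                if S[p][q] == 0.0:
                    continue
                # Angle that zeroes S[p][q], folded into [-pi/4, pi/4].
                ang = 0.5 * math.atan2(2.0 * S[p][q], S[q][q] - S[p][p])
                if ang > math.pi / 4:
                    ang -= math.pi / 2
                elif ang < -math.pi / 4:
                    ang += math.pi / 2
                c, s = math.cos(ang), math.sin(ang)
                for k in range(m):  # columns p, q: S <- S J, V <- V J
                    S[k][p], S[k][q] = c * S[k][p] - s * S[k][q], s * S[k][p] + c * S[k][q]
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
                for k in range(m):  # rows p, q: S <- J^T S
                    S[p][k], S[q][k] = c * S[p][k] - s * S[q][k], s * S[p][k] + c * S[q][k]
    i = min(range(m), key=lambda k: S[k][k])
    return S[i][i], [V[k][i] for k in range(m)]


def davidson_lowest(A, tol=1e-8, max_iter=50):
    """Davidson iteration for the lowest eigenpair of symmetric A."""
    n = len(A)
    diag = [A[i][i] for i in range(n)]
    start = min(range(n), key=lambda i: diag[i])
    basis = [[float(i == start) for i in range(n)]]
    images = [matvec(A, basis[0])]  # A applied to each basis vector
    for _ in range(max_iter):
        m = len(basis)
        # Project A into the current subspace and solve the small problem.
        S = [[dot(basis[i], images[j]) for j in range(m)] for i in range(m)]
        ritz, y = lowest_eigenpair_jacobi(S)
        x = [sum(y[j] * basis[j][k] for j in range(m)) for k in range(n)]
        Ax = [sum(y[j] * images[j][k] for j in range(m)) for k in range(n)]
        r = [Ax[k] - ritz * x[k] for k in range(n)]
        if math.sqrt(dot(r, r)) < tol:
            break
        # Diagonal preconditioner, then orthogonalise against the basis.
        t = [r[k] / (ritz - diag[k]) if abs(ritz - diag[k]) > 1e-8 else r[k]
             for k in range(n)]
        for _ in range(2):  # Gram-Schmidt applied twice for stability
            for v in basis:
                proj = dot(v, t)
                t = [tk - proj * vk for tk, vk in zip(t, v)]
        norm = math.sqrt(dot(t, t))
        if norm < 1e-12:
            break
        basis.append([tk / norm for tk in t])
        images.append(matvec(A, basis[-1]))
    return ritz, x


# Assumed test matrix: diagonally dominant, like many CI Hamiltonians.
A = [[float(i + 1) if i == j else 0.1 for j in range(6)] for i in range(6)]
energy, vector = davidson_lowest(A)
```

The expensive part in a real calculation is the `matvec` call, since it touches every Hamiltonian matrix element; the small projected problem is negligible by comparison, which is why the matrix-vector product is the natural target for GPU offload.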
SQD Acceleration via Hybrid HPC and GPUs
The team computed ground-state energies and wavefunctions using the Davidson algorithm with a selected set of electron configurations, focusing on optimising offload strategies, code transformations, and data movement. Within the quantum-sampled and classically filtered subspace, a conventional diagonalization procedure obtains approximate eigenvalues and eigenvectors of the Hamiltonian, improving energy estimates and determining which configurations are kept for iterative refinement. The work falls under the category of selected configuration interaction (SCI), which represents the multi-electron wavefunction as a linear combination of anti-symmetrized products of spin-orbitals, or configurations. A full configuration interaction calculation includes all possible configurations but is impractical in many cases, so SCI includes only a reduced set to keep the computation tractable.
The quantum computer provides samples of electronic configurations, which are then filtered and classically diagonalized. This requires a highly performant classical component, which is the focus of the study. The study reports that the dominant computational cost arises from the iterative diagonalization of large Hamiltonian matrices, requiring the evaluation of millions to billions of Hamiltonian matrix elements per Davidson iteration. Previous implementations on the Fugaku supercomputer showed excellent CPU scalability, but the advent of exascale heterogeneous systems with GPU accelerators offered new opportunities. The researchers implemented a systematic approach to GPU acceleration using OpenMP 5.0+ target offload directives, prioritising portability and maintainability with a single code base for both CPU and GPU systems.
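To see why so many matrix elements are evaluated per iteration, consider a matrix-free product in which each element is generated on the fly instead of being stored. The sketch below is an assumption-laden toy: `toy_element` is a hypothetical stand-in with made-up values, not a real Slater-Condon evaluation, and all 70 configurations of a small system are used instead of a quantum-sampled subset.

```python
from itertools import combinations


def excitation_degree(a: int, b: int) -> int:
    """Number of electron substitutions between two bit-string configurations."""
    return bin(a ^ b).count("1") // 2


def toy_element(a: int, b: int) -> float:
    """Hypothetical stand-in for a Hamiltonian matrix element (not real chemistry)."""
    degree = excitation_degree(a, b)
    if degree == 0:
        # Toy diagonal: sum of the indices of the occupied spin-orbitals.
        return float(sum(i for i in range(a.bit_length()) if (a >> i) & 1))
    return {1: -0.1, 2: -0.01}.get(degree, 0.0)


def sigma(configs, coeffs):
    """Matrix-free product H @ coeffs; also counts evaluated elements."""
    n = len(configs)
    out = [0.0] * n
    evaluated = 0
    for i in range(n):  # rows are independent, so this loop parallelises well
        acc = 0.0
        for j in range(n):
            # Pairs differing by more than a double substitution contribute zero.
            if excitation_degree(configs[i], configs[j]) > 2:
                continue
            acc += toy_element(configs[i], configs[j]) * coeffs[j]
            evaluated += 1
        out[i] = acc
    return out, evaluated


# Assumed toy space: 4 electrons in 8 spin-orbitals, 70 configurations.
configs = [sum(1 << k for k in occ) for occ in combinations(range(8), 4)]
coeffs = [1.0] + [0.0] * (len(configs) - 1)
sigma_vec, n_evaluated = sigma(configs, coeffs)
print(len(configs) ** 2, n_evaluated)  # prints "4900 3710"
```

Even in this tiny space most pairs survive the screening, and the count grows roughly with the square of the subspace size. In an offloaded code the outer row loop is the kind of loop an OpenMP target region would distribute across GPU threads, although the article does not show the authors' actual directives.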
Further benchmarking on newer GPU-accelerated platforms, including H100, GB200, MI355X, and MI300X, showed speedups of 1.8x to 3x over Frontier. These results indicate that the data-layout strategy, the persistent configuration cache, and the fully device-resident routines for evaluating Hamiltonian matrix elements significantly expand the range of tractable quantum chemistry applications. The team has open-sourced the full implementation to support reproducibility and further research.
GPU Acceleration Boosts Quantum Diagonalisation Speed
Scientists have adapted the sample-based quantum diagonalization (SQD) method to run on graphics processing units (GPUs), achieving significant speedups in computational chemistry. The researchers addressed key challenges in accelerating quantum chemistry codes on GPUs, including transforming complex data structures and implementing device-side evaluation of matrix elements. They developed a reusable roadmap for GPU acceleration, featuring a persistent GPU-resident configuration cache, systematic data flattening, and a complete GPU implementation of the Hamiltonian matrix computations. This portable solution maintains a single-source code base that runs on both CPUs and GPUs, and it was validated through testing that showed numerical agreement with CPU implementations to within a relative error of 10⁻¹⁰. The authors acknowledge limitations related to the specific molecular systems tested and the need for further optimisation for even larger configurations. Future research could apply these techniques to other iterative algorithms that rely on matrix-vector multiplication, extending the impact of this work beyond SQD.
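The article does not describe the flattened layout itself. One common pattern, shown here purely as an assumption, is to pack nested variable-length lists into a single contiguous buffer plus an offsets array, which is the kind of structure that can be copied to a device in one transfer. The example data and function names are hypothetical.

```python
from array import array


def flatten(ragged):
    """Pack variable-length integer lists into (values, offsets).

    Row i occupies values[offsets[i]:offsets[i + 1]].
    """
    values = array("q")
    offsets = array("q", [0])
    for row in ragged:
        values.extend(row)
        offsets.append(len(values))
    return values, offsets


def get_row(values, offsets, i):
    """Recover row i from the flat representation."""
    return values[offsets[i]:offsets[i + 1]]


# Hypothetical nested data, e.g. per-configuration lists of partner indices.
nested = [[0, 1, 2, 3], [0, 1, 4], [2, 5]]
values, offsets = flatten(nested)
print(list(values), list(offsets))


def relative_error(candidate: float, reference: float) -> float:
    """Relative difference, as used when comparing GPU and CPU energies."""
    return abs(candidate - reference) / abs(reference)
```

Two flat arrays replace many small allocations, so a single bulk copy moves everything and each thread can locate its row by index arithmetic. The `relative_error` helper mirrors the kind of check behind the reported agreement threshold, with the threshold itself left to the caller.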
👉 More information
🗞 Scaling Sample-Based Quantum Diagonalization on GPU-Accelerated Systems using OpenMP Offload
🧠 ArXiv: https://arxiv.org/abs/2601.16169
