A new runtime system integrates quantum computers with existing high-performance computing infrastructure. Narasinga Rao Miniskar and colleagues at Oak Ridge National Laboratory present an intelligent, task-based runtime that unifies classical and quantum workload management. The system combines the Intelligent RuntIme System (IRIS) with a quantum programming stack, enabling programs written in the Quantum Intermediate Representation (QIR) to run concurrently on both simulators and quantum processors. By partitioning quantum circuits, including a 4-qubit and a 20-qubit example, into smaller tasks, the work demonstrates the potential for parallel execution and reduced computational load while maintaining accuracy, a step towards practical hybrid quantum-classical computing.
Dramatic runtime reductions unlock complex quantum circuit simulations
Execution time for a 20-qubit circuit fell to 82.74 seconds with the new runtime, down from the 22,061 seconds previously required on the same IonQ Forte-1 quantum processing unit, a reduction of roughly 267-fold. A turnaround of minutes rather than hours makes iterative development and experimentation practical. IRIS manages concurrent classical and quantum tasks, dynamically allocating resources and scheduling workloads across CPUs, GPUs, and quantum processing units. Previously, debugging and optimising quantum code meant spending substantial time simply waiting for results, even for modest circuits, which slowed the pace of development. The new runtime aims to remove this bottleneck and support a faster development cycle.
The run time for a 4-qubit circuit was just 1.38 seconds using the same system and the Quantum Intermediate Representation Execution Engine. The system achieved this by partitioning both circuits into three smaller sub-circuits using the QCut library’s quantum circuit cutting technique, so that each sub-circuit can be simulated independently. Classical post-processing, executed on CPU cores through IRIS’s heterogeneous-memory model, then merged the results to reconstruct the original computation’s output. The heterogeneous-memory model lets IRIS move data between the different processing units (CPU, GPU, and QPU) without significant performance penalties, relying on careful memory management and optimised data transfers so that each component can access the data it needs. Because the sub-circuits are independent, they can be evaluated simultaneously, which reduces the overall execution time. The choice of three sub-circuits was determined empirically to balance the overhead of data transfer and merging against the benefits of parallelisation for these specific circuits.
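The fan-out and merge pattern described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `run_fragment`, `merge`, and `run_partitioned` are hypothetical names, the fragment "simulation" returns a made-up value, and the merge step is a placeholder product rather than the weighted reconstruction a real circuit-cutting library performs. It does not use the IRIS or QCut APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from math import prod


def run_fragment(fragment_id: int) -> float:
    """Hypothetical stand-in for simulating one sub-circuit.

    Returns a made-up expectation value so the example stays self-contained.
    """
    return 1.0 / (fragment_id + 1)


def merge(results: list[float]) -> float:
    """Placeholder classical post-processing step.

    Multiplies the fragment values; a real cut reconstruction would combine
    many fragment variants with signed weights.
    """
    return prod(results)


def run_partitioned(n_fragments: int = 3) -> float:
    """Evaluate all fragments concurrently, then merge on the calling thread."""
    with ThreadPoolExecutor(max_workers=n_fragments) as pool:
        # pool.map preserves input order, so results line up with fragment ids.
        results = list(pool.map(run_fragment, range(n_fragments)))
    return merge(results)


if __name__ == "__main__":
    print(run_partitioned(3))
```

The point of the sketch is the shape of the workload: the three fragment tasks have no dependencies on each other, so only the final merge has to wait for all of them.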
The framework supports diverse quantum back-ends, including simulators and IonQ ion-trap devices, demonstrating both portability and flexibility. Integration with Google’s qsim simulator occurs via the eXtreme-scale ACCelerator (XACC) programming framework, allowing programs to be dispatched to different back-ends and streamlining hybrid quantum-classical workflows. Managing these workflows efficiently becomes more important as quantum computing matures and computational resources must blend the strengths of quantum and classical systems. The ability to target different back-ends matters particularly now, while access to actual quantum hardware is limited: researchers can develop and test algorithms on simulators before deploying them on physical devices, reducing the risk of errors and allowing performance to be tuned in advance. The same flexibility lets the framework adapt as quantum hardware evolves, supporting new devices and technologies as they emerge. QIR is key to this portability, providing a standardised format that can be translated into the native instruction sets of different quantum processors.
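The idea of sending one portable program to interchangeable back-ends can be illustrated with a small registry. This is a minimal sketch, not the XACC, QIR-EE, or IRIS interface: the `register` decorator, the `dispatch` function, the back-end names, and the returned measurement counts are all hypothetical and exist only to show the pattern.

```python
from typing import Callable, Dict

# A back-end takes a program (here just a string) and a shot count,
# and returns a histogram of measured bitstrings.
Backend = Callable[[str, int], Dict[str, int]]

_BACKENDS: Dict[str, Backend] = {}


def register(name: str) -> Callable[[Backend], Backend]:
    """Decorator that records a back-end under a lookup name."""
    def decorator(fn: Backend) -> Backend:
        _BACKENDS[name] = fn
        return fn
    return decorator


@register("simulator")
def fake_simulator(program: str, shots: int) -> Dict[str, int]:
    # Placeholder result: pretend every shot measured "00".
    return {"00": shots}


@register("qpu")
def fake_qpu(program: str, shots: int) -> Dict[str, int]:
    # Placeholder result with one made-up "noisy" shot; no hardware is contacted.
    return {"00": shots - 1, "11": 1}


def dispatch(program: str, backend: str, shots: int = 100) -> Dict[str, int]:
    """Run the same program text on whichever registered back-end is named."""
    try:
        run = _BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown back-end: {backend!r}") from None
    return run(program, shots)


if __name__ == "__main__":
    for name in ("simulator", "qpu"):
        print(name, dispatch("portable-program-text", name, shots=10))
```

Because the caller only names a back-end, moving from a simulator to a device changes one argument rather than the program itself, which is the portability benefit the article attributes to QIR.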
Hybrid runtime performance relies on efficient quantum circuit partitioning using the QCut library
The work acknowledges a key dependency on the QCut library for partitioning quantum circuits, a technique that draws on wire cutting methods. The runtime’s performance is therefore tied to QCut’s efficiency and its adaptability across diverse quantum algorithms, which is a potential limitation. This initial dependency is nonetheless a pragmatic step towards demonstrating a functional hybrid runtime, and future iterations could integrate alternative partitioning techniques or develop new methods directly within the runtime itself. QCut identifies points within the quantum circuit where the entanglement between qubits allows a safe division without compromising the final result, which requires careful analysis of the circuit’s structure and the dependencies between its gates. Its effectiveness depends on the specific circuit; highly entangled circuits may be more difficult to partition efficiently. Future work could investigate partitioning algorithms that take the hardware architecture of the target quantum processor into account, optimising the partitioning strategy for performance.
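For readers unfamiliar with wire cutting, the textbook identity behind it is that any single-qubit state can be rebuilt from its Pauli expectation values: rho = 1/2 * sum over P in {I, X, Y, Z} of Tr(P rho) P. Cutting a wire amounts to measuring those expectation values on one side and re-preparing matching states on the other. The sketch below checks the identity numerically in plain Python; whether QCut uses exactly this decomposition is an assumption, and `reconstruct` is a hypothetical helper name.

```python
# Pauli matrices as nested lists of complex numbers.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(a):
    """Trace of a 2x2 matrix."""
    return a[0][0] + a[1][1]


def reconstruct(rho):
    """Rebuild rho as 1/2 * sum_P Tr(P rho) P over the four Pauli matrices."""
    out = [[0j, 0j], [0j, 0j]]
    for pauli in (I2, X, Y, Z):
        coeff = trace(matmul(pauli, rho)) / 2
        for i in range(2):
            for j in range(2):
                out[i][j] += coeff * pauli[i][j]
    return out


if __name__ == "__main__":
    # An arbitrary valid single-qubit density matrix (Hermitian, trace 1).
    rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]
    print(reconstruct(rho))
```

Each cut multiplies the number of sub-circuit variants that must be run and recombined, which is one reason heavily entangled circuits are costly to partition.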
The integrated runtime demonstrates concurrent execution of quantum and classical tasks on shared hardware, moving beyond isolated quantum software environments. Quantum circuit cutting divides complex calculations into manageable sub-circuits, enabling parallel processing and reducing the computational burden on individual components. Programs written in the Quantum Intermediate Representation (QIR), a portable code format, are dispatched to back-ends such as simulators and quantum processors. By assigning each task to the most appropriate processing unit, whether CPU, GPU, or QPU, the runtime can improve overall performance and efficiency. For example, classical pre- and post-processing steps can be handled by CPUs or GPUs, while the core quantum computations are executed on the QPU. QIR provides the common format that allows the different components to exchange programs and data. The runtime also handles data conversion and synchronisation, ensuring that the results from different components are correctly combined to produce the final output.
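The division of labour described here is essentially a dependency graph whose nodes carry a device preference. The following sketch uses Python's standard `graphlib` module to group tasks into waves that could run concurrently. The task names, the device labels, and the `schedule` function are hypothetical; IRIS has its own task API, which is not shown.

```python
from graphlib import TopologicalSorter

# Hypothetical workload: task name -> (preferred device, prerequisite tasks).
TASKS = {
    "prepare": ("cpu", set()),
    "fragment_0": ("qpu", {"prepare"}),
    "fragment_1": ("qpu", {"prepare"}),
    "fragment_2": ("gpu", {"prepare"}),
    "merge": ("cpu", {"fragment_0", "fragment_1", "fragment_2"}),
}


def schedule(tasks):
    """Group tasks into waves; tasks in one wave do not depend on each other."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in tasks.items()})
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Everything returned by get_ready() has all prerequisites finished.
        ready = sorted(sorter.get_ready())
        waves.append([(name, tasks[name][0]) for name in ready])
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for index, wave in enumerate(schedule(TASKS)):
        print(index, wave)
```

With this example graph the classical preparation runs first, the three fragment tasks form a single concurrent wave spread over QPU and GPU, and the CPU merge runs last.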
This research demonstrates a runtime system capable of concurrently executing quantum and classical computations on the same hardware. By partitioning a 20-qubit circuit into smaller sub-circuits using the QCut library, the researchers achieved parallel processing and reduced the computational load on each processing unit. The approach uses the Quantum Intermediate Representation to dispatch tasks to various back-ends, including quantum simulators and processors, enabling hybrid execution. The authors suggest future work will focus on optimising partitioning algorithms to further improve performance on specific quantum hardware.
👉 More information
🗞 Classic and Quantum Task-Based Intelligent Runtime for QIRs Running on Multiple QPUs
🧠 ArXiv: https://arxiv.org/abs/2605.11382
