As the quest for powerful quantum computers accelerates, researchers are increasingly focused on architectures linking multiple quantum processors, a concept known as Distributed Quantum Computing (DQC). Recognizing that scaling quantum capabilities will likely involve networked systems, and that quantum processors will initially function as accelerators within existing high-performance computing (HPC) environments, a team from the Galicia Supercomputing Center and the University of Santiago de Compostela has developed CUNQA.
This open-source emulator lets scientists test and evaluate DQC strategies on current HPC infrastructure before multi-QPU quantum hardware is available, preparing for a future in which quantum and classical computing work in concert. CUNQA emulates three DQC models (no-communication, classical-communication, and quantum-communication), offering a tool for exploring the programming considerations and performance characteristics of this emerging paradigm.
Distributed Quantum Computing and HPC Convergence
Researchers are developing CUNQA, an open-source emulator designed to simulate Distributed Quantum Computing (DQC) within High-Performance Computing (HPC) environments. This tool addresses the anticipated future of quantum computing, where multiple quantum processing units (QPUs) will be interconnected to boost computational power. CUNQA uniquely supports emulation of three DQC models – no-communication, classical-communication, and quantum-communication – allowing researchers to test and evaluate DQC architectures before actual hardware is widely available.
The convergence of DQC and HPC envisions quantum processors functioning as accelerators, much like GPUs, integrated into existing computational infrastructure. CUNQA supports this integration by allowing exploration of different QPU connection strategies: standalone, co-located, or on-node, which the paper illustrates in its figures. The authors use the well-known Quantum Phase Estimation (QPE) algorithm as a test case, giving a concrete way to analyze the performance and scalability of these distributed quantum systems within a realistic HPC context.
To the authors’ knowledge, CUNQA is the first tool capable of emulating all three DQC schemes within an HPC environment. This matters because it allows software stacks and resource management strategies to be developed, tested, and refined ahead of the hardware. Researchers are exploring how to handle both classical and quantum resources effectively, with options ranging from middleware-managed allocation to user-defined resource control, paving the way for practical HPCQC systems.
Quantum Computing’s Evolving Landscape
Quantum computing is rapidly evolving, driven by both the pursuit of computational power and practical integration with existing high-performance computing (HPC) infrastructure. Researchers are actively exploring architectures featuring multiple connected quantum processing units (QPUs) – Distributed Quantum Computing (DQC) – mirroring the multi-core designs common in classical computing. This shift acknowledges the likely path to scaling quantum capabilities isn’t solely about bigger QPUs, but more interconnected ones, requiring new software and system designs.
This work introduces CUNQA, an open-source emulator designed to test and evaluate DQC within HPC environments before the hardware fully matures. CUNQA emulates three DQC models (no-communication, classical-communication, and quantum-communication), and the authors use the well-known Quantum Phase Estimation (QPE) algorithm as the test case for their analysis. This allows researchers to explore the challenges of distributed quantum systems and optimize performance in silico, a crucial step before costly hardware implementation.
Currently, integration strategies range from standalone quantum systems to co-located and “on-node” QPUs functioning like GPU accelerators. CUNQA supports this evolving landscape by providing a platform to model different integration approaches and explore software stack designs. Specifically, the emulator helps investigate where resource management (classical and quantum) should reside – within the middleware or as a user responsibility – ultimately streamlining the development of Hybrid HPC/Quantum applications.
Multicore Quantum Architectures
Researchers are actively exploring multicore quantum architectures, driven by the need to scale computational power. This approach, termed Distributed Quantum Computing (DQC), envisions multiple interconnected Quantum Processing Units (QPUs) working in concert. Simultaneously, there’s a strong trend toward integrating QPUs as accelerators within existing High-Performance Computing (HPC) environments – essentially treating them like powerful co-processors. This convergence of DQC and HPC integration is a key focus of current development.
The team behind CUNQA has developed an open-source emulator designed specifically to test and study DQC within HPC settings before the hardware is readily available. CUNQA implements three distinct DQC models: no-communication, classical-communication, and quantum-communication. Utilizing the well-established Quantum Phase Estimation (QPE) algorithm, the emulator allows researchers to analyze the performance and challenges of each communication scheme, paving the way for optimized system designs.
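The no-communication model is the simplest of the three to picture: each QPU runs its own circuit independently, and only the measurement results are combined afterwards. The sketch below illustrates that idea with plain NumPy, splitting the shots of a two-qubit Bell-state circuit across several emulated QPUs and merging the counts. It does not use CUNQA's API; the function names `sample_bell_counts` and `run_no_communication` are hypothetical and chosen only for this illustration.

```python
from collections import Counter

import numpy as np


def sample_bell_counts(shots, seed):
    """Emulate one independent QPU sampling a 2-qubit Bell state."""
    rng = np.random.default_rng(seed)
    # Bell state (|00> + |11>) / sqrt(2), amplitudes ordered 00, 01, 10, 11.
    state = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
    probs = np.abs(state) ** 2
    probs = probs / probs.sum()  # guard against floating-point drift
    outcomes = rng.choice(4, size=shots, p=probs)
    return Counter(format(int(o), "02b") for o in outcomes)


def run_no_communication(total_shots, n_qpus):
    """Split shots across QPUs that never talk to each other, then merge."""
    base, extra = divmod(total_shots, n_qpus)
    merged = Counter()
    for qpu in range(n_qpus):
        shots = base + (1 if qpu < extra else 0)
        # Each emulated QPU gets its own seed, so the runs are independent.
        merged += sample_bell_counts(shots, seed=qpu)
    return merged


counts = run_no_communication(total_shots=1000, n_qpus=3)
print(dict(counts))
```

Because the QPUs never exchange information during execution, this scheme parallelizes trivially; the classical-communication and quantum-communication models are harder to emulate because they add data dependencies between processors.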
Notably, CUNQA is presented as the first tool capable of emulating all three DQC schemes within an HPC environment. This is significant because it moves beyond simply simulating a quantum algorithm and allows for exploration of the complex interplay between multiple QPUs and the classical infrastructure supporting them. The tool’s architecture supports investigating how resource management and compilation strategies impact DQC performance, a crucial step toward realizing practical, scalable quantum computation.
Standalone Quantum Computing Environments
Researchers are developing emulators like CUNQA to proactively address the complexities of Distributed Quantum Computing (DQC) within High-Performance Computing (HPC) environments. This tool anticipates a future where multiple Quantum Processing Units (QPUs) work in concert—a “multicore” approach mirroring classical computing. CUNQA uniquely emulates three DQC models – no-communication, classical-communication, and quantum-communication – enabling researchers to test and evaluate DQC architectures before physical hardware matures sufficiently for large-scale experimentation.
CUNQA’s design focuses on integrating QPUs as accelerators within existing HPC infrastructure. This contrasts with earlier visions of entirely standalone quantum computers (Figure 1a in the source) and moves toward co-located or even “on-node” integration (Figures 1b & 1c). The emulator utilizes the Quantum Phase Estimation (QPE) algorithm for demonstration, providing a benchmark for analyzing the performance of different DQC models and communication strategies within a realistic HPC context.
The development of CUNQA signifies a shift in HPCQC (the integration of high-performance computing and quantum computing) towards practical implementation. By offering a software stack that can explore resource management within or outside of middleware (Figures 2a and 2b in the source), it allows researchers to investigate optimal strategies for combining classical and quantum resources. This proactive approach aims to resolve software and architectural challenges before they become bottlenecks in realizing the potential of hybrid quantum-classical computations.
QPUs as HPC Accelerators
Researchers are actively exploring integrating Quantum Processing Units (QPUs) as accelerators within High-Performance Computing (HPC) environments—analogous to how GPUs function today. This shift recognizes the limitations of standalone quantum computers and anticipates a future where quantum resources augment classical processing. Early work envisioned purely quantum HPC, but the focus has evolved to leveraging QPUs for specific computational tasks, requiring a re-evaluation of the software stack to manage both classical and quantum resources effectively.
Distributed Quantum Computing (DQC) is gaining traction, driven by both the need to scale quantum processing power and the architectural trends within classical HPC. Companies like IBM and Google are hinting at multicore quantum architectures. The CUNQA emulator is designed to test DQC models – no communication, classical communication, and quantum communication – within existing HPC infrastructure. This allows researchers to explore the challenges and benefits before hardware limitations become a roadblock.
CUNQA uniquely emulates these three DQC schemes within an HPC environment, using the Quantum Phase Estimation (QPE) algorithm for testing. Current research proposes different software stack approaches – managing resources within or outside the middleware layer – for HPCQC. The development of tools like CUNQA is crucial for understanding how to effectively integrate QPUs, paving the way for hybrid quantum-classical algorithms and applications that exploit the strengths of both computing paradigms.
Early Integration of QPUs in HPC
Researchers are actively exploring integrating Quantum Processing Units (QPUs) into High-Performance Computing (HPC) environments, envisioning them as accelerators similar to GPUs. This shift, beginning around 2016 with work by Svore & Troyer, anticipates cloud-based access to QPUs within existing HPC centers. Current integration models range from “standalone” quantum systems to “co-located” QPUs and, most recently, “on-node” integration – placing QPUs directly within classical HPC nodes. This progression demands a re-evaluation of existing software stacks for seamless operation.
The development of Distributed Quantum Computing (DQC) further complicates and enriches this integration. DQC proposes multi-core quantum architectures, mirroring classical computing’s approach to scaling. The authors present CUNQA, an open-source emulator designed to test these DQC concepts within HPC environments. CUNQA supports three DQC models – no-communication, classical-communication, and quantum-communication – allowing researchers to simulate and analyze their performance before physical multi-QPU systems become available.
Notably, CUNQA is presented as the first tool capable of emulating all three DQC schemes specifically within an HPC context. Using the Quantum Phase Estimation (QPE) algorithm as a benchmark, the emulator allows for analysis of how different communication schemes impact performance. This capability is vital as scaling quantum computers will likely depend on connecting multiple QPUs, necessitating robust simulation tools like CUNQA to optimize architectural designs and software stacks.
Software Stack Evolution for HPCQC
Recent advancements envision quantum processing units (QPUs) integrated into high-performance computing (HPC) environments, not as standalone systems, but as accelerators – similar to GPUs. This shift, beginning around 2016 with work by Svore and Troyer, necessitates a re-evaluation of the software stack. Current research explores three integration models: standalone, co-located, and “on-node” (where QPUs reside directly within classical nodes). Establishing a robust software framework is critical to effectively harness the potential of distributed quantum computing (DQC) within existing HPC infrastructure.
The evolution of HPCQC software stacks has moved from middleware-managed resource allocation (classical & quantum) to a user-managed approach. Early designs, like Rallis et al.’s, centralized resource handling within the middleware layer. Newer approaches, however, delegate resource management to the user, simplifying the middleware and offering greater flexibility. This shift reflects the growing complexity of hybrid quantum-classical applications and the need for more granular control over resource utilization—a key challenge for scaling DQC.
To address this evolving landscape, researchers developed CUNQA, an open-source DQC emulator designed for HPC. CUNQA uniquely supports emulation of three DQC communication models—no-communication, classical, and quantum—allowing for pre-implementation testing and analysis. By using the Quantum Phase Estimation (QPE) algorithm, CUNQA provides a platform to evaluate DQC architectures before physical hardware is widely available, accelerating the development of scalable HPCQC systems.
Resource Management within Middleware
Middleware plays a critical role in managing resources within emerging hybrid High-Performance Computing (HPC) and Quantum Computing (QC) environments. Research indicates a shift towards integrating Quantum Processing Units (QPUs) as accelerators within existing HPC infrastructure, similar to GPUs. This necessitates middleware capable of orchestrating both classical and quantum resources. Specifically, middleware must handle allocation, scheduling, and communication between HPC systems and QPUs, as demonstrated by evolving software stack designs that centralize resource management at this layer.
Currently, two primary approaches to resource management are being explored. One model, favored by Rallis et al. [35], places all resource handling – classical and quantum – within the middleware. This simplifies application development by abstracting hardware complexities. Conversely, an alternative approach delegates resource management responsibility to the user, freeing the middleware from these tasks. The choice impacts software stack design and dictates the level of abstraction offered to quantum application developers.
The trend towards “on-node” integration—where QPUs reside directly within classical HPC nodes—further complicates resource management. This architecture demands middleware capable of fine-grained allocation of both CPU/GPU and qubit resources within the same physical machine. Effective middleware will need to optimize task scheduling to leverage the strengths of both classical and quantum processors, maximizing overall application performance and efficiency in these complex, heterogeneous environments.
User-Managed Resource Allocation
Researchers are developing CUNQA, an open-source emulator designed to test Distributed Quantum Computing (DQC) within High-Performance Computing (HPC) environments before fully realized hardware exists. This addresses a key challenge: scaling quantum computers necessitates architectures with multiple connected Quantum Processing Units (QPUs). CUNQA simulates three DQC models – no-communication, classical-communication, and quantum-communication – using the Quantum Phase Estimation (QPE) algorithm, providing a vital testing ground for future hybrid systems.
Currently, integrating quantum processors into HPC workflows presents software complexities. The team highlights two main approaches to resource management: one where the middleware handles both classical and quantum resources, and another—implemented in CUNQA—where users are directly responsible for allocating resources. This user-managed approach allows for greater flexibility and control, facilitating research into optimal resource allocation strategies for diverse quantum-classical workloads.
CUNQA is significant as, to the researchers’ knowledge, it’s the first tool to emulate all three DQC schemes within an HPC context. By enabling pre-hardware testing, it allows exploration of software stacks and resource management strategies crucial for building scalable, efficient hybrid quantum-classical systems. This proactive approach aims to accelerate the development and deployment of quantum computing as an accelerator within mainstream HPC.
QPU Integration Models: Standalone, Co-located, On-node
Researchers are exploring how quantum processing units (QPUs) will integrate with existing high-performance computing (HPC) infrastructure. Three primary integration models are emerging: standalone, co-located, and on-node. The standalone model envisions entirely separate quantum systems accessed via network, similar to distributed computing today. This approach, initially proposed in 2008, treats the QPU as a remote resource. Later work shifted focus to integrating QPUs within HPC centers, anticipating their role as accelerators – a crucial evolution for scalable quantum computation.
The co-located model positions QPUs within the same HPC facility, but as separate hardware accessed via network. This contrasts with the on-node integration, where the QPU resides directly within a standard HPC node – similar to a GPU. This tighter integration, explored from 2023 onward, promises reduced latency and increased bandwidth for data transfer between classical and quantum processors. This is vital for algorithms needing frequent data exchange, like certain quantum machine learning approaches.
These integration strategies impact the software stack needed to manage resources. Early models centralized resource management within middleware. More recent approaches are exploring a model where users handle quantum resource allocation directly, simplifying middleware and potentially improving flexibility. Understanding these differing architectures is crucial, as the optimal integration model will depend on specific algorithm requirements and available hardware configurations.
Visualizing QPU Integration Scenarios
Researchers are examining how quantum processing units (QPUs) will integrate with existing high-performance computing (HPC) infrastructure. Three primary integration scenarios are emerging: standalone (completely separate quantum systems), co-located (QPUs housed in the HPC center as separate hardware and accessed over the network), and on-node (QPUs directly integrated into classical compute nodes, like GPUs). Alongside these scenarios, this work emulates all three distributed quantum computing (DQC) models (no-communication, classical-communication, and quantum-communication) so that such strategies can be tested ahead of the hardware.
The shift towards viewing QPUs as accelerators, rather than standalone systems, is driving the need for DQC emulation. Early visions (Devitt et al., 2008) focused on entirely quantum HPC, but current trends (Svore & Troyer, 2016) emphasize cloud access and accelerator roles. The CUNQA emulator facilitates testing software stacks designed for these hybrid architectures, allowing researchers to explore resource management and compilation strategies before physical DQC systems are widely available.
CUNQA emulates all three DQC models, with the Quantum Phase Estimation (QPE) algorithm as the test case. This allows for comparative analysis of different DQC approaches. Current software stack designs vary, with some incorporating resource management into middleware (Rallis et al., 2023) and others delegating it to the user. CUNQA’s ability to simulate these diverse scenarios is crucial for optimizing future HPCQC systems and maximizing quantum-classical co-processing.
CUNQA: An Open-Source DQC Emulator
CUNQA is a novel, open-source emulator designed to simulate Distributed Quantum Computing (DQC) within High-Performance Computing (HPC) environments. Recognizing the future trajectory of quantum computers as accelerators and interconnected multi-core systems, CUNQA bridges these two developing areas. It uniquely emulates three DQC models – no-communication, classical-communication, and quantum-communication – allowing researchers to test and evaluate DQC architectures before physical hardware matures. This proactive approach aims to accelerate the development of scalable quantum solutions.
The emulator’s architecture allows for exploration of how multiple Quantum Processing Units (QPUs) might function together. The authors exercise the three DQC models with the well-known Quantum Phase Estimation (QPE) algorithm, providing a common benchmark for performance analysis. Importantly, CUNQA is designed for integration with existing HPC infrastructure, acknowledging the growing trend of treating QPUs as co-processors. This focus distinguishes it from purely theoretical DQC simulations, offering a more practical evaluation platform.
To the best of the developers’ knowledge, CUNQA is the first tool capable of emulating all three DQC schemes within an HPC context. By providing a platform for pre-hardware testing, CUNQA aims to resolve software and architectural challenges before they become bottlenecks in real-world distributed quantum systems. This preemptive approach is crucial for realizing the full potential of scalable quantum computing and its integration into future HPC landscapes.
Emulating DQC Communication Models
Researchers at the Galicia Supercomputing Center (CESGA) developed CUNQA, an open-source emulator designed to test Distributed Quantum Computing (DQC) within High-Performance Computing (HPC) environments. Recognizing the future trajectory of quantum computers as both accelerators and interconnected multi-core systems, CUNQA bridges these concepts. The emulator supports three DQC models – no-communication, classical-communication, and quantum-communication – allowing researchers to analyze performance before actual DQC hardware becomes widely available. This proactive approach is crucial for optimizing software stacks and resource management.
CUNQA emulates these DQC models within a standard HPC setting, a capability that, to the developers’ knowledge, no other tool currently offers. The Quantum Phase Estimation (QPE) algorithm serves as a benchmark for testing and analyzing the emulation of each communication model. This allows for quantitative comparisons of performance characteristics, like latency and throughput, under different DQC architectures. Understanding these metrics is vital for designing efficient hybrid quantum-classical algorithms and minimizing communication bottlenecks.
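To make the classical-communication model concrete, the following minimal sketch emulates two single-qubit QPUs with NumPy. QPU A measures its qubit and sends the resulting bit to QPU B, which applies an X gate only if the bit is 1. This is an illustration of the general idea, not CUNQA's implementation; the helper names `measure` and `run_once` are hypothetical.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)


def measure(state, rng):
    """Measure one qubit in the computational basis.

    Returns the outcome bit and the collapsed state vector.
    """
    p1 = float(np.abs(state[1]) ** 2)
    bit = int(rng.random() < p1)
    collapsed = np.zeros(2, dtype=complex)
    collapsed[bit] = 1.0
    return bit, collapsed


def run_once(rng):
    """One shot of a two-QPU circuit linked by a single classical bit."""
    qpu_a = H @ np.array([1, 0], dtype=complex)  # QPU A prepares |+>
    qpu_b = np.array([1, 0], dtype=complex)      # QPU B starts in |0>

    bit_a, qpu_a = measure(qpu_a, rng)           # mid-circuit measurement on A
    message = bit_a                              # the classical bit sent A -> B

    if message == 1:                             # B's gate is conditioned on it
        qpu_b = X @ qpu_b

    bit_b, _ = measure(qpu_b, rng)
    return bit_a, bit_b


rng = np.random.default_rng(0)
results = [run_once(rng) for _ in range(200)]
print(all(a == b for a, b in results))
```

The two outcomes always agree because B's operation depends on A's measurement result. In a real multi-QPU deployment the `message` step is where classical network latency would enter, which is one reason communication cost is a natural quantity to study in an emulator.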
The development of CUNQA addresses a key gap in DQC research: the lack of a robust emulation platform within a realistic HPC context. Current approaches often focus on isolated quantum simulations or theoretical models. By integrating DQC emulation into HPC environments, CUNQA allows researchers to explore practical challenges related to resource management, software integration (like modified GCC compilers), and the overall architecture of future quantum-classical systems.
Quantum Phase Estimation Algorithm Implementation
Researchers at the Galicia Supercomputing Center (CESGA) developed CUNQA, a novel distributed quantum computing (DQC) emulator designed for high-performance computing (HPC) environments. This tool addresses the growing need to test and evaluate DQC architectures before physical implementation, supporting no-communication, classical-communication, and quantum-communication models. CUNQA uniquely allows exploration of how multiple quantum processing units (QPUs) could integrate with existing HPC infrastructure, simulating distributed quantum algorithms at scale.
To demonstrate CUNQA’s capabilities, the team implemented the Quantum Phase Estimation (QPE) algorithm – a cornerstone of many quantum computations – across these simulated DQC models. QPE’s success showcases CUNQA’s ability to emulate complex quantum circuits and analyze performance across different communication strategies. This is critical, as QPE’s accuracy relies on precise control of quantum states and operations, making it an ideal benchmark for validating the emulator’s functionality and identifying potential bottlenecks in distributed setups.
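For readers unfamiliar with QPE, the sketch below computes the outcome distribution of textbook QPE for a single-qubit phase gate U = diag(1, e^{2πiφ}) acting on its eigenstate |1>. It is a shortcut at the state-vector level, assuming the standard result that phase kickback leaves the counting register in a uniform superposition with phases e^{2πiφk}, followed by an inverse quantum Fourier transform. It is not the paper's distributed implementation, and `qpe_distribution` is a hypothetical name used only here.

```python
import numpy as np


def qpe_distribution(phi, n_counting):
    """Outcome probabilities of textbook QPE for eigenphase `phi`.

    `phi` is the phase in [0, 1) of the eigenvalue exp(2*pi*i*phi);
    `n_counting` is the number of qubits in the counting register.
    """
    dim = 2 ** n_counting
    k = np.arange(dim)
    # Counting register after the Hadamards and controlled-U^(2^j) gates:
    # basis state |k> carries the phase exp(2*pi*i*phi*k).
    register = np.exp(2j * np.pi * phi * k) / np.sqrt(dim)
    # Inverse QFT as a dense matrix: entry (m, k) = exp(-2*pi*i*m*k/dim)/sqrt(dim).
    inv_qft = np.exp(-2j * np.pi * np.outer(k, k) / dim) / np.sqrt(dim)
    amplitudes = inv_qft @ register
    return np.abs(amplitudes) ** 2


n = 4
probs = qpe_distribution(phi=0.3125, n_counting=n)  # 0.3125 = 5/16
estimate = int(np.argmax(probs)) / 2 ** n
print(estimate)
```

When φ is an exact multiple of 1/2^n, as in this example, the distribution concentrates on a single outcome; otherwise the probability spreads over neighbouring outcomes and more counting qubits are needed for a sharper estimate. In a distributed setting, the counting register and the eigenstate register could sit on different QPUs, which is what makes QPE a useful test case for comparing communication schemes.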
CUNQA is presented as the first tool capable of emulating all three DQC schemes within an HPC context. Previous work largely focused on standalone quantum systems or co-located QPUs; CUNQA’s comprehensive approach allows researchers to investigate the trade-offs between communication overhead, computational efficiency, and scalability in truly distributed quantum applications. This is crucial for moving beyond theoretical models and towards practical realization of fault-tolerant, large-scale quantum computing.
CUNQA’s Novelty in HPC DQC Emulation
CUNQA is a novel emulator designed to bridge the gap between distributed quantum computing (DQC) and high-performance computing (HPC). Unlike previous approaches focusing solely on integrating quantum processors into HPC, CUNQA proactively emulates entire DQC systems within an HPC environment. This allows researchers to test DQC architectures – specifically no-communication, classical-communication, and quantum-communication models – and algorithms like Quantum Phase Estimation (QPE) before actual multi-QPU hardware becomes widely available, accelerating development and reducing costly hardware dependencies.
This emulator’s core innovation lies in its ability to simulate the complexities of interconnected quantum processing units (QPUs) using standard HPC resources. CUNQA doesn’t just mimic quantum gates; it models the communication between QPUs, a critical aspect of DQC. By supporting all three primary DQC communication models, it offers a comprehensive platform for evaluating different architectural choices and their impact on performance. This is vital as companies like IBM and Google are actively pursuing multicore quantum architectures.
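One standard primitive for moving a qubit state between processors is quantum teleportation, which consumes a shared Bell pair and two classical bits. The NumPy sketch below emulates it with three qubits: q0 (the message) and q1 on QPU A, and q2 on QPU B. Whether CUNQA realizes its quantum-communication model this way is not stated here, so treat this as an illustration of the concept only; all helper names are hypothetical.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def apply_1q(state, gate, qubit):
    """Apply a single-qubit gate to one axis of a (2, 2, 2) state tensor."""
    state = np.tensordot(gate, state, axes=([1], [qubit]))
    return np.moveaxis(state, 0, qubit)


def apply_cnot(state, control, target):
    """Apply CNOT by flipping the target axis on the control = 1 slice."""
    state = state.copy()
    idx = [slice(None)] * 3
    idx[control] = 1
    # Fixing the control axis removes it, shifting later axes down by one.
    t_axis = target - 1 if target > control else target
    state[tuple(idx)] = np.flip(state[tuple(idx)], axis=t_axis).copy()
    return state


def measure(state, qubit, rng):
    """Measure one qubit; return the bit and the renormalized state."""
    p1 = float(np.sum(np.abs(np.take(state, 1, axis=qubit)) ** 2))
    bit = int(rng.random() < p1)
    idx = [slice(None)] * 3
    idx[qubit] = bit
    collapsed = np.zeros_like(state)
    collapsed[tuple(idx)] = state[tuple(idx)]
    return bit, collapsed / np.linalg.norm(collapsed)


def teleport(message, rng):
    """Teleport a single-qubit state from QPU A (q0, q1) to QPU B (q2)."""
    state = np.zeros((2, 2, 2), dtype=complex)
    state[:, 0, 0] = message              # |message>|0>|0>
    state = apply_1q(state, H, 1)         # shared Bell pair on q1 (A), q2 (B)
    state = apply_cnot(state, 1, 2)
    state = apply_cnot(state, 0, 1)       # Bell-basis measurement on QPU A
    state = apply_1q(state, H, 0)
    m0, state = measure(state, 0, rng)
    m1, state = measure(state, 1, rng)
    if m1:                                # corrections on QPU B, driven by
        state = apply_1q(state, X, 2)     # the two classical bits from A
    if m0:
        state = apply_1q(state, Z, 2)
    return state[m0, m1, :]               # state now held by QPU B


rng = np.random.default_rng(1)
message = np.array([0.6, 0.8j], dtype=complex)
received = teleport(message, rng)
print(np.allclose(received, message))
```

The design point worth noting is that even quantum communication needs a classical side channel: QPU B cannot recover the state until the two measurement bits arrive, so an emulator has to model both the entanglement resource and the classical messaging.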
To the developers’ knowledge, CUNQA is the first tool capable of emulating these three DQC schemes within an HPC context. Researchers can leverage existing HPC infrastructure to prototype and analyze DQC algorithms, identify bottlenecks, and refine system designs. This approach moves beyond theoretical exploration, enabling practical experimentation and validation of DQC concepts, ultimately accelerating the realization of scalable and powerful quantum computing solutions integrated with existing computational frameworks.
Source: https://arxiv.org/pdf/2511.05209
