A new multi-programmable scheduler optimises resource allocation on modular quantum processing units (QPUs), mirroring the role of GPUs and TPUs in classical high-performance computing. Vinooth Kulkarni and colleagues at Case Western Reserve University have designed this system to address the growing need for scalable quantum computing, as hardware vendors connect multiple QPUs to reach the hundreds or even thousands of physical qubits necessary for practical error correction. By integrating qubit mapping, parallel circuit execution, and teleportation strategies, the scheduler efficiently manages the unique demands of interconnected modular quantum computers and enables fair access for multiple users.
Distributed quantum computation via adaptive circuit partitioning and teleportation
Large quantum circuits exceeding the capacity of a single 127-qubit device can now be executed through a new multi-programmable scheduler. Effectively partitioning and distributing workloads across interconnected quantum processing units (QPUs) unlocks computations previously impossible under single-device hardware constraints. The scheduler employs circuit cutting, a technique that breaks complex calculations into smaller subcircuits, alongside Bell-pair-based quantum teleportation for efficient data transfer between QPUs; this combination sharply reduces runtime overhead compared with local cutting methods. Circuit cutting involves identifying qubits that interact minimally and dividing the circuit at these points, while teleportation uses pre-shared entanglement to transmit quantum states between distant qubits without physically moving them, which is crucial for maintaining coherence.
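The core idea of choosing a cut point where qubits "interact minimally" can be sketched as a graph-partitioning problem: pick the split of qubits into two groups that severs the fewest two-qubit gates. The brute-force search below is purely illustrative (the paper's actual cutting strategy is not reproduced here); real tools use heuristics because the search space grows exponentially.

```python
from itertools import combinations

def count_cut_gates(gates, group_a):
    """Count two-qubit gates that span the partition (these must be cut)."""
    a = set(group_a)
    return sum(1 for q0, q1 in gates if (q0 in a) != (q1 in a))

def best_balanced_cut(gates, n_qubits):
    """Brute-force the balanced bipartition with the fewest crossing gates.
    Feasible only for toy circuits; practical cutters use heuristics."""
    best_cost, best_group = None, None
    for group_a in combinations(range(n_qubits), n_qubits // 2):
        cost = count_cut_gates(gates, group_a)
        if best_cost is None or cost < best_cost:
            best_cost, best_group = cost, set(group_a)
    return best_cost, best_group

# Toy 4-qubit circuit: qubits 0-1 and 2-3 interact heavily; one gate bridges them.
gates = [(0, 1), (0, 1), (2, 3), (2, 3), (1, 2)]
cost, group = best_balanced_cut(gates, 4)
print(cost, sorted(group))  # the single bridging gate (1, 2) is the natural cut
```

With this toy circuit the search recovers the intuitive answer: keep {0, 1} on one QPU and {2, 3} on the other, cutting only the one bridging gate.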
QuMod, the novel multi-programmable scheduler, executed quantum circuits beyond the capacity of a single 127-qubit processor. By adaptively managing circuit distribution and respecting sampling budgets, it delivers improved performance and fidelity in modular quantum computing environments. The system uses circuit cutting in a mode termed LOCC (Local Operations and Classical Communication), which distributes calculations across multiple quantum processing units (QPUs) while carefully managing sampling budgets. This contrasts with less efficient local cutting techniques, which can increase communication overhead and reduce fidelity. The LOCC approach ensures that only classical information is exchanged between QPUs after local operations are performed, minimising the impact of decoherence during data transfer.
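Why a sampling budget matters can be shown with a back-of-the-envelope sketch: each cut multiplies the number of subcircuit variants that must be sampled, so a fixed shot budget gets thinner per variant as cuts accumulate. The growth base of 4 below is an illustrative assumption, not QuMod's exact overhead model.

```python
def shots_per_variant(total_shots, n_cuts, variants_per_cut=4):
    """Split a fixed shot budget evenly across the measurement/preparation
    variants produced by cutting. Variant count grows exponentially with the
    number of cuts; the base of 4 is an assumed illustrative figure."""
    n_variants = variants_per_cut ** n_cuts
    return n_variants, max(1, total_shots // n_variants)

budget = 100_000
for cuts in range(4):
    n, shots = shots_per_variant(budget, cuts)
    print(f"{cuts} cuts -> {n:3d} variants, {shots} shots each")
```

The exponential thinning of shots per variant is why a scheduler that places cuts sparingly, and accounts for the remaining budget, can preserve fidelity where naive local cutting cannot.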
Evaluations using a simulator mirroring IBM’s modular QPU architecture demonstrated that LOCC-aware scheduling sharply reduces runtime overhead, maintaining or even improving the fidelity of results. QuMod explicitly accounts for classical communication between QPUs, synchronising subcircuits and grouping jobs by runtime to maximise parallel execution; this is vital for cloud environments serving multiple users. The scheduler prioritises jobs with similar runtime characteristics to minimise idle time on the QPUs and improve overall throughput. Despite these advances, current simulations rely on idealised interconnects and do not yet fully capture the complexities of real-world noise or the overhead associated with long-distance entanglement distribution. Factors such as cable losses, imperfect entanglement sources, and decoherence during teleportation are not fully modelled, representing areas for future research.
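The idea of grouping jobs by runtime to cut idle time can be sketched with a simple greedy batcher: sort jobs by estimated runtime and fill each parallel batch with similar-length jobs, so no QPU sits idle waiting for a long straggler. This is a minimal illustration of the principle, not QuMod's actual scheduling algorithm.

```python
def batch_by_runtime(jobs, n_qpus):
    """Sort jobs by estimated runtime (longest first) and slice them into
    batches of n_qpus, so each batch contains jobs of similar length."""
    ordered = sorted(jobs, key=lambda j: j[1], reverse=True)
    return [ordered[i:i + n_qpus] for i in range(0, len(ordered), n_qpus)]

# (job name, estimated runtime); two QPUs run one batch in parallel at a time.
jobs = [("a", 30), ("b", 29), ("c", 5), ("d", 6), ("e", 31), ("f", 4)]
batches = batch_by_runtime(jobs, n_qpus=2)

# Each batch finishes when its slowest job does; pairing similar runtimes
# keeps that per-batch wait small.
makespan = sum(max(t for _, t in batch) for batch in batches)
print(batches, makespan)
```

Pairing the long jobs together (31 with 30) and the short ones together (5 with 4) yields a total makespan of 65 here; an unlucky pairing of a long job with a short one would leave one QPU idle for most of each batch.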
Scaling quantum computation through inter-processor workload distribution
The promise of modular quantum computers hinges on efficiently distributing calculations across multiple processors, much as modern classical systems use GPUs to accelerate tasks. However, the new scheduler has so far been benchmarked only on a pair of linked 127-qubit units. This raises a key tension: will the benefits of increased qubit counts be negated by the overhead of coordinating operations, particularly the complex data transfer via Bell-pair-based teleportation? The efficiency of teleportation depends directly on the fidelity of the entangled pairs and the speed of classical communication, both of which present significant engineering challenges.
Demonstrating a functional scheduler capable of distributing quantum workloads across separate devices is a key step forward, although QuMod was tested on only two quantum processing units (QPUs). Scaling to larger, more complex systems presents significant hurdles: maintaining coherence across multiple QPUs requires precise synchronisation and error mitigation strategies, which become increasingly difficult as the number of interconnected devices grows. This work establishes a foundational capability for future modular quantum computers, even if practical, large-scale implementation will require substantial further development of both hardware and software.
This initial success linked two 127-qubit units, establishing a key capability for future, larger systems and paving the way for more complex quantum networks. Coordinating operations such as qubit mapping (the assignment of logical qubits to physical ones), together with techniques such as circuit cutting and teleportation, allows complex calculations to be distributed across interconnected devices. Qubit mapping is particularly challenging because it requires finding an optimal arrangement of logical qubits onto the physical qubits of the distributed QPUs, respecting connectivity constraints and minimising the number of SWAP gates required. This approach mirrors the scaling strategies employed in classical high-performance computing, where accelerators like GPUs enhance processing power. The successful demonstration of parallel circuit execution and synchronisation across multiple QPUs opens questions regarding optimal resource allocation in shared, cloud-based quantum systems. Investigating scheduling algorithms that prioritise fairness, minimise latency, and maximise resource utilisation will be crucial for enabling widespread access to modular quantum computers.
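The qubit-mapping problem described above can be made concrete with a small sketch: given a hardware coupling map, count how many gates land on non-adjacent physical qubits (each such gate needs at least one SWAP), and search for the initial layout that minimises that count. The brute-force search is only viable for toy sizes; production mappers (e.g. heuristics like SABRE) approximate this. None of the names below come from the paper.

```python
from itertools import permutations

def swaps_needed(gates, mapping, coupling):
    """Count gates whose mapped physical qubits are not directly coupled;
    each such gate needs at least one SWAP, so this is a cost proxy."""
    return sum(1 for q0, q1 in gates
               if (mapping[q0], mapping[q1]) not in coupling
               and (mapping[q1], mapping[q0]) not in coupling)

def best_initial_mapping(gates, n_qubits, coupling):
    """Brute-force the initial layout minimising non-adjacent gates.
    Exponential in n_qubits; real mappers use heuristics instead."""
    return min(permutations(range(n_qubits)),
               key=lambda m: swaps_needed(gates, m, coupling))

# Linear hardware coupling 0-1-2-3; logical gates mostly between 0&2 and 1&3.
coupling = {(0, 1), (1, 2), (2, 3)}
gates = [(0, 2), (0, 2), (1, 3), (1, 3), (0, 1)]
mapping = best_initial_mapping(gates, 4, coupling)
print(mapping, swaps_needed(gates, mapping, coupling))
```

Here the naive identity layout would leave four gates on non-adjacent qubits, while the optimised layout places every interacting pair on neighbouring physical qubits, eliminating SWAPs entirely for this toy circuit.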
The development of modular quantum computers represents a paradigm shift in quantum computing architecture. While single, monolithic QPUs face limitations in scalability, a modular approach offers a pathway to building systems with the thousands or even millions of qubits needed to solve complex problems. However, realising this potential requires overcoming significant challenges in interconnect technology, control systems, and software infrastructure. The QuMod scheduler represents a significant step towards addressing these challenges, providing a framework for efficiently managing and distributing quantum workloads across multiple QPUs. Future research will focus on extending the scheduler to support larger numbers of QPUs, incorporating more realistic noise models, and developing advanced error mitigation techniques to improve the fidelity of distributed quantum computations. The ultimate goal is to create a scalable, fault-tolerant quantum computer capable of tackling problems currently intractable for even the most powerful classical supercomputers.
The researchers successfully developed a multi-programmable scheduler, named QuMod, for modular quantum systems. This scheduler manages the complex task of distributing quantum calculations across multiple interconnected quantum processing units, or QPUs. By jointly considering qubit mapping, circuit execution, and data transfer, QuMod enables the processing of larger quantum circuits than could fit on a single device, such as the 127-qubit devices demonstrated by IBM. The authors intend to extend this work by supporting larger systems and incorporating more realistic error modelling.
👉 More information
🗞 QuMod: Parallel Quantum Job Scheduling on Modular QPUs using Circuit Cutting
🧠 arXiv: https://arxiv.org/abs/2604.11013
