Hybrid Reinforcement Learning Optimises Variational Quantum Circuit Design and Performance

Automated design of quantum circuits presents a significant challenge as researchers strive to unlock the full potential of quantum computing. A team led by Siddhant Dutta of Nanyang Technological University, together with Nouhaila Innan and Muhammad Shafique of New York University Abu Dhabi, now presents a novel approach to this problem. Their research introduces a framework for Quantum Architecture Search (QAS) that combines reinforcement learning with a curriculum learning strategy, allowing the agent to progressively master increasingly complex circuit designs. By intelligently increasing circuit depth and complexity during training, and by augmenting classical reinforcement learning methods with quantum enhancements, the team demonstrates substantially improved performance in optimising quantum circuits. This hybrid classical-quantum approach not only accelerates the design process but also yields circuits with higher success probabilities and, in a classification task, achieves an accuracy exceeding 90%, paving the way for more efficient and automated quantum architecture discovery.

Quantum Architecture Search (QAS) represents an emerging field dedicated to automating the design of quantum circuits, with the ultimate goal of achieving optimal performance. This pursuit is driven by the increasing complexity of quantum circuits and the limitations of designing them manually. Consequently, there is a growing need for automated methods that can effectively navigate the vast design space and identify high-performing architectures.

This paper presents a novel approach to optimizing quantum circuit architectures using a combination of Deep Reinforcement Learning (DRL) and Tensor Networks (TN). The central idea is to use DRL to search for efficient quantum circuits, while leveraging TN to represent and manipulate the quantum states and circuits involved. A key innovation is the use of a curriculum learning strategy to progressively train the DRL agent on increasingly difficult quantum circuit optimization tasks.

Efficient quantum circuits are essential for realizing the potential of near-term, noisy intermediate-scale quantum (NISQ) devices. These devices are limited by qubit count, coherence time, and gate fidelity, so reducing circuit depth and gate count is vital for mitigating errors and achieving meaningful computations. DRL offers a promising approach to QAS by learning a policy for selecting circuit gates and connections. However, training DRL agents in the quantum domain is challenging due to the high dimensionality of the state space and the difficulty of obtaining reward signals. Tensor networks provide a compact, efficient representation of quantum states and circuits, which reduces the computational cost of training DRL agents.

The research employs a DRL agent that learns to design quantum circuits through trial and error. The agent operates within a defined state space representing the current circuit architecture, including the number of qubits and the types of gates used. It then takes actions, such as adding or removing gates, to modify the circuit. A reward function guides the agent, encouraging it to find circuits that are both accurate and efficient by penalizing circuit depth, gate count, and error rate.
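The paper does not spell out its exact reward coefficients, but a scalarised reward of the shape described above can be sketched as follows; the weights `w_error`, `w_depth`, and `w_gates` are illustrative assumptions, not the authors' values.

```python
def circuit_reward(success_prob, depth, gate_count,
                   w_error=1.0, w_depth=0.01, w_gates=0.005):
    """Toy reward: reward accuracy, penalize depth and gate count.

    success_prob -- probability the circuit produces the target outcome
    depth        -- circuit depth after the agent's latest action
    gate_count   -- total number of gates in the circuit
    """
    error_rate = 1.0 - success_prob
    return (1.0
            - w_error * error_rate   # penalize inaccuracy
            - w_depth * depth        # penalize deep circuits
            - w_gates * gate_count)  # penalize gate-heavy circuits
```

With this shape, a shallower circuit at equal accuracy always scores higher, which is what pushes the agent toward NISQ-friendly designs.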

Tensor Networks, specifically Matrix Product States (MPS) and Projected Entangled Pair States (PEPS), are used to represent the quantum states and circuits involved in the optimization process. These networks provide a compact and efficient way to represent complex quantum systems, reducing the computational burden of simulating their behavior. TN contraction operations are used to simulate the evolution of the quantum circuit and to calculate the reward signal, further enhancing computational efficiency.
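As a minimal illustration of the tensor-network machinery (not the authors' implementation), the NumPy sketch below builds an MPS for |0…0⟩, contracts a single-qubit gate into one site tensor, and contracts the chain back to a dense vector. Real MPS simulators truncate bond dimensions and never form the dense vector for large systems; this toy skips both.

```python
import numpy as np

def mps_zero_state(n):
    """MPS for |0...0>: each site tensor has shape (left, physical, right) = (1, 2, 1)."""
    site = np.zeros((1, 2, 1))
    site[0, 0, 0] = 1.0
    return [site.copy() for _ in range(n)]

def apply_1q_gate(mps, gate, site):
    """Contract a 2x2 gate into the physical index of one site tensor."""
    mps[site] = np.einsum('ps,lsr->lpr', gate, mps[site])

def contract_to_vector(mps):
    """Contract the whole chain into a dense state vector (small n only)."""
    psi = mps[0]                                   # shape (1, 2, bond)
    for t in mps[1:]:
        psi = np.einsum('lpr,rqs->lpqs', psi, t)   # absorb the next site
        psi = psi.reshape(1, -1, psi.shape[-1])
    return psi.reshape(-1)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
mps = mps_zero_state(3)
apply_1q_gate(mps, H, 0)
vec = contract_to_vector(mps)   # (|000> + |100>) / sqrt(2)
```

The reward evaluation described in the paper would read off expectation values from contractions like these instead of ever materialising the full vector.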

The DRL agent is trained on a sequence of increasingly difficult quantum circuit optimization tasks. The complexity of these tasks is gradually increased by increasing the number of qubits, the depth of the circuits, or the difficulty of the target quantum state. This curriculum learning approach helps to improve the convergence of the DRL agent and to avoid getting stuck in local optima, leading to more effective training.
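A curriculum of this kind can be sketched as a staged schedule with a promotion rule; the stage parameters and the moving-average threshold below are assumptions for illustration, not the paper's settings.

```python
# Hypothetical curriculum stages: harder tasks have more qubits and deeper circuits.
STAGES = [
    {"qubits": 2, "max_depth": 4},
    {"qubits": 4, "max_depth": 8},
    {"qubits": 6, "max_depth": 12},
]

def next_stage(stage_idx, recent_rewards, threshold=0.8, window=100):
    """Advance to the next stage once the moving-average reward over the
    last `window` episodes clears `threshold`; otherwise stay put."""
    if len(recent_rewards) < window:
        return stage_idx                      # not enough data yet
    avg = sum(recent_rewards[-window:]) / window
    if avg >= threshold and stage_idx < len(STAGES) - 1:
        return stage_idx + 1
    return stage_idx
```

Gating promotion on a reward average (rather than episode count) is one common way to let the agent consolidate a task before the difficulty jumps.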

The agent is designed to avoid generating invalid or illegal quantum circuits. This is achieved by incorporating constraints into the action space or by penalizing illegal actions in the reward function, ensuring that the agent only explores valid circuit designs.
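Constraining the action space is typically done with an action mask. The validity rules below (index bounds, no self-targeting CNOT, no immediately repeated CNOT, which would cancel itself) are illustrative, not the paper's exact constraint set.

```python
def action_mask(n_qubits, last_gate, actions):
    """Return a boolean mask over candidate actions: True where legal.

    Each action is a tuple (gate_name, qubit1, qubit2), with qubit2 = None
    for single-qubit gates.
    """
    mask = []
    for gate, q1, q2 in actions:
        if q1 >= n_qubits or (q2 is not None and q2 >= n_qubits):
            mask.append(False)   # qubit index out of range
        elif gate == "CNOT" and q1 == q2:
            mask.append(False)   # control and target must differ
        elif gate == "CNOT" and (gate, q1, q2) == last_gate:
            mask.append(False)   # back-to-back identical CNOTs cancel
        else:
            mask.append(True)
    return mask

actions = [("H", 0, None), ("CNOT", 0, 0), ("CNOT", 0, 1), ("X", 5, None)]
mask = action_mask(n_qubits=3, last_gate=("CNOT", 0, 1), actions=actions)
```

The mask can either zero out logits before sampling (hard constraint) or trigger a penalty in the reward (soft constraint), the two options the paragraph above describes.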

The authors evaluated their method using standard quantum circuit benchmarks and generated their own datasets. The performance of the DRL agent was evaluated using metrics such as circuit depth, gate count, error rate, and accuracy. The results were compared to those obtained using existing QAS methods, such as random search, genetic algorithms, and gradient-based optimization. The authors demonstrate that their DRL-based QAS method, combined with TN and curriculum learning, can achieve significant improvements in circuit efficiency compared to existing methods.

Several established techniques underpin the framework. Double Q-Learning mitigates the overestimation bias of standard Q-learning, leading to more stable training. Proximal Policy Optimization (PPO), a policy gradient method, ensures stable policy updates. Tensor network contraction efficiently simulates quantum circuits and calculates reward signals. Curriculum learning gradually increases task complexity to improve agent convergence, and illegal action pruning prevents the agent from generating invalid circuits.
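Of these, the Double Q-learning update is easy to sketch in isolation: one Q-table selects the greedy next action while the other evaluates it, which is what damps the overestimation bias. The tabular form below is a generic textbook sketch, not the paper's (deep) implementation; in practice `update_a` is chosen at random each step.

```python
def double_q_update(qa, qb, state, action, reward, next_state,
                    alpha=0.1, gamma=0.99, update_a=True):
    """One tabular Double Q-learning step.

    The table being updated (qa here) picks the greedy next action;
    the other table (qb) supplies its value estimate.
    """
    if not update_a:
        qa, qb = qb, qa                               # update the other table
    best = max(qa[next_state], key=qa[next_state].get)  # argmax under qa
    target = reward + gamma * qb[next_state][best]      # evaluate under qb
    qa[state][action] += alpha * (target - qa[state][action])
    return target

qa = {"s": {"a": 0.0}, "s2": {"a": 1.0, "b": 0.0}}
qb = {"s": {"a": 0.0}, "s2": {"a": 0.5, "b": 2.0}}
t = double_q_update(qa, qb, "s", "a", 1.0, "s2")
```

Because selection and evaluation disagree on which next action looks best, neither table's noise gets compounded into the target, which is the stabilising effect the paragraph above refers to.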

PennyLane, a Python library for differentiable programming of hybrid quantum-classical computations, was used in the research. PyTorch, a deep learning framework, was used for implementing the DRL agent and training the model. Qiskit, a quantum computing SDK, was used for generating quantum circuits and evaluating their performance.

Future research directions include testing the optimized circuits on actual quantum hardware to account for noise and decoherence, scaling the method to larger quantum systems with more qubits and more complex circuits, further reducing the action space to enhance learning efficiency, and improving the ability of the agent to generalize to different types of quantum circuits and tasks. In conclusion, this research presents a promising approach to quantum circuit optimization using DRL, TN, and curriculum learning. The authors demonstrate that their method can achieve significant improvements in circuit efficiency compared to existing methods, and that it has the potential to enable the development of more powerful and practical quantum algorithms. The work bridges the gap between reinforcement learning and quantum computing, paving the way for future advances in this exciting field.

👉 More information
🗞 QAS-QTNs: Curriculum Reinforcement Learning-Driven Quantum Architecture Search for Quantum Tensor Networks
🧠 DOI: https://doi.org/10.48550/arXiv.2507.12013

Quantum News

As the Official Quantum Dog (or hound), my role is to dig out the latest nuggets of quantum goodness. There is so much happening right now in technology, whether AI or the march of the robots, but quantum occupies a special space. Quite literally a special space: a Hilbert space, in fact, haha! Here I try to provide some of the news that might be considered breaking in the quantum computing space.

Latest Posts by Quantum News:

IBM Remembers Lou Gerstner, CEO Who Reshaped Company in the 1990s (December 29, 2025)

Optical Tweezers Scale to 6,100 Qubits with 99.99% Imaging Survival (December 28, 2025)

Rosatom & Moscow State University Develop 72-Qubit Quantum Computer Prototype (December 27, 2025)