Multi-agent Reinforcement Learning Accelerates Quantum Architecture Search for Complex Circuits

Designing effective quantum circuits remains a significant challenge, but Mikhail Sergeev, Georgii Paradezhenko, and Daniil Rabinovich, alongside Vladimir V. Palyulin and their colleagues at the Skolkovo Institute of Science and Technology and the Russian Quantum Center, now present a new approach to automated circuit design. Their research tackles the problem of scalability in quantum architecture search, the process of identifying optimal circuit structures for specific computational tasks. The team demonstrates a multi-agent reinforcement learning system in which multiple independent ‘agents’ collaborate to build circuits in a distributed manner, dramatically accelerating the search and reducing computational demands. The method not only improves efficiency on benchmark problems such as MaxCut and ground energy estimation, but also lends itself naturally to implementation on current and near-future quantum hardware.

Distributed Quantum Architecture Search via Reinforcement Learning

Quantum architecture search (QAS) automates the design of quantum circuits, offering a pathway to overcome limitations in human-designed quantum algorithms. This work investigates a distributed approach to QAS, leveraging multi-agent reinforcement learning to accelerate the search process and improve the quality of discovered circuits. The proposed method employs multiple agents, each exploring a specific region of the circuit design space, and facilitates collaboration through a shared experience replay buffer. Agents learn to construct circuits that maximise performance on target problems, such as quantum state preparation and classification, using a policy gradient algorithm. Experiments demonstrate that this distributed multi-agent approach achieves a significant speedup compared to single-agent methods, while discovering circuits with comparable or superior performance. The results highlight the potential of distributed reinforcement learning for tackling complex quantum circuit design challenges and accelerating the development of quantum algorithms.
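To make this distributed setup concrete, here is a minimal Python sketch built on assumptions that are not taken from the paper: a toy four-gate action set, a placeholder reward function, and a plain REINFORCE-style policy gradient update. It shows independent per-block agents sampling gates for their own blocks while pooling their experience in a shared replay buffer.

```python
# Minimal sketch (not the authors' code): a distributed QAS loop in which each
# agent chooses gates for its own circuit block and all agents share one
# replay buffer. Gate set, reward function and hyperparameters are placeholders.
import numpy as np

rng = np.random.default_rng(0)
GATES = ["rx", "ry", "rz", "cz"]           # assumed toy gate set
N_AGENTS, BLOCK_LEN, EPISODES = 4, 3, 200  # one agent per circuit block
lr = 0.1

def reward(circuit):
    """Placeholder circuit score (a MaxCut approximation ratio in practice)."""
    # Toy objective: prefer alternating rotation / entangling gates.
    return sum(1.0 for a, b in zip(circuit, circuit[1:]) if (a == "cz") != (b == "cz"))

# Independent softmax policies, one per agent, over the gate set.
logits = np.zeros((N_AGENTS, BLOCK_LEN, len(GATES)))
replay = []                                # shared experience replay buffer

for _ in range(EPISODES):
    e = np.exp(logits - logits.max(-1, keepdims=True))
    probs = e / e.sum(-1, keepdims=True)
    # Each agent samples the gates of its own block only.
    actions = np.array([[rng.choice(len(GATES), p=probs[a, s])
                         for s in range(BLOCK_LEN)] for a in range(N_AGENTS)])
    circuit = [GATES[g] for block in actions for g in block]
    replay.append((actions, reward(circuit)))

    # REINFORCE-style update from a minibatch of shared experience
    # (reusing old samples with current probabilities is a simplification).
    batch = [replay[i] for i in rng.choice(len(replay), size=min(8, len(replay)))]
    baseline = np.mean([r for _, r in batch])
    for acts, r in batch:
        for a in range(N_AGENTS):
            for s in range(BLOCK_LEN):
                grad = -probs[a, s]         # d log pi / d logits ...
                grad[acts[a, s]] += 1.0     # ... = one_hot(action) - probs
                logits[a, s] += lr * (r - baseline) * grad

print("best reward seen:", max(r for _, r in replay))
```

The structural point the sketch tries to capture is that each agent's policy only covers the gates of its own block, so the size of any single agent's action space does not grow as the overall circuit does.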

Reinforcement learning (RL) is a promising approach to quantum architecture search, but current methods are single-agent and struggle to scale with increasing numbers of qubits, because the action space, and with it the computational cost, grows rapidly. The team proposes a novel multi-agent RL algorithm for QAS in which each agent operates independently on a specific block of the quantum circuit, significantly accelerating convergence and reducing computational cost, as benchmarked on the MaxCut problem.
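For a MaxCut benchmark, the reward signal is typically derived from the cut values of measured bitstrings. The sketch below shows one plausible way to compute such a reward as an approximation ratio; the six-node toy graph and the normalisation by the brute-force optimum are illustrative choices, not details taken from the paper.

```python
# Sketch of a MaxCut reward signal of the kind such a benchmark would use
# (the exact reward shaping in the paper may differ): the average cut value of
# sampled bitstrings, normalised by the best known cut.
import itertools
import networkx as nx

graph = nx.cycle_graph(6)  # toy instance; any graph works

def cut_value(bits, g):
    """Number of edges whose endpoints fall on opposite sides of the cut."""
    return sum(1 for u, v in g.edges() if bits[u] != bits[v])

def max_cut_brute_force(g):
    n = g.number_of_nodes()
    return max(cut_value(b, g) for b in itertools.product((0, 1), repeat=n))

def approximation_ratio(samples, g):
    """Reward: mean cut of measured bitstrings over the optimal cut."""
    best = max_cut_brute_force(g)
    return sum(cut_value(b, g) for b in samples) / (len(samples) * best)

print(approximation_ratio([(0, 1, 0, 1, 0, 1), (0, 0, 1, 1, 0, 1)], graph))
```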

QAOA Performance and Parameter Optimisation Studies

Research at the intersection of Quantum Computing (QC) and Reinforcement Learning (RL) is vibrant and rapidly evolving. A dominant theme is the Quantum Approximate Optimisation Algorithm (QAOA), with work examining its performance on MaxCut, its scalability and limitations, and the number of qubits required for a quantum speedup. Parameter optimisation is a major challenge, and RL is increasingly used to learn these parameters, bypassing traditional optimisation techniques. Studies also explore the transferability of parameters, which is crucial for practical applications.
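For reference, the sketch below is a textbook-style numpy simulation of a depth-p QAOA circuit for MaxCut, making explicit the angles (gamma, beta) that RL-based optimisers learn to set; the four-node ring and the sample angles are illustrative only, not drawn from any cited study.

```python
# Minimal numpy sketch of a depth-p QAOA circuit for MaxCut (standard textbook
# construction, not code from the paper), showing the parameters (gamma, beta)
# that RL-based optimisers learn to choose.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # toy 4-node ring
n = 4

# Diagonal of the MaxCut cost Hamiltonian C = sum_{(u,v)} (1 - Z_u Z_v) / 2.
z = np.array([(-1) ** ((np.arange(2 ** n) >> q) & 1) for q in range(n)])
cost = sum(0.5 * (1 - z[u] * z[v]) for u, v in edges)

X = np.array([[0, 1], [1, 0]])

def mixer(beta):
    """exp(-i beta X) applied to every qubit, built as a Kronecker product."""
    u1 = np.cos(beta) * np.eye(2) - 1j * np.sin(beta) * X
    u = np.array([[1.0]])
    for _ in range(n):
        u = np.kron(u, u1)
    return u

def qaoa_expectation(gammas, betas):
    state = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)  # |+...+>
    for g, b in zip(gammas, betas):
        state = np.exp(-1j * g * cost) * state              # cost layer (diagonal)
        state = mixer(b) @ state                            # mixer layer
    return float(np.real(np.conj(state) @ (cost * state)))

print(qaoa_expectation([0.4], [0.3]))  # expected cut value at p = 1
```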

Variational quantum simulation, using algorithms like QAOA to simulate physical systems, is also a key area of investigation, with self-verifying approaches being explored to improve accuracy and reliability. Research also focuses on distributed quantum computing, connecting multiple quantum processors to increase computational power, and quantum state diagonalisation, finding the eigenvalues and eigenvectors of quantum states using variational methods and RL.

RL is a powerful tool for optimising and controlling quantum systems, specifically finding the best parameters for QAOA, often outperforming traditional optimisation methods. This is a central application, with RL also used for quantum circuit design and optimisation, and for controlling the evolution of quantum systems to achieve desired states or perform specific operations. Applications include quantum compiler optimisation, improving the efficiency of mapping algorithms to quantum hardware, and quantum architecture search, potentially leading to more powerful and efficient designs.
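A common way to cast parameter tuning as an RL problem is to let the agent apply small increments to the circuit angles and reward it for improving the measured cost. The sketch below illustrates that interface; the quadratic `landscape` function is a stand-in for a real circuit evaluation and is not taken from any of the works discussed here.

```python
# Hedged sketch of an RL parameter-tuning environment: the agent nudges
# (gamma, beta) and is rewarded for lowering a toy cost surface that stands
# in for the true circuit expectation value.
import numpy as np

class ParameterTuningEnv:
    """State: current (gamma, beta). Action: a small increment of one angle."""

    def __init__(self, step_size=0.1):
        self.actions = [(+step_size, 0.0), (-step_size, 0.0),
                        (0.0, +step_size), (0.0, -step_size)]
        self.reset()

    def landscape(self, params):
        # Placeholder for the expectation value measured on hardware.
        return float(np.sum((params - np.array([0.4, 0.3])) ** 2))

    def reset(self):
        self.params = np.zeros(2)
        return self.params.copy()

    def step(self, action_idx):
        before = self.landscape(self.params)
        self.params = self.params + np.array(self.actions[action_idx])
        reward = before - self.landscape(self.params)  # reward = improvement
        return self.params.copy(), reward

env = ParameterTuningEnv()
state = env.reset()
for _ in range(5):
    state, reward = env.step(np.random.randint(4))
```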

The research leverages a wide range of RL techniques, including deep reinforcement learning, which uses deep neural networks to represent value functions and policies, and Proximal Policy Optimisation (PPO), a popular and effective policy gradient algorithm. Other techniques include Double Q-Learning, which mitigates the overestimation bias of standard Q-learning, Action Branching, which factorises large action spaces into parallel branches so that sub-actions can be selected independently, and Multi-Agent Reinforcement Learning (MARL), which uses multiple cooperating agents to solve complex problems.
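The Double Q-Learning idea in particular fits in a few lines: the online network selects the next action while the target network evaluates it, which curbs the overestimation of the plain max-based target. The snippet below is a generic illustration, not code from any of the surveyed works.

```python
# Illustrative numpy snippet contrasting the standard Q-learning target with
# the Double Q-learning target, which decouples action selection from
# action evaluation to reduce overestimation.
import numpy as np

rng = np.random.default_rng(1)
n_actions, gamma, reward = 4, 0.99, 1.0
q_online = rng.normal(size=n_actions)  # next-state Q-values (online network)
q_target = rng.normal(size=n_actions)  # next-state Q-values (target network)

# Standard target: max over the target network's own estimates.
standard = reward + gamma * np.max(q_target)

# Double Q-learning: the online network picks the action,
# the target network evaluates it.
a_star = int(np.argmax(q_online))
double = reward + gamma * q_target[a_star]

print(standard, double)
```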

Emerging trends include combining QC and MARL, exploring the potential of combining multi-agent reinforcement learning with quantum computing, and addressing the challenges of scaling RL algorithms to handle the complexity of large-scale quantum systems. Developing methods to transfer knowledge learned from one quantum system to another, improving the robustness and reliability of quantum systems and RL algorithms, and leveraging distributed computing to train RL agents for quantum systems are also key areas of investigation. Compiler design for distributed QC, developing compilers that can effectively distribute quantum computations across multiple processors, is also being explored.

Specific problem domains include the MaxCut problem, a classic graph partitioning problem used as a benchmark for quantum algorithms, Rayleigh-Bénard convection, where RL is used to control the system, and manufacturing scheduling, where RL is applied to optimise scheduling in manufacturing environments.

RL is rapidly becoming an essential tool for tackling the challenges of designing, optimising, and controlling quantum systems. The focus is shifting from theoretical exploration to practical applications, with a growing emphasis on scalability, robustness, and transfer learning. The combination of these two powerful fields holds immense promise for unlocking the full potential of quantum computing.

Multi-Agent Learning Designs Compact Quantum Circuits

This research presents a novel multi-agent reinforcement learning approach for quantum architecture search, termed MARL-QAS, designed to automate the creation of parameterised quantum circuits. The team successfully trained multiple agents to cooperatively design circuits tailored to specific problems, utilising QMIX to facilitate this cooperation. Benchmarking on both combinatorial optimisation problems, specifically MaxCut, and ground state search for the Schwinger Hamiltonian, demonstrates the effectiveness of this new method.
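QMIX combines the per-agent action values into a single joint value through a monotonic mixing network whose weights are generated by hypernetworks conditioned on the global state. The sketch below reproduces that mixing step in generic PyTorch form; it follows the standard QMIX construction rather than the authors' implementation, and the layer sizes are arbitrary.

```python
# Minimal PyTorch sketch of a QMIX-style mixing network (generic QMIX, not the
# authors' code): per-agent Q-values are combined monotonically, with mixing
# weights produced by hypernetworks conditioned on the global state.
import torch
import torch.nn as nn

class QMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks; weights are made non-negative (abs) for monotonicity.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.view(b, 1, -1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)  # joint Q_tot

mixer = QMixer(n_agents=4, state_dim=16)
q_tot = mixer(torch.randn(8, 4), torch.randn(8, 16))
```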

The resulting circuits, designed with MARL-QAS, contain substantially fewer entangling gates than conventional quantum approximate optimisation and hardware-efficient ansatzes that achieve the same level of approximation. Furthermore, circuits designed for the Schwinger Hamiltonian contain fewer optimisable parameters than standard hardware-efficient ansatzes, potentially simplifying the optimisation process.

The team also demonstrated a significant reduction, relative to single-agent training, in the number of training steps required to design a satisfactory circuit.

👉 More information
🗞 Distributed quantum architecture search using multi-agent reinforcement learning
🧠 ArXiv: https://arxiv.org/abs/2511.22708

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
