Scalable Multi-QPU Design Achieves Logarithmic Communication for Dicke State Preparation

Scientists are tackling the challenge of creating large-qubit Dicke states. These states are essential for advanced quantum computing, but preparing them is hampered by the limited capacity of a single quantum processing unit (QPU). Ziheng Chen (Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences), Junhong Nie (Shandong University), Xiaoming Sun, and colleagues present a novel approach to scalable Dicke state preparation that distributes the computation across multiple QPUs. Their research details a distributed circuit for preparing the n-qubit, k-excitation Dicke state D(n, k) with logarithmic communication complexity and polynomial circuit size and depth, the first construction to achieve these key metrics simultaneously. This advance improves the feasibility of large-scale quantum computations by optimising both communication overhead and local circuit costs, and the team further establishes a fundamental lower bound on communication complexity for multi-QPU state preparation.

For an n-qubit Dicke state with k excitations distributed over p QPUs, the approach assigns approximately ⌈n/p⌉ qubits to each QPU. The resulting circuit has communication complexity O(p log k), circuit size O(nk), and circuit depth O(p²k + log k log(n/k)). To the best of the team's knowledge, this is the first construction to simultaneously achieve logarithmic communication complexity alongside polynomial circuit size and depth.
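
As a rough illustration of the qubit assignment, the sketch below splits n qubit indices into p contiguous blocks whose sizes differ by at most one, so no QPU holds more than ⌈n/p⌉ qubits. The helper name `allocate_qubits` and the contiguous layout are assumptions made for illustration; the paper's actual assignment, including any ancillary qubits, may differ.

```python
def allocate_qubits(n: int, p: int) -> list[range]:
    """Split qubit indices 0..n-1 into p contiguous blocks (hypothetical helper).

    Block sizes differ by at most one, so the largest block has ceil(n/p) qubits.
    """
    if n < 1 or p < 1:
        raise ValueError("n and p must be positive")
    base, extra = divmod(n, p)  # every QPU gets `base` qubits; the first `extra` QPUs get one more
    blocks = []
    start = 0
    for qpu in range(p):
        size = base + 1 if qpu < extra else base
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    for qpu, block in enumerate(allocate_qubits(16, 3)):
        print(f"QPU {qpu}: qubits {block.start}..{block.stop - 1} ({len(block)} qubits)")
```

For example, `allocate_qubits(16, 3)` yields blocks of 6, 5 and 5 qubits, and ⌈16/3⌉ = 6. Balancing the block sizes (rather than filling each QPU to ⌈n/p⌉ in turn) guarantees that all p QPUs are used even when p does not divide n.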

The study establishes a lower bound on the communication complexity for p-QPU distributed state preparation, formulating this limit in terms of the canonical polyadic rank (CP-rank) of a tensor associated with the target state. Specifically, for the case of p = 2 QPUs, the researchers explicitly computed the CP-rank corresponding to the Dicke state D(n, k) and derived a lower bound of ⌈log(k + 1)⌉. The communication complexity of their construction matches this fundamental limit, showing that the approach is optimal for two QPUs. The work opens exciting possibilities for scaling quantum computations beyond the limitations of single-QPU architectures.

Dicke states, equal superpositions of computational basis states with a fixed Hamming weight, are crucial in quantum networking, quantum game theory, quantum tomography, and quantum metrology. The team’s research addresses a key challenge in quantum information science: efficiently preparing these states with a large number of qubits. Existing methods for distributed Dicke state preparation often suffer from either exponential circuit size or polynomial communication complexity, hindering scalability. This new construction overcomes these limitations by strategically balancing communication overhead and local computation costs, offering a practical solution for large-scale quantum operations.
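
In symbols, the definition above reads |D(n, k)⟩ = C(n, k)^(-1/2) Σ_{|x| = k} |x⟩, where the sum runs over all n-bit strings x of Hamming weight k. The short sketch below builds this state vector explicitly. It is only a classical reference for small n (the vector has 2^n entries), not a preparation circuit, and the function name `dicke_amplitudes` is hypothetical.

```python
from itertools import combinations
from math import comb, sqrt


def dicke_amplitudes(n: int, k: int) -> list[float]:
    """Return the 2**n amplitudes of |D(n, k)> (hypothetical reference helper)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    amp = 1 / sqrt(comb(n, k))  # uniform amplitude over the C(n, k) weight-k strings
    state = [0.0] * (2 ** n)
    for ones in combinations(range(n), k):
        # Qubit q maps to bit n-1-q; for this symmetric state the bit order does not matter.
        index = sum(1 << (n - 1 - q) for q in ones)
        state[index] = amp
    return state


if __name__ == "__main__":
    for index, amplitude in enumerate(dicke_amplitudes(3, 1)):
        if amplitude:
            print(f"|{index:03b}>: {amplitude:.4f}")
```

For instance, `dicke_amplitudes(3, 1)` is nonzero only at indices 1, 2 and 4 (the strings 001, 010 and 100), each with amplitude 1/√3.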

The analysis shows that the proposed circuit significantly reduces the burden on inter-QPU communication, a critical bottleneck in distributed quantum systems. Inter-processor communication introduces qubit decoherence, transmission latency, and hardware synchronization challenges that are far more difficult to mitigate than local gate noise. By achieving logarithmic communication complexity, the researchers minimize exposure to these effects, paving the way for more robust and scalable quantum computations. Concretely, the construction has a communication complexity of O(p log k), a circuit size of O(nk), and a circuit depth of O(p²k + log k log(n/k)).

This is, to the best of the researchers' knowledge, the first construction to simultaneously achieve logarithmic communication complexity alongside polynomial circuit size and depth. The team analyzed the performance of their distributed circuit, showing logarithmic scaling of communication complexity and polynomial scaling of both circuit size and depth. For a constant number of QPUs p, the circuit depth and size closely match those of state-of-the-art non-distributed preparation methods. Detailed performance comparisons, as presented in Table 1, confirm the efficiency of this approach. Crucially, the work establishes a lower bound on the communication complexity for p-QPU distributed state preparation, formulated in terms of the canonical polyadic rank (CP-rank) of a tensor associated with the target state.
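
Treating the number of QPUs p as a constant in the bounds quoted earlier for the p-QPU construction gives the following simplification. This is only a restatement of those asymptotic expressions, not a derivation of the circuit; which non-distributed methods the paper compares against is left to its Table 1.

```latex
\begin{aligned}
\text{communication: } & O(p\log k) && \xrightarrow{\;p = O(1)\;} O(\log k)\\
\text{size: } & O(nk) && \xrightarrow{\;p = O(1)\;} O(nk)\\
\text{depth: } & O\!\left(p^{2}k + \log k\,\log\tfrac{n}{k}\right) && \xrightarrow{\;p = O(1)\;} O\!\left(k + \log k\,\log\tfrac{n}{k}\right)
\end{aligned}
```

The only p-dependent terms are p log k and p²k; with p fixed they collapse to O(log k) and O(k), leaving the size bound unchanged.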

Researchers explicitly computed the CP-rank for the special case of p = 2, corresponding to the Dicke state D(n, k), and derived a lower bound of ⌈log(k + 1)⌉, demonstrating that the communication complexity of their construction meets this fundamental limit. The intra-QPU computation cost (circuit size and depth) scales polynomially, while the inter-QPU communication complexity scales logarithmically. In the 2-QPU scenario, the lower bound ⌈log(k + 1)⌉ grows as Θ(log k), matching the O(log k) upper bound of the proposed construction and confirming its tightness. The study allows a modest number of ancillary qubits per QPU, acknowledging practical hardware constraints while preserving the foundational goal of distributed quantum computing: working around the limited qubit capacity of individual QPUs. In general, the proposed circuit achieves a communication complexity of O(p log k), where p is the number of QPUs and k is the excitation number of the Dicke state. This result provides a pathway towards scalable quantum computation by distributing the computational burden across multiple QPUs, paving the way for more complex quantum algorithms and simulations.
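
For two QPUs, the CP-rank of the associated tensor is an ordinary matrix rank, i.e. the Schmidt rank of the state across the cut between the QPUs. A standard way to see where a bound of the form ⌈log(k + 1)⌉ can come from (assuming base-2 logarithms) is that local operations cannot increase the Schmidt rank, while each transmitted qubit (or consumed ebit) can at most double it. D(n, k) splits across a cut of m and n - m qubits via Vandermonde's identity: |D(n, k)⟩ = Σ_j √(C(m, j) C(n - m, k - j) / C(n, k)) |D(m, j)⟩ ⊗ |D(n - m, k - j)⟩. The sketch below counts those terms; the function names are hypothetical, and it illustrates the bound rather than reproducing the paper's CP-rank proof.

```python
from math import comb, sqrt


def dicke_schmidt_coefficients(n: int, k: int, m: int) -> list[float]:
    """Schmidt coefficients of |D(n, k)> across the cut (first m qubits | last n - m qubits).

    Term j pairs |D(m, j)> with |D(n - m, k - j)>; only j with both factors defined appear.
    """
    if not (0 <= k <= n and 0 <= m <= n):
        raise ValueError("need 0 <= k <= n and 0 <= m <= n")
    total = comb(n, k)
    lo, hi = max(0, k - (n - m)), min(k, m)
    return [sqrt(comb(m, j) * comb(n - m, k - j) / total) for j in range(lo, hi + 1)]


def min_communication_qubits(schmidt_rank: int) -> int:
    """ceil(log2(rank)): starting from a product state, each qubit sent at most doubles the rank."""
    return (schmidt_rank - 1).bit_length()


if __name__ == "__main__":
    coeffs = dicke_schmidt_coefficients(16, 4, 8)
    print(f"Schmidt rank {len(coeffs)}, at least {min_communication_qubits(len(coeffs))} qubits")
```

For a balanced cut of D(16, 4) (m = 8) the sketch finds k + 1 = 5 Schmidt terms, so at least ⌈log2 5⌉ = 3 qubits must cross between the QPUs. Whenever both sides hold at least k qubits the count is exactly k + 1, which matches the ⌈log(k + 1)⌉ bound quoted above.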

Dicke State Preparation via Distributed Quantum Circuits

Scientists have developed a new distributed quantum circuit for preparing large-qubit k-excitation Dicke states across multiple quantum processing units (QPUs). This research addresses a key challenge in scaling quantum computations by enabling the preparation of these states, which are important resources for quantum computing, using a network of smaller QPUs. The circuit achieves logarithmic communication complexity alongside polynomial circuit size and depth; to the authors' knowledge, no previous construction has attained all three properties simultaneously.

Furthermore, they establish a lower bound on the communication complexity for preparing states across multiple QPUs, expressed in terms of the canonical polyadic rank (CP-rank) of an associated tensor. For the specific case of two QPUs, the CP-rank was explicitly calculated, confirming that the communication complexity of their circuit matches the resulting lower bound and is therefore optimal in this setting. The authors acknowledge that proving the tightness of their communication complexity upper bound for more than two QPUs remains an open question. Future work will focus on determining the CP-rank of the relevant tensor for a general number of QPUs, which would rigorously verify their conjecture that the bound is tight for any p and provide broader optimality guarantees. Additional research could also investigate the circuit's resilience to noise in communication channels and its practical implementation on emerging quantum hardware.

👉 More information
🗞 Scalable Multi-QPU Circuit Design for Dicke State Preparation: Optimizing Communication Complexity and Local Circuit Costs
🧠 ArXiv: https://arxiv.org/abs/2601.20393

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.