Quantum error correction remains a critical challenge in realising practical quantum computers, with decoding frequently acting as the computational bottleneck. Dragana Grbic (Google Quantum AI and Rice University), Laleh Aghababaie Beni (Google Quantum AI), Noah Shutty (Google Quantum AI), and colleagues detail significant advances in optimising the Tesseract decoder, a high-performance Most-Likely-Error decoder built on the A* search algorithm. Their research presents a systematic, low-level optimisation of Tesseract, achieving substantial speedups across diverse quantum code families including Color Codes, Bivariate-Bicycle Codes, Surface Codes and Transversal CNOT Protocols. These improvements, consistently around 2x, frequently exceeding 2.5x, and peaking above 5x for Bivariate-Bicycle Codes, not only enhance Tesseract’s efficiency and scalability but also underscore the vital role of high-performance software engineering in advancing quantum error correction.
Researchers developed a decoder for Quantum Error Correction (QEC) that employs the A* search algorithm to explore an exponentially large graph of error hypotheses, achieving high decoding speed and accuracy. This paper presents a systematic approach to optimising the Tesseract decoder through low-level performance enhancements. Based on extensive profiling, the authors implemented four targeted optimisation strategies, including the replacement of inefficient data structures, the reorganisation of memory layouts to improve cache hit rates, and the use of hardware-accelerated bit-wise operations.
Decoding performance evaluation across quantum code families shows significant Tesseract speedups with no loss of accuracy
These improvements make the Tesseract decoder more efficient and scalable, serving as a practical case study that highlights the importance of high-performance software engineering in Quantum Error Correction (QEC) and providing a strong foundation for future research.
Quantum computing represents a significant paradigm shift in high-performance computing, as quantum computers can solve certain problems that even the most powerful classical supercomputers cannot. Their immense power stems from qubits, which, unlike classical bits restricted to states of 0 or 1, can exist in a superposition of both.
This enables them to store and process significantly more information, but introduces a critical vulnerability: qubits are fragile and susceptible to noise and unwanted environmental interactions. These factors lead to a rapid accumulation of errors, such as decoherence and gate errors. To overcome this, Quantum Error Correction (QEC) algorithms are executed on classical computers to constantly detect and fix errors on quantum machines.
This prevents error accumulation over time and enables the construction of robust and fault-tolerant quantum machines. The QEC process operates by collecting syndromes, which are measurements that indicate the presence and location of errors that occurred in the quantum system. The primary goal of a QEC decoder is to identify the most probable error configuration consistent with the input measurement syndrome.
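To make the syndrome picture concrete, the toy sketch below (an illustration, not code from the paper) computes a syndrome as a parity-check matrix applied to an error vector over GF(2): a syndrome bit fires exactly when an odd number of the error locations it checks have flipped.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Toy illustration: each syndrome bit is the mod-2 sum (parity) of the error
// bits that the corresponding detector/check touches.
std::vector<int> compute_syndrome(const std::vector<std::vector<int>>& H,
                                  const std::vector<int>& error) {
  std::vector<int> syndrome(H.size(), 0);
  for (std::size_t check = 0; check < H.size(); ++check) {
    int parity = 0;
    for (std::size_t bit = 0; bit < error.size(); ++bit) {
      parity ^= H[check][bit] & error[bit];
    }
    syndrome[check] = parity;  // 1 means this detector "fired"
  }
  return syndrome;
}

int main() {
  // Two checks over three possible error locations (purely illustrative).
  std::vector<std::vector<int>> H = {{1, 1, 0}, {0, 1, 1}};
  std::vector<int> error = {0, 1, 0};  // a single error on location 1
  for (int s : compute_syndrome(H, error)) std::cout << s << ' ';
  std::cout << '\n';  // prints "1 1": both checks touching location 1 fire
}
```

The decoder's job is the inverse problem: given the fired detectors, infer the most probable error vector that would have produced them.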
This task can be computationally intensive, especially when decoding large and complex quantum codes, as decoders must explore a vast space of error configurations to identify the most probable one. Our work specifically improves the Tesseract decoder, a search-based decoder that employs the A* algorithm to efficiently explore an exponentially large graph of error configurations and identify the most probable configuration consistent with the input measurement syndrome.
The original authors of Tesseract have shown that this decoder can decode various quantum code families; in this paper, we introduce novel optimizations to significantly boost its decoding speed while maintaining its high accuracy. We present a systematic optimization of the Tesseract decoder by identifying and addressing low-level computational bottlenecks through extensive profiling.
Our work achieves significant performance gains across diverse code families without compromising decoding accuracy. The primary contributions of our work are as follows: Significant and Scalable Decoding Speedups: We demonstrate consistent speedups of approximately 2× across most tested code families, frequently exceeding 2.5×.
Notably, we achieved a peak performance gain of over 5× for the most computationally demanding configurations of Bivariate-Bicycle codes, reducing the decoding time for 1,000 simulations from over 36,000 seconds to approximately 7,000 seconds. Architectural Optimizations for Data Locality: We identify and mitigate critical hardware-level bottlenecks, specifically addressing high cache miss rates by optimizing memory layouts for core heuristic functions.
Furthermore, we eliminate the overhead of bit-packed data structures by utilizing hardware-accelerated bit-wise operations, significantly improving instruction-level parallelism. Comprehensive Dataset and Step-by-Step Evaluation: We provide a rigorous performance dataset across a vast parameter space.
We employ a fine-grained, incremental evaluation strategy to quantify the individual impact of each optimization, providing an in-depth analysis of how specific code-level inefficiencies affect overall decoding throughput.
QEC decoders operate by processing the input measurement syndrome and converting it into an internal representation suitable for decoding the most probable error configuration.
These internal representations often utilize graph-based structures that model errors in the system and the relations between them. Identifying the most probable error configuration enables the correction of the error. Minimum Weight Perfect Matching (MWPM) decoders [32, 34] operate by converting the input measurement syndrome into a graph model where nodes are violated syndrome bits (called defects), and edges are potential physical error chains connecting defects.
The goal of the decoder is to detect a minimum-weight perfect matching that connects nodes in the graph, which identifies the most probable error configuration consistent with the syndrome. MWPM decoders achieve high accuracy but have limited computational efficiency, often scaling polynomially with the code size.
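As an illustration of the matching idea only, the brute-force toy below pairs up an even number of defects so that the total edge weight is minimal; production MWPM decoders instead use polynomial-time blossom-style algorithms, and the weights shown are made up.

```cpp
#include <algorithm>
#include <iostream>
#include <limits>
#include <vector>

// Brute-force minimum-weight perfect matching over a tiny, complete defect
// graph. weight[i][j] is the cost of pairing defects i and j.
double min_weight_matching(const std::vector<std::vector<double>>& weight,
                           std::vector<bool>& used) {
  int first = -1;
  for (int i = 0; i < static_cast<int>(used.size()); ++i)
    if (!used[i]) { first = i; break; }
  if (first == -1) return 0.0;  // every defect is matched

  used[first] = true;
  double best = std::numeric_limits<double>::infinity();
  for (int j = first + 1; j < static_cast<int>(used.size()); ++j) {
    if (used[j]) continue;
    used[j] = true;
    best = std::min(best, weight[first][j] + min_weight_matching(weight, used));
    used[j] = false;
  }
  used[first] = false;
  return best;
}

int main() {
  // Four defects with illustrative pairwise weights (e.g. -log probabilities).
  std::vector<std::vector<double>> w = {{0, 1, 4, 3},
                                        {1, 0, 2, 5},
                                        {4, 2, 0, 1},
                                        {3, 5, 1, 0}};
  std::vector<bool> used(4, false);
  std::cout << min_weight_matching(w, used) << '\n';  // pairs (0,1),(2,3): 2
}
```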
Belief Propagation (BP) decoders [27, 29] operate on the code’s Tanner graph by passing probabilistic “beliefs” between variable nodes (qubits or errors) and check nodes (syndrome measurements). BP decoders achieve high computational efficiency, often scaling linearly with the code size, but have lower accuracy as they may not always find the optimal solution.
To address this, Belief Propagation with Ordered Statistics Decoding (BP+OSD) has emerged as a hybrid approach. This method performs an initial pass with BP and then employs OSD post-processing, which explores the most unreliable bit flips to find a more accurate, often optimal, solution. The BP+OSD approach imposes a trade-off between accuracy and efficiency, as it has lower computational efficiency compared to pure BP.
Union Find (UF) decoders [7, 16] operate by converting the input measurement syndrome into clusters that grow around detected syndrome defects and merge when they connect. This process can efficiently identify the most probable error configuration. UF decoders provide a strong balance of accuracy and efficiency.
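The grow-and-merge behaviour rests on a disjoint-set (union-find) structure. The minimal sketch below (path compression and union by size; a generic illustration, not any specific decoder's implementation) shows the merge primitive that makes the near-linear scaling possible.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Minimal disjoint-set (union-find) structure: the primitive a UF decoder
// uses to merge growing clusters around syndrome defects.
class DisjointSets {
 public:
  explicit DisjointSets(int n) : parent_(n), size_(n, 1) {
    std::iota(parent_.begin(), parent_.end(), 0);  // each node starts alone
  }
  int find(int x) {                        // returns the cluster representative
    while (parent_[x] != x) {
      parent_[x] = parent_[parent_[x]];    // path compression (halving)
      x = parent_[x];
    }
    return x;
  }
  void merge(int a, int b) {               // merge the clusters containing a, b
    a = find(a);
    b = find(b);
    if (a == b) return;
    if (size_[a] < size_[b]) std::swap(a, b);  // union by size
    parent_[b] = a;
    size_[a] += size_[b];
  }
 private:
  std::vector<int> parent_, size_;
};
```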
UF decoders' computational efficiency often scales near-linearly with the code size, with logical error rates comparable to, or even surpassing, MWPM decoders.
Correlated Matching (CM) decoders [6, 18] explicitly account for error dependencies that arise from multi-qubit gates or device-specific interactions in a real quantum system.
They operate by converting the input measurement syndrome into a more sophisticated model than the simple graphs used in prior work. They achieve this by introducing calibrated edge weights based on noise characterization or by employing hypergraph representations. CM decoders can achieve high accuracy, pushing closer to theoretical error thresholds.
Integer Programming (IP) decoders [20, 23, 24] operate by converting the input measurement syndrome into an Integer Linear Program (ILP) and decoding the error configuration by solving the ILP. These decoders provide highly accurate and often optimal solutions, serving as a powerful benchmark for other algorithms.
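Schematically, and in notation the article itself does not spell out, these decoders solve: minimise the total weight Σ_i w_i·e_i subject to H·e ≡ s (mod 2), with each e_i ∈ {0, 1}, where H is the check matrix relating error mechanisms to detectors, s is the observed syndrome, and w_i is the weight (negative log-probability) of error i. The mod-2 constraint is encoded with auxiliary integer variables, which is what makes the formulation an integer program rather than an ordinary linear program.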
In this paper, we leveraged an IP decoder to rigorously verify that the performance optimizations we implemented inside Tesseract did not degrade its decoding accuracy when processing identical input measurement syndromes. Tesseract is a novel Most-Likely-Error (MLE) decoder for Low-Density Parity-Check (LDPC) quantum codes.
Tesseract employs the A* algorithm to efficiently explore an exponentially large graph of error configurations and identify the most probable configuration consistent with the input measurement syndrome. Tesseract guides the A* search towards the optimal solution by calculating the cost of a search state using the get_detcost function, which we analyze in detail in Sections 4.2 and 4.3, and by prioritizing states with lower costs.
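As a rough, self-contained sketch of such a search loop (a hypothetical toy, not Tesseract's actual code; the lower-bound lambda merely stands in for get_detcost), an A*-style MLE search keeps a priority queue of partial error configurations ordered by cost-so-far plus an admissible estimate of the cost still needed to clear the remaining detection events:

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <queue>
#include <vector>

// Hypothetical sketch of an A*-style most-likely-error search. Each candidate
// error has a weight (-log probability) and toggles a set of detectors; the
// goal is the cheapest error set whose toggles reproduce the syndrome.
struct ErrorMechanism {
  double weight;                  // cost of including this error
  std::vector<int> detectors;     // detectors this error toggles
};

struct State {
  double cost = 0.0;              // g: summed weight of included errors
  int next_error = 0;             // tree pruning: only consider errors >= this
  std::vector<uint8_t> residual;  // detectors still fired (unexplained)
  double heuristic = 0.0;         // h: admissible lower bound on remaining cost
  double f() const { return cost + heuristic; }
};

int main() {
  std::vector<ErrorMechanism> errors = {
      {1.0, {0}}, {1.0, {0, 1}}, {1.0, {1, 2}}, {1.0, {2}}};
  std::vector<uint8_t> syndrome = {1, 0, 1};  // detectors 0 and 2 fired

  // Stand-in for get_detcost: in this toy, each error clears at most two
  // detectors and the minimum weight is 1, so at least ceil(fired/2) more
  // weight is needed. Never an overestimate, hence admissible.
  double min_w = 1.0;
  auto lower_bound = [&](const std::vector<uint8_t>& residual) {
    int fired = std::count(residual.begin(), residual.end(), 1);
    return min_w * ((fired + 1) / 2);
  };

  auto cmp = [](const State& a, const State& b) { return a.f() > b.f(); };
  std::priority_queue<State, std::vector<State>, decltype(cmp)> frontier(cmp);
  frontier.push({0.0, 0, syndrome, lower_bound(syndrome)});

  while (!frontier.empty()) {
    State s = frontier.top();
    frontier.pop();
    if (std::count(s.residual.begin(), s.residual.end(), 1) == 0) {
      std::cout << "most likely error cost: " << s.cost << '\n';  // prints 2
      break;                                      // reached the "EXIT" node
    }
    for (int i = s.next_error; i < static_cast<int>(errors.size()); ++i) {
      State t = s;
      t.cost += errors[i].weight;
      t.next_error = i + 1;                       // never revisit earlier errors
      for (int d : errors[i].detectors) t.residual[d] ^= 1;
      t.heuristic = lower_bound(t.residual);
      frontier.push(t);
    }
  }
}
```

Because the lower bound never overestimates, the first fully explained state popped from the queue is a most-likely error configuration.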
The original authors of Tesseract implemented several heuristic techniques that accelerate the decoder, in some cases by sacrificing a small amount of accuracy: Admissible A* Heuristic: Tesseract employs an admissible A* heuristic function to calculate a strict lower bound on the cost of each search state. This property guides the A* search towards the optimal solution, reaching the “EXIT” node, the state representing the error configuration consistent with the input measurement syndrome.
The admissibility of the heuristic guarantees finding the optimal configuration. Graph Pruning: Tesseract prunes the search space by employing a predicate that restricts which errors can be added to the current error configuration. This effectively makes the search graph a tree and eliminates the exploration of redundant paths.
Beam Search: Tesseract integrates beam search into the A* search by employing a “beam cutoff” that prunes nodes with too many residual detection events, specifically those whose count of detection events significantly exceeds the minimum observed value. Beam search can significantly prune the search space and imposes a trade-off between accuracy and speed: smaller beam sizes yield higher efficiency but degrade accuracy more, as they risk finding a suboptimal solution if the optimal path is pruned (a simple sketch of this cutoff appears below, after the remaining heuristics).
No-revisit Detection Events and Residual Detection Penalty: Tesseract employs a heuristic that avoids re-visiting search states with previously explored patterns of fired detectors. This heuristic can significantly prune the search space and prevent the exploration of redundant paths. In addition, Tesseract can introduce a penalty cost for residual detection events to discourage high-cost paths.
At Most Two Errors Per Detector: Tesseract can employ an optional heuristic that assumes at most two errors can affect any given detector. When enabled, this heuristic prevents adding errors to the current configuration that would result in more than two errors affecting any fired detector.
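The beam cutoff and the two-errors-per-detector rule can both be pictured as simple predicates applied before a candidate state is pushed onto the search queue; the snippet below is illustrative only, with hypothetical names and thresholds rather than Tesseract's real interface.

```cpp
#include <vector>

// Illustrative pruning predicates (hypothetical names, not Tesseract's API).

// Beam cutoff: discard states whose number of residual detection events
// exceeds the best (minimum) count seen so far by more than the beam width.
bool passes_beam_cutoff(int residual_detections, int min_detections_seen,
                        int beam) {
  return residual_detections <= min_detections_seen + beam;
}

// At most two errors per detector: reject adding an error if any fired
// detector it touches already has two errors in the current configuration.
bool respects_two_error_limit(const std::vector<int>& errors_per_detector,
                              const std::vector<int>& touched_detectors) {
  for (int d : touched_detectors) {
    if (errors_per_detector[d] >= 2) return false;
  }
  return true;
}
```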
Low-level optimisations deliver substantial speed increases to the Tesseract quantum decoder, particularly for the most demanding configurations
Decoding speedups of approximately 2x were consistently achieved across most tested quantum code families through low-level performance enhancements to the Tesseract decoder, and in many cases decoding speed exceeded 2.5x the original. Architectural optimizations specifically targeted data locality, mitigating high cache miss rates through optimized memory layouts for core heuristic functions.
The elimination of overhead from bit-packed data structures, achieved by utilizing hardware-accelerated bit-wise operations, further improved instruction-level parallelism. A comprehensive dataset was generated, evaluating performance across a vast parameter space and quantifying the individual impact of each optimization implemented.
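The bit-wise change can be pictured as moving from per-bit access in a packed container (such as std::vector<bool>) to whole-word XORs and population counts, as in the generic illustration below (not Tesseract's actual code; __builtin_popcountll is the GCC/Clang builtin, and C++20 offers std::popcount).

```cpp
#include <cstdint>
#include <vector>

// Generic illustration: store detector/error bitmasks as 64-bit words so that
// toggling and counting use single hardware instructions instead of per-bit
// loops over a bit-packed structure.
struct Bitset64 {
  std::vector<uint64_t> words;

  explicit Bitset64(std::size_t bits) : words((bits + 63) / 64, 0) {}

  void toggle(std::size_t i) { words[i / 64] ^= (uint64_t{1} << (i % 64)); }

  // XOR another bitset in, one word at a time (e.g. apply an error's
  // detector mask to the residual syndrome).
  void xor_with(const Bitset64& other) {
    for (std::size_t w = 0; w < words.size(); ++w) words[w] ^= other.words[w];
  }

  // Count set bits with a hardware-accelerated population count per word.
  int count() const {
    int total = 0;
    for (uint64_t w : words) total += __builtin_popcountll(w);
    return total;
  }
};
```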
Validation of these results spanned a wide array of code families and noise models, alongside testing on three distinct hardware generations: Intel Xeon W-2135, Cascade Lake, and Sapphire Rapids. This robust evaluation confirms the consistent and scalable nature of the performance gains achieved, establishing a strong foundation for future research in quantum error correction.
Optimised C++ implementation accelerates Tesseract quantum decoder performance significantly
Scientists have substantially enhanced the performance of the Tesseract quantum error correction decoder through targeted software optimizations. The improvements were achieved by focusing on low-level performance enhancements within the decoder’s C++ implementation.
Specifically, the research team replaced inefficient data structures, reorganized memory layouts to maximize cache hit rates, and utilized hardware-accelerated bit-wise operations. These optimizations resulted in substantial reductions in cache misses, particularly in the L1 and LLC caches, as evidenced by the presented data for different code families and parameter settings.
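The data-locality changes follow a familiar pattern: keep the fields that the hot heuristic loop actually touches in contiguous, flat arrays rather than scattered behind pointers, so consecutive iterations reuse the same cache lines. The before/after sketch below is a generic illustration of that idea, not the decoder's real data structures.

```cpp
#include <vector>

// Generic data-locality sketch (not Tesseract's real structures).

// Before: an array of pointers to heap-allocated nodes. Each iteration of a
// hot loop chases a pointer, so neighbouring iterations touch unrelated
// cache lines and the miss rate climbs.
struct Node {
  double cost;
  int detector;
};
double sum_costs_pointer_chasing(const std::vector<Node*>& nodes) {
  double total = 0.0;
  for (const Node* n : nodes) total += n->cost;
  return total;
}

// After: a structure-of-arrays layout. The costs the loop needs are packed
// contiguously, so each cache line brings in several useful values and the
// hardware prefetcher can keep up.
struct NodeArrays {
  std::vector<double> cost;
  std::vector<int> detector;
};
double sum_costs_contiguous(const NodeArrays& nodes) {
  double total = 0.0;
  for (double c : nodes.cost) total += c;
  return total;
}
```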
This work highlights the critical role of high-performance software engineering in realizing the potential of quantum error correction. The authors acknowledge that the optimizations were specific to the Tesseract decoder and the hardware used for testing. While the principles of improving data locality and cache efficiency are broadly applicable, the exact gains may vary on different architectures or with different decoder implementations.
Future research directions include exploring further optimizations within the heuristic kernel and investigating the potential for hardware acceleration to further enhance decoding speeds. These advancements contribute to the development of more scalable and practical quantum error correction schemes, paving the way for fault-tolerant quantum computation.
👉 More information
🗞 Accelerating the Tesseract Decoder for Quantum Error Correction
🧠 ArXiv: https://arxiv.org/abs/2602.02985
