Beihang & Kyushu University Launch Qolumbina for Quantum Software Testing

Researchers from Beihang University and Kyushu University have launched Qolumbina, a new benchmark for quantum software testing (QST) built from 40 programs curated from open-source repositories. Unlike previous efforts that relied on small, hard-coded examples, Qolumbina puts these programs through systematic selection and refactoring, then equips them with specifications, test case examples, and unit tests so that they are testable and comparable. Using QST-oriented criteria, the team characterizes quantum programs along four dimensions: functionality, output behavior, development complexity, and quantum-specific execution complexity. In controlled experiments with two recent QST approaches, they demonstrate that Qolumbina supports execution-cost analysis of 23 programs and fault-detection analysis of 26 programs with 7 buggy variants each, and they show how the choice of backend can influence testing results. Their empirical study also shows that Qolumbina supports scalability analysis beyond fixed-size circuit benchmarks.

Qolumbina Benchmark Infrastructure for Scalable Quantum Programs

The demand for reliable quantum software is accelerating, yet ensuring program quality presents unique challenges due to the counterintuitive nature of quantum mechanics and the increasing complexity of quantum systems. To address this need, researchers have introduced Qolumbina, a benchmark infrastructure for rigorous testing of scalable quantum programs. Unlike prior efforts that relied on small hard-coded or circuit-level examples, Qolumbina curates 40 programs filtered from 8 open-source repositories, a step up in scale and realism that supports more practical and reproducible evaluations. It is more than a program repository: the team prepares each program for testing through systematic selection and refactoring, and supplies specifications, test case examples, unit tests, and standardized interfaces. Through controlled experiments with two recent QST approaches, the researchers demonstrate the feasibility of using Qolumbina for both execution-cost and fault-detection studies, while also highlighting how backend dependencies affect the interpretation of QST results.

The team created 7 buggy variants for each of 26 programs, enabling analysis of how effectively testing techniques identify errors. The empirical study reveals that Qolumbina exhibits diversity in testing-relevant properties and supports scalability analysis beyond the constraints of fixed-size circuit benchmarks.
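
As a rough illustration of the bookkeeping behind such a study, the sketch below counts the buggy variants implied by the figures above (26 programs with 7 variants each) and computes an overall detection rate. The function names and the sample detection flags are hypothetical; they are not taken from Qolumbina's interface or from the study's results.

```python
# Illustrative sketch only: names and sample data are assumptions,
# not part of Qolumbina's actual interface or results.

PROGRAMS = 26              # programs used in the fault-detection analysis
VARIANTS_PER_PROGRAM = 7   # buggy variants created for each program


def total_variants(programs: int = PROGRAMS,
                   variants: int = VARIANTS_PER_PROGRAM) -> int:
    """Number of buggy variants a testing technique is run against."""
    return programs * variants


def detection_rate(detected: dict[str, list[bool]]) -> float:
    """Fraction of buggy variants flagged by a testing technique.

    `detected` maps a program name to one boolean per buggy variant:
    True if at least one test failed on that variant.
    """
    flags = [flag for per_program in detected.values() for flag in per_program]
    if not flags:
        raise ValueError("no variants to evaluate")
    return sum(flags) / len(flags)


if __name__ == "__main__":
    print(total_variants())  # 26 * 7 = 182
    # Made-up flags for two hypothetical programs, two variants each.
    sample = {"prog_a": [True, False], "prog_b": [True, True]}
    print(detection_rate(sample))  # 3 of 4 detected -> 0.75
```

With the article's numbers, a technique evaluated on the full fault-detection set would be run against 182 buggy variants in total.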

The push for robust quantum software is now extending beyond simple demonstrations to encompass practical, real-world program testing. Earlier efforts relied on small examples, but a shift is occurring toward more systematic evaluation using programs sourced from actual development environments. Researchers have responded by creating Qolumbina, a benchmark infrastructure built around a curated collection of 40 programs drawn from eight open-source repositories. This represents a significant increase in scale and realism compared to previous benchmarks like Bugs4Q and MQT Bench, which were either focused on classical components or designed for hardware testing rather than comprehensive software assessment. Qolumbina’s approach isn’t merely about gathering existing code; it involves systematic selection and refactoring, along with specifications, test case examples, unit tests, and standardized interfaces. This attention to detail addresses a critical limitation of prior work, where fragmented sources and a lack of standardized interfaces hindered fair comparison and reproducibility. The resulting collection is intentionally diverse, so that testing approaches are evaluated against a range of program types and scales.

Researchers are increasingly focused on characterizing quantum programs beyond simply verifying whether they function. To this end, the Qolumbina team proposes QST-oriented criteria that characterize quantum programs along functionality, output behavior, development complexity, and quantum-specific execution complexity.

Practical quantum computing requires more than functional programs; it also requires understanding how difficult those programs are to create and run. Qolumbina’s preparation of each program allows for a nuanced analysis of program characteristics, extending beyond functionality to build difficulty and runtime demands. To characterize program scale, the researchers combine standard software engineering metrics, such as lines of code, with quantum-specific measures like circuit width.
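
To make the two kinds of metric concrete, here is a minimal sketch that computes a lines-of-code count and a circuit width. It assumes a toy circuit representation (a list of gate-name and qubit-index pairs) and a simple LOC convention that skips blank and comment-only lines; both are illustrative choices, not the definitions used by the Qolumbina authors.

```python
# Illustrative sketch only: the gate-list format and the LOC convention
# are assumptions, not Qolumbina's actual metric definitions.

Gate = tuple[str, tuple[int, ...]]  # (gate name, qubit indices it acts on)


def count_loc(source: str) -> int:
    """Count non-blank lines that are not pure '#' comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def circuit_width(gates: list[Gate]) -> int:
    """Circuit width, taken here as the number of distinct qubits used."""
    return len({qubit for _name, qubits in gates for qubit in qubits})


if __name__ == "__main__":
    # A three-qubit GHZ-style circuit in the toy format.
    ghz = [("h", (0,)), ("cx", (0, 1)), ("cx", (1, 2))]
    print(circuit_width(ghz))  # three distinct qubits -> 3

    snippet = "# build circuit\nx = 1\n\ny = 2\n"
    print(count_loc(snippet))  # two code lines -> 2
```

Lines of code describe how much a developer has to write and maintain, while width describes how many qubits the backend must provide, which is why the two are reported side by side.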

While quantum computing advances rapidly, assessing the reliability of its software has lagged, often relying on benchmarks ill-suited to mirroring real-world development. Existing evaluations frequently employ “small hard-coded or circuit-level benchmarks,” limiting their capacity to test programs as they grow in complexity and scale. Qolumbina departs from earlier efforts that often lacked systematic program selection and standardization. It doesn’t simply gather code; each program goes through systematic selection and refactoring, and is given specifications, test case examples, unit tests, and standardized interfaces. This preparation allows for more meaningful analysis of testing methodologies and their effectiveness on programs approaching practical size. The infrastructure’s design supports scalability analysis beyond fixed-size circuit benchmarks: the programs can be scaled through their classical inputs, which more closely resembles how quantum software is actually developed.
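
The difference between a fixed-size circuit and a scalable program can be sketched with a small generator whose circuit size is set by a classical input. The example below uses a GHZ-style state preparation in a toy gate-list format; the function names and the format are assumptions made for illustration and do not come from Qolumbina.

```python
# Illustrative sketch only: a toy "scalable program" whose circuit size
# is controlled by a classical input n. Not taken from Qolumbina.

Gate = tuple[str, tuple[int, ...]]  # (gate name, qubit indices it acts on)


def build_ghz(n: int) -> list[Gate]:
    """Return the gate list preparing an n-qubit GHZ state."""
    if n < 1:
        raise ValueError("n must be at least 1")
    gates: list[Gate] = [("h", (0,))]
    # Chain of CNOTs spreading the superposition to the remaining qubits.
    gates += [("cx", (i, i + 1)) for i in range(n - 1)]
    return gates


def width(gates: list[Gate]) -> int:
    """Highest qubit index used, plus one."""
    return 1 + max(q for _name, qubits in gates for q in qubits)


if __name__ == "__main__":
    for n in (2, 4, 8):
        circuit = build_ghz(n)
        # Both width and gate count grow with the classical input n.
        print(n, width(circuit), len(circuit))
```

A fixed-size benchmark would correspond to a single value of `n`; a scalable program lets a testing technique be measured across many sizes from the same source code.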

A central challenge in validating quantum software lies in accurately assessing the computational demands placed on quantum hardware. The Qolumbina benchmark infrastructure, comprising 40 programs curated from 8 open-source repositories, facilitates this deeper analysis and represents a significant expansion from earlier work reliant on “small hard-coded or circuit-level benchmarks.” The shift towards more substantial, real-world programs allows for a more nuanced understanding of execution costs. In controlled experiments with two recent QST approaches, the researchers analyzed 23 programs for execution cost and 26 programs, each with seven buggy variants, for fault detection. The results indicate that Qolumbina can reproduce findings from prior studies and also reveal how simulated backends influence the interpretation of QST results, which highlights the importance of accounting for backend-dependent effects.
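
One way a backend can change the reading of a test result is sketched below. The example uses a generic statistical check, a total variation distance threshold between measured counts and an expected distribution, which is a stand-in chosen for illustration and not necessarily what the two QST approaches in the study use. The measurement counts and the threshold are made up.

```python
# Illustrative sketch only: the oracle, threshold, and counts are
# assumptions for demonstration, not data or methods from the study.


def total_variation_distance(counts: dict[str, int],
                             expected: dict[str, float]) -> float:
    """Half the L1 distance between empirical and expected distributions."""
    shots = sum(counts.values())
    if shots == 0:
        raise ValueError("no measurement shots recorded")
    outcomes = set(counts) | set(expected)
    return 0.5 * sum(
        abs(counts.get(o, 0) / shots - expected.get(o, 0.0)) for o in outcomes
    )


def passes(counts: dict[str, int], expected: dict[str, float],
           threshold: float = 0.05) -> bool:
    """Test passes when the measured distribution is close to the expected one."""
    return total_variation_distance(counts, expected) <= threshold


if __name__ == "__main__":
    expected = {"00": 0.5, "11": 0.5}  # ideal Bell-state output

    # Made-up counts for the same correct program on two simulated backends.
    ideal_backend = {"00": 510, "11": 490}
    noisy_backend = {"00": 430, "11": 420, "01": 80, "10": 70}

    print(passes(ideal_backend, expected))  # distance 0.01 -> passes
    print(passes(noisy_backend, expected))  # distance 0.15 -> fails
```

In this toy setup the same correct program passes on the noiseless backend and fails on the noisy one, so a reported failure could reflect the backend and not a bug in the program.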

The pursuit of reliable quantum software is increasingly focused on realistic program evaluation, moving beyond the limitations of previously employed benchmarks. Qolumbina distinguishes itself not merely through program quantity, but through a rigorous preparation process that allows for more consistent test case design and interpretation of results. Controlled experiments using Qolumbina and two recent QST approaches yielded insights into fault detection, based on 7 buggy variants for each of 26 programs. These experiments also highlighted the impact of simulated backends on QST results. The infrastructure’s ability to reproduce findings from prior studies while providing new insights underscores its potential as a foundational tool for advancing quantum software testing methodologies and improving the reliability of future quantum applications.

Researchers are increasingly focused on the practical realities of quantum software testing, moving beyond idealized scenarios to account for the nuances of actual quantum hardware. Yuechen Li of Beihang University and colleagues have developed Qolumbina, a benchmark infrastructure designed to address limitations in existing testing methodologies. Their work reveals how the choice of “backend” (the specific quantum processor or simulator used) can significantly influence test results. Qolumbina’s structured preparation of programs addresses a key shortcoming of earlier benchmarks, which often lacked the necessary structure for meaningful analysis. Using their QST-oriented criteria, the team’s empirical study shows that Qolumbina covers diverse testing-relevant properties and supports scalability analysis beyond fixed-size circuit benchmarks.


Rusty Flint
