Quantum Computing Enhances Speed, Efficiency in Computational Biology and Genome Assembly

Researchers from BGI Research Shenzhen, Shenzhen SpinQ Technology Co Ltd, BGI Research Wuhan, and Guangdong Bigdata Engineering Technology Research Center for Life Sciences have proposed a hybrid assembly quantum algorithm to address the challenges of genome assembly.

The algorithm uses high-accuracy short reads and error-prone long reads to deal with sequencing errors and repetitive sequences. It builds upon the variational quantum eigensolver and divide-and-conquer strategies to approximate the ground state of larger Hamiltonians while conserving quantum resources. The team used simulations of ten-qubit quantum computers to address problems as large as 140 qubits, yielding optimal assembly results.

What is the Potential of Quantum Computing in Computational Biology?

Computational biology stands to benefit from quantum computing because it involves a wide range of challenging computational tasks. Researchers have recently begun to explore applications of quantum computing to genome assembly, the process of reconstructing the original DNA sequence from the data produced by automated sequencing machines. However, the problem of repetitive sequences in the genome remains unresolved.

In a recent paper, a team of researchers proposed a hybrid assembly quantum algorithm that uses high-accuracy short reads and error-prone long reads to deal with sequencing errors and repetitive sequences. The proposed algorithm builds upon the variational quantum eigensolver and uses divide-and-conquer strategies to approximate the ground state of larger Hamiltonians while conserving quantum resources. Using simulations of ten-qubit quantum computers, the team was able to address problems as large as 140 qubits, yielding optimal assembly results.

The convergence speed of the algorithm was significantly improved by a problem-inspired ansatz built from known information about the assembly problem. In addition, the researchers qualitatively verified that entanglement within quantum circuits may accelerate the assembly path optimization.
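To make "approximating the ground state of a Hamiltonian" more concrete, the sketch below encodes a toy read-ordering problem as an energy function over binary variables and finds its lowest-energy state by exhaustive classical search. It is only an illustration: the article does not describe the encoding used in the paper, so the one-hot position encoding, the `OVERLAP` scores, and the `PENALTY` weight here are all assumptions, and brute force stands in for the variational quantum eigensolver.

```python
from itertools import product

# Hypothetical overlap scores between three reads (higher = better overlap).
# OVERLAP[i][j] is the score for placing read j directly after read i.
OVERLAP = [
    [0, 5, 1],
    [1, 0, 4],
    [2, 1, 0],
]
N = len(OVERLAP)
PENALTY = 20  # assumed weight, chosen larger than any achievable overlap gain


def energy(bits):
    """Energy of one basis state; bits[i * N + t] == 1 means read i sits at position t."""
    x = [[bits[i * N + t] for t in range(N)] for i in range(N)]
    e = 0
    # Constraint: each read occupies exactly one position.
    for i in range(N):
        e += PENALTY * (sum(x[i]) - 1) ** 2
    # Constraint: each position holds exactly one read.
    for t in range(N):
        e += PENALTY * (sum(x[i][t] for i in range(N)) - 1) ** 2
    # Objective: reward overlaps between reads at consecutive positions.
    for t in range(N - 1):
        for i in range(N):
            for j in range(N):
                e -= OVERLAP[i][j] * x[i][t] * x[j][t + 1]
    return e


def ground_state():
    """Search all 2**(N*N) basis states for the one with minimum energy."""
    return min(product((0, 1), repeat=N * N), key=energy)


def decode(bits):
    """Turn a valid bit assignment into the read order it encodes."""
    return [next(i for i in range(N) if bits[i * N + t]) for t in range(N)]


best = ground_state()
print(decode(best), energy(best))
```

In this encoding three reads already need nine binary variables (one qubit each on a quantum device), which hints at why qubit counts grow quickly with problem size and why conserving quantum resources matters.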

How Has DNA Sequencing Technology Transformed Biology and Medicine?

DNA sequencing technology has dramatically transformed the fields of biology and medicine in the past few decades. This revolutionary tool allows researchers to decode the genetic blueprints of living organisms, leading to breakthroughs such as early cancer diagnosis and detection of inherited diseases. The throughput and speed of DNA sequencing have increased exponentially over the years, outpacing even Moore’s law, which describes the growth of computational power.

In 1987, the Sanger sequencing platform was only capable of sequencing approximately 1000 nucleotides per day. This limitation rendered the sequencing and assembly of the complete human genome by the Human Genome Project a 13-year endeavor. In contrast, present technologies allow for the sequencing of an entire human genome within mere hours. However, sequencing is just one facet of the challenge. Genome reconstruction is an indispensable subsequent step to pave the way for comprehensive studies.

The rapid development of sequencing technology has brought a corresponding surge in data volume, which amplifies computational demands and has prompted the evolution of diverse assembly algorithms. One of the early assembly algorithms is the overlap-layout-consensus (OLC) algorithm. The OLC algorithm transforms the genome assembly problem into a graph problem in which each vertex represents a read and each edge represents an overlap between two reads. The aim is to find a Hamiltonian path in the graph, that is, a path that visits every read exactly once.
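The OLC idea can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not a production assembler: the reads are error-free, the helper names (`overlap`, `build_overlap_graph`, `best_hamiltonian_path`, `merge`) are hypothetical, and the Hamiltonian path is found by brute force over all permutations, which is only feasible for a handful of reads.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_overlap_graph(reads, min_len=3):
    """Overlap step: map directed edges (i, j) to the overlap length between reads i and j."""
    return {
        (i, j): overlap(a, b, min_len)
        for i, a in enumerate(reads)
        for j, b in enumerate(reads)
        if i != j and overlap(a, b, min_len) > 0
    }


def best_hamiltonian_path(reads, graph):
    """Layout step: brute-force the order visiting every read once with maximal total overlap."""
    best_order, best_score = None, -1
    for order in permutations(range(len(reads))):
        pairs = list(zip(order, order[1:]))
        if all(p in graph for p in pairs):
            score = sum(graph[p] for p in pairs)
            if score > best_score:
                best_order, best_score = order, score
    return best_order


def merge(reads, order, graph):
    """Consensus step (simplified): concatenate reads, skipping each overlapping prefix."""
    seq = reads[order[0]]
    for prev, cur in zip(order, order[1:]):
        seq += reads[cur][graph[(prev, cur)]:]
    return seq


reads = ["GATTACAG", "ACAGTTCA", "TTCAGGCT"]  # toy, error-free reads
graph = build_overlap_graph(reads)
order = best_hamiltonian_path(reads, graph)
print(merge(reads, order, graph))
```

The permutation search grows factorially with the number of reads, which is one way to see why the layout step is the computational bottleneck.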

What are the Challenges and Solutions in Genome Assembly?

Next-generation sequencing has been used extensively because of its accuracy and low cost. Nevertheless, the limited length of short reads makes it difficult to complete the assembly of highly repetitive, complex regions in genomes. The advent of third-generation sequencing platforms, such as those from Oxford Nanopore Technologies and Pacific Biosciences, has enabled the generation of long reads of more than 10 kbp, which can span multiple genomic repeats and improve the contiguity of assembly. However, these long reads also have a high error rate and are costly, limiting their applicability for large-scale genome projects.

To overcome these challenges, hybrid assembly approaches combining short and long reads have been developed. These methods offer unique benefits because the strengths of each read type offset the weaknesses of the other. For example, some approaches use short reads to correct long reads and then assemble the corrected long reads. These approaches require high coverage and usually substantial computing resources.

Another approach assembles the short reads first to generate accurate contigs and then uses long reads for scaffolding. Hybrid assembly offers a cost-effective alternative because it requires fewer long reads than long-read-only methods. Some recent studies have shown that hybrid assembly is superior to long-read-only methods in terms of correctness, contiguity, and completeness. Nevertheless, hybrid assembly is still computationally demanding, requiring powerful computational resources and storage capacity, especially for large and complex genomes.
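The contigs-then-scaffolding strategy can be illustrated with a small sketch. It assumes a single long read and uses exact substring matching in place of a real error-tolerant aligner, which is a strong simplification given that long reads are error-prone. The function name `scaffold` and the toy sequences are hypothetical.

```python
def scaffold(contigs, long_read):
    """Order contigs by where they match one long read and report the gaps between them."""
    placed = []
    for name, seq in contigs.items():
        # Exact matching stands in for an error-tolerant alignment step.
        pos = long_read.find(seq)
        if pos != -1:
            placed.append((pos, name, len(seq)))
    placed.sort()  # left-to-right order along the long read

    layout = []
    for index, (pos, name, length) in enumerate(placed):
        layout.append(name)
        if index + 1 < len(placed):
            next_pos = placed[index + 1][0]
            layout.append(f"gap:{next_pos - (pos + length)}")
    return layout


# Toy contigs, as if assembled from accurate short reads.
contigs = {
    "c1": "ACGTTGCA",
    "c2": "GGATCCTA",
    "c3": "TTAACCGG",
}
# Toy long read spanning all three contigs with short unassembled gaps.
long_read = "GGATCCTA" + "CAT" + "ACGTTGCA" + "GA" + "TTAACCGG"

print(scaffold(contigs, long_read))
```

Here the long read contributes only the order and spacing of the contigs, while the base-level sequence comes from the short-read contigs, which reflects why this strategy needs fewer long reads.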

Can Supercomputers Handle the Explosive Growth of Sequencing Data?

Concerns have been raised about whether supercomputers can keep up with the explosive growth of sequencing data. As the volume of data generated by sequencing technologies grows, so do the computational resources and storage capacity required for genome assembly, especially for large and complex genomes.

The advent of quantum computing offers a potential solution to this problem. Quantum computers exploit superposition and entanglement, which may make them more powerful than classical computers for certain tasks. Applying quantum computing to genome assembly could improve the speed and efficiency of the process, making it possible to handle the increasing volume of sequencing data.

However, the practical application of quantum computing in genome assembly is still in its early stages, and there are many challenges to overcome. For example, the issue of repetitive sequences in the genome remains unresolved. Furthermore, the development of efficient quantum algorithms for genome assembly is a complex task that requires a deep understanding of both quantum computing and genomics. Despite these challenges, the potential benefits of quantum computing in computational biology are immense, and further research in this area is warranted.

Publication details: “Divide-and-Conquer Quantum Algorithm for Hybrid de novo Genome Assembly of Short and Long Reads”
Publication Date: 2024-04-23
Authors: Jing-Kai Fang, Yuefeng Lin, Jianli Huang, Yibo Chen, et al.
Source: PRX Life
DOI: https://doi.org/10.1103/prxlife.2.023006