Thanks to the power of exascale computing, scientists are unlocking the secrets of microbial worlds. The ExaBiome project, a joint effort between Lawrence Berkeley and Los Alamos national laboratories and the Joint Genome Institute, is cataloging microscopic ecosystems found all around us using the Frontier supercomputer at DOE’s Oak Ridge National Laboratory.
Led by Kathy Yelick, a senior computational scientist at Berkeley Lab, and Leonid Oliker, director of the ExaBiome project, the team has developed optimized codes to reconstruct, classify, and compare collected genome sequences. With exascale computing, researchers can now analyze massive datasets in days or hours, rather than months or weeks. This capability has already led to the discovery of new microbial species not found in established databases.
The ExaBiome project is a key application of the Department of Energy’s Exascale Computing Project, which aims to develop advanced software for exascale-class supercomputers capable of performing a quintillion calculations per second.
Uncovering Hidden Microbial Worlds with Exascale Computing
The Department of Energy’s Exascale Computing Project (ECP) develops advanced software for exascale-class supercomputers. One of its key applications is the ExaBiome project, which aims to catalog the countless microscopic ecosystems, or microbiomes, found all around us.
A single drop of water or handful of dirt can contain its own universe of microbial organisms, many of which are too small to be detected by even the closest examination. Piecing together the traces of these microbes requires sifting through mountains of data, a task that has long been beyond the reach of even the fastest supercomputers. With the arrival of exascale computing on the Frontier supercomputer at DOE’s Oak Ridge National Laboratory, that task is now within reach.
The Scientific Challenge: Assembling Genomes from Microbial Samples
The ExaBiome project is a joint effort of scientists at Lawrence Berkeley and Los Alamos national laboratories and the Joint Genome Institute. The team has spent years developing and optimizing codes such as MetaHipMer for assembling genomes from microbial samples, the Protein Alignment via Sparse Matrices code (PASTIS), and the High Performance Markov Clustering algorithm (HipMCL). These applications harness exascale’s speeds to reconstruct, classify, and compare collected genome sequences and understand the relationship and function of genes within microbial species.
Assembly is a complex process that can be likened to putting together a jigsaw puzzle with no box cover as a guide and with all the pieces dumped together from hundreds of different puzzles. The ExaBiome team picks out these pieces and joins them into longer sequences, which may not be complete but still provide valuable insights. These long sequences are then sorted into bins of sequences that appear to belong together, and the bins are compared with what is already known.
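The jigsaw analogy can be made concrete with a toy example. The sketch below is not MetaHipMer's implementation; it is a minimal, single-machine illustration that assumes error-free reads, no repeated sequences, and no reverse complements. It breaks short reads into overlapping k-mers, links them in a graph, and walks unambiguous paths to rebuild longer sequences (contigs). The function names `build_graph` and `assemble`, the read strings, and the value of `k` are all hypothetical choices made for illustration.

```python
from collections import defaultdict

def build_graph(reads, k):
    """Link each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def assemble(reads, k):
    """Walk unambiguous paths through the graph and return the contigs found."""
    graph = build_graph(reads, k)
    indegree = defaultdict(int)
    for successors in graph.values():
        for nxt in successors:
            indegree[nxt] += 1
    # A contig starts at a node that nothing else leads into.
    starts = sorted(node for node in graph if indegree[node] == 0)
    contigs = []
    for start in starts:
        contig, node, visited = start, start, {start}
        # Extend while there is exactly one way forward and no merge or loop.
        while len(graph.get(node, ())) == 1:
            (nxt,) = graph[node]
            if indegree[nxt] != 1 or nxt in visited:
                break
            contig += nxt[-1]
            visited.add(nxt)
            node = nxt
        contigs.append(contig)
    return contigs

# Hypothetical reads from two made-up "genomes", mixed together like
# pieces from two different puzzles dumped into one pile.
reads = [
    "CGTACGTT", "TCTCAATG", "ATGGCGTA",
    "AAGTCTCA", "ACGTTAGC", "CCTGAAGT",
]
contigs = assemble(reads, k=5)
print(contigs)
```

In this toy case the six shuffled reads separate cleanly back into two sequences of 15 letters each. Real metagenomic data adds sequencing errors, repeats, and uneven coverage, which is part of why production assemblers need far more machinery and far more computing power.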
Why Exascale Computing is Necessary
Even a typical supercomputer cannot handle calculations of this size and complexity. Without the capability of running large, distributed computations, sparsely represented species end up looking like errors because too few individual microbes are present to be recognized on their own. Only when the data from many of these microbes are combined is there something to see, which makes exascale machines like Frontier essential for this type of research.
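One way to see why rare species can be mistaken for errors is to look at how often short subsequences (k-mers) appear in the data. The sketch below assumes a simple count threshold for separating likely errors from real signal, a common idea in sequence analysis but not necessarily how the ExaBiome codes decide. The sequences, sample sizes, threshold, and function names are hypothetical.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer that occurs in the given reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def solid_kmers(reads, k, min_count):
    """Keep k-mers seen at least min_count times; rarer ones look like errors."""
    return {kmer for kmer, n in count_kmers(reads, k).items() if n >= min_count}

# Hypothetical data: each small sample holds five reads from a common
# species and only one read from a rare species.
COMMON_READ = "ATGGCGTACG"
RARE_READ = "CCTGAAGTCT"
samples = [[COMMON_READ] * 5 + [RARE_READ] for _ in range(4)]

K, MIN_COUNT = 5, 3  # illustrative values, not taken from the project
rare_kmers = set(count_kmers([RARE_READ], K))

# Analyzed one sample at a time, the rare species never clears the threshold.
kept_per_sample = [solid_kmers(sample, K, MIN_COUNT) for sample in samples]
rare_seen_alone = any(kept & rare_kmers for kept in kept_per_sample)

# Pooled into one large analysis, the same rare k-mers add up and survive.
pooled = [read for sample in samples for read in sample]
rare_seen_pooled = rare_kmers <= solid_kmers(pooled, K, MIN_COUNT)

print(rare_seen_alone, rare_seen_pooled)
```

With these made-up numbers the rare species is filtered out of every individual sample but is fully retained once the four samples are analyzed together. Pooling is what drives the data volume, and with it the need for a machine on Frontier's scale.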
The ExaBiome codes have run calculations across all of Frontier’s more than 9,000 compute nodes, allowing researchers to shrink cataloging and comparison work that once took weeks or months into hours or days. The overall peak performance on Frontier reflects a 536× improvement over the benchmark initially set by the team.
Frontier Success: Discovering New Microbial Species
Exascale computing has enabled the discovery of new microbial species that don’t exist in any established databases. Thanks to Frontier, researchers can analyze much larger datasets than was ever possible before: 100 terabytes and more. This has changed not just scientists’ understanding but also how science is conducted in the environmental biology community.
What’s Next? Applying Exascale Analysis to Fundamental Questions of Biology
The ExaBiome team plans to apply exascale analysis to some of the fundamental questions of biology and genomics, from the human biome to microbe samples gathered from the ocean floor. Thanks to Frontier, researchers are gaining a much clearer, much more detailed picture of what’s living and happening in the microbial world.
The team aims to answer questions such as: What’s the functional behavior of these microbes? What genes do they possess? How do they interact? While this is just the beginning, it marks an important step towards understanding all the microbes in the world and exactly how they interact with one another.
