Algorithm Swiftly Designs RNA Sequences for Structures

Scientists are tackling the computationally intensive RNA inverse folding problem, which seeks to design nucleotide sequences that reliably fold into desired structures. Shuta Kikuchi from the Graduate School of Science and Technology, Keio University, and Shu Tanaka, working with colleagues at the Keio University Sustainable Quantum Artificial Intelligence Center (KSQAIC), the Department of Applied Physics and Physico-Informatics, and the Human Biology-Microbiome-Quantum Research Center (WPI-Bio2Q) at Keio University, present a novel approach utilising Factorization Machine with Quadratic-optimization Annealing (FMQA). The research addresses a key limitation of existing methods: they require extensive sequence evaluations, which is a serious drawback when experimental validation is costly. The team not only establishes a new FMQA framework for RNA inverse folding but also analyses how different binary-integer encoding schemes and nucleotide assignments affect the optimisation process, finding that specific encoding and assignment strategies can substantially improve the stability of the resulting RNA structures.

Can we efficiently design RNA molecules that fold into specific, pre-determined shapes? A new technique combines machine learning with optimisation to rapidly identify the nucleotide sequence needed for a desired RNA structure. This approach bypasses slow, trial-and-error methods, offering a faster route to synthetic biology and new therapies. Scientists are increasingly focused on controlling the structure of ribonucleic acid (RNA) for applications ranging from vaccines to biosensors.

RNA’s function is intimately linked to its complex three-dimensional shape, which arises from its underlying secondary structure formed through base pairing of nucleotides. Yet, designing RNA sequences that reliably fold into a predetermined target structure, a problem known as RNA inverse folding, remains a substantial computational challenge. Traditional methods often demand evaluating a vast number of potential sequences, a limitation when experimental validation is expensive and time-consuming.

Researchers have developed a new computational framework that applies a technique called factorization machine with quadratic-optimisation annealing (FMQA) to this problem. The work centres on translating the four RNA nucleotides (adenine, uracil, guanine, and cytosine) into binary variables, the format required by FMQA, a discrete black-box optimisation method.

The approach uses a surrogate model to predict the quality of RNA sequences, so that fewer sequences need to be evaluated directly. The way nucleotides are first mapped to integers and then converted into binary variables can strongly affect the efficiency of the optimisation process. For this reason, the study not only establishes an FMQA framework for RNA inverse folding but also systematically investigates the effects of different encoding and assignment strategies.

The team assessed all 24 possible assignments of the four nucleotides to integers, alongside four distinct binary-integer encoding methods. Results indicate that two of the encoding schemes, one-hot and domain-wall encoding, consistently outperformed the others in minimising a measure of structural difference from the target.
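The size of that search over configurations can be sketched in a few lines of Python. This is an illustration only: the names `NUCLEOTIDES`, `ENCODINGS`, `assignments`, and `configurations` are hypothetical, and the sketch simply enumerates every assignment paired with every encoding, which may not match how the study organised its runs.

```python
from itertools import permutations, product

NUCLEOTIDES = "AUGC"
ENCODINGS = ["binary", "unary", "one-hot", "domain-wall"]

# Every way of mapping the four nucleotides onto the integers 0..3.
assignments = [dict(zip(perm, range(4))) for perm in permutations(NUCLEOTIDES)]

# Pair each nucleotide-to-integer assignment with each encoding scheme.
configurations = list(product(assignments, ENCODINGS))

print(len(assignments))     # 24 (= 4!)
print(len(configurations))  # 96 (= 24 assignments x 4 encodings)
print(assignments[0])       # {'A': 0, 'U': 1, 'G': 2, 'C': 3}
```

The count of 24 is simply 4!, the number of orderings of four distinct nucleotides.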

Further analysis revealed a preference for nucleotides assigned to the boundary integers within the domain-wall encoding. Specifically, assigning guanine and cytosine to these boundary positions encouraged their placement within the stem regions of the RNA structure, yielding more stable configurations than those achieved with alternative encoding methods.

Understanding how these choices influence the search process is vital for designing effective RNA molecules. By carefully considering both the initial nucleotide assignments and the binary encoding, scientists can refine the FMQA framework and potentially reduce the number of computationally expensive evaluations needed to arrive at a desired RNA structure. Beyond streamlining the design process, this work opens avenues for creating synthetic RNAs with tailored properties for a wide range of biotechnological applications.

Optimised RNA Secondary Structure Prediction via Encoding and Integer Assignment

Employing a factorization machine with quadratic-optimisation annealing (FMQA) framework, the research found that one-hot and domain-wall encodings consistently yielded lower normalized ensemble defect values than binary or unary encodings. The normalized ensemble defect measures how closely a sequence's predicted RNA secondary structure matches the target; lower values indicate a closer match.
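To make the metric concrete, here is a minimal sketch of a normalized ensemble defect calculation. It assumes the standard definition (the expected fraction of nucleotides that are not in their target state at equilibrium) and takes base-pair probabilities as an input; in practice these would come from a thermodynamic folding package, which this article does not name. The function names and the probabilities in the example are hypothetical.

```python
def parse_dot_bracket(structure):
    """Return partner[i] = j for paired positions and None for unpaired ones."""
    partner = [None] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def normalized_ensemble_defect(pair_probs, target):
    """pair_probs maps (i, j) with i < j to the probability that i pairs with j."""
    n = len(target)
    partner = parse_dot_bracket(target)

    # Total probability that each position is paired with anything.
    paired_prob = [0.0] * n
    for (i, j), p in pair_probs.items():
        paired_prob[i] += p
        paired_prob[j] += p

    # Sum, over positions, the probability of being in the target state.
    correct = 0.0
    for i, j in enumerate(partner):
        if j is None:
            correct += 1.0 - paired_prob[i]  # target says unpaired
        else:
            correct += pair_probs.get((min(i, j), max(i, j)), 0.0)
    return (n - correct) / n


# Invented probabilities for a 7-nucleotide hairpin target.
target = "((...))"
probs = {(0, 6): 0.9, (1, 5): 0.8}
defect = normalized_ensemble_defect(probs, target)
print(round(defect, 4))  # 0.0857
```

A value of 0 would mean every nucleotide adopts its target state with certainty, and a value of 1 would mean none does.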

Further analysis demonstrated that nucleotides assigned to the boundary integers (0 and 3) within domain-wall encoding appeared more frequently in the designed sequences. When guanine and cytosine received these assignments, domain-wall encoding enriched them in the stem regions of the predicted RNA structure.

This enrichment resulted in more thermodynamically stable secondary structures than those generated using one-hot encoding. Because guanine and cytosine were the nucleotides assigned to the boundary integers in these configurations, their increased presence in stem regions contributed to the observed stability.
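One simple way to quantify this kind of enrichment is to measure the fraction of paired (stem) positions occupied by guanine or cytosine. The sketch below is an assumption about how such a statistic could be computed, not the study's own analysis code; `stem_gc_fraction` and the example sequences are hypothetical. G-C pairs form three hydrogen bonds compared with two for A-U pairs, which is why a higher value tends to accompany more stable stems.

```python
def stem_gc_fraction(sequence, structure):
    """Fraction of paired positions (dot-bracket '(' or ')') holding G or C."""
    paired = [base for base, state in zip(sequence, structure) if state in "()"]
    if not paired:
        return 0.0
    return sum(base in "GC" for base in paired) / len(paired)


structure = "(((...)))"
print(stem_gc_fraction("GGCAAAGCC", structure))            # 1.0
print(round(stem_gc_fraction("GAUAAAAUC", structure), 3))  # 0.333
```

Both example sequences can form the same hairpin, but the first builds its stem entirely from G-C pairs while the second uses one G-C pair and two A-U pairs.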

For instance, a configuration assigning guanine to ‘0’ and cytosine to ‘3’ consistently produced structures with lower defect values. Among the tested configurations, the lowest normalized ensemble defect values were achieved with domain-wall encoding, indicating a better fit between the predicted and target RNA structures. By contrast, binary and unary encodings consistently produced higher defect values across all nucleotide assignments.

The difference between one-hot and domain-wall encoding was subtle, with domain-wall encoding demonstrating a slight advantage in promoting stem stability. Under the conditions tested, the research established a novel FMQA framework for RNA inverse folding and clarified the effects of encoding choices on solution quality.

Nucleotide encoding strategies optimise RNA inverse folding with factorization machines

A factorization machine with quadratic-optimisation annealing (FMQA) underpinned the methodology employed to address the RNA inverse folding problem. This discrete black-box optimisation technique was selected for its reported ability to yield high-quality solutions with a reduced number of evaluations, a benefit when experimental verification presents cost limitations.
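In general terms, FMQA repeats a cycle: fit a factorization machine (FM) to the sequences evaluated so far, minimise the fitted model over binary variables with an annealer, evaluate the proposed sequence, and add it to the training data. The FM suits this role because its prediction is a quadratic function of binary inputs, so it can be handed to an annealer as a QUBO (quadratic unconstrained binary optimisation) problem. The sketch below illustrates only that equivalence. The parameter values are invented, the helper names are hypothetical, and a brute-force search stands in for the annealer.

```python
from itertools import product


def fm_predict(x, w0, w, V):
    """FM output: bias + linear terms + pairwise terms weighted by <V[i], V[j]>."""
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    return w0 + linear + pairwise


def fm_to_qubo(w, V):
    """Upper-triangular QUBO coefficients of the FM (the constant w0 is dropped)."""
    n = len(w)
    Q = {(i, i): w[i] for i in range(n)}  # x_i * x_i == x_i for binary x_i
    for i in range(n):
        for j in range(i + 1, n):
            Q[(i, j)] = sum(a * b for a, b in zip(V[i], V[j]))
    return Q


def qubo_energy(x, Q):
    return sum(c * x[i] * x[j] for (i, j), c in Q.items())


# Invented toy parameters: 3 binary variables, 2 latent factors each.
w0 = 0.5
w = [1.0, -2.0, 0.5]
V = [[0.1, -0.3], [0.4, 0.2], [-0.5, 0.6]]
Q = fm_to_qubo(w, V)

# Brute force stands in for the annealer on this tiny problem.
best = min(product([0, 1], repeat=len(w)), key=lambda x: qubo_energy(x, Q))
print(best)  # (0, 1, 0)
```

For a real design problem the number of binary variables grows with sequence length and encoding width, which is why an annealing-based solver is used rather than exhaustive search.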

Initial steps involved converting each nucleotide (adenine, uracil, guanine, or cytosine) into binary variables to render the problem compatible with the FMQA framework. A central aspect of this work focused on systematically examining the impact of nucleotide-to-integer assignments and binary-integer encoding schemes on FMQA performance. Researchers hypothesized that these choices shape the surrogate model and the resulting search landscape, directly influencing the quality of the final RNA sequence.

To test this, the study exhaustively evaluated all 24 possible permutations of assigning the four nucleotides to the integers 0 to 3. These assignments were then paired with four distinct binary-integer encoding methods: binary, unary, one-hot, and domain-wall encoding. One-hot encoding represents each nucleotide with a vector in which only the position corresponding to that nucleotide has a value of one, while the others are zero.

In contrast, unary encoding uses a string of ones to represent each nucleotide, with the number of ones corresponding to the nucleotide’s integer assignment. Domain-wall encoding, a less common approach, represents each integer by the position of a single boundary (the domain wall) between a run of ones and a run of zeros. By comparing the performance of these combinations, the research aimed to identify configurations that optimise the efficiency and accuracy of the FMQA process.
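The four schemes can be compared side by side with a short sketch. The bit conventions below follow common usage and are assumptions; this article does not give the exact definitions used in the study, and `encode` and `decode` are hypothetical helper names. Under these conventions unary and domain-wall share the same canonical codewords and differ in which bit strings count as valid: unary counts ones wherever they appear, whereas domain-wall accepts only a run of ones followed by a run of zeros.

```python
def encode(k, scheme, n_levels=4):
    """Encode integer k in 0..n_levels-1 as a list of bits."""
    if scheme == "binary":
        width = (n_levels - 1).bit_length()
        return [(k >> i) & 1 for i in reversed(range(width))]
    if scheme == "one-hot":
        return [1 if i == k else 0 for i in range(n_levels)]
    if scheme in ("unary", "domain-wall"):
        return [1 if i < k else 0 for i in range(n_levels - 1)]
    raise ValueError(f"unknown scheme: {scheme}")


def decode(bits, scheme):
    """Decode bits to an integer, or None if the bit string is invalid."""
    if scheme == "binary":
        return int("".join(map(str, bits)), 2)
    if scheme == "unary":
        return sum(bits)  # any arrangement of ones is accepted
    if scheme == "one-hot":
        return bits.index(1) if sum(bits) == 1 else None
    if scheme == "domain-wall":
        k = sum(bits)
        return k if bits == [1] * k + [0] * (len(bits) - k) else None
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("binary", "unary", "one-hot", "domain-wall"):
    print(scheme, [encode(k, scheme) for k in range(4)])

print(decode([1, 0, 1], "unary"))        # 2
print(decode([1, 0, 1], "domain-wall"))  # None (more than one wall)
```

With four nucleotides, binary needs two bits per position, unary and domain-wall need three, and one-hot needs four, so the choice of encoding also changes the number of binary variables the surrogate model must handle.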

To assess solution quality, the normalized ensemble defect value served as the primary metric. The analysis also examined how often specific nucleotides appeared when using domain-wall encoding, focusing on those assigned to the boundary integers (0 and 3). With this encoding, assigning guanine and cytosine to the boundary integers correlated with their enrichment in stem regions of the predicted RNA secondary structure, potentially leading to greater thermodynamic stability than in sequences generated using one-hot encoding.

Numerical encoding profoundly influences RNA structure prediction optimisation

Scientists attempting to design RNA molecules with specific shapes have long faced a computational bottleneck. Predicting how a sequence will fold is relatively easy, but working backwards, finding the sequence that will fold a certain way, demands exploring a vast number of possibilities. This research presents a clever application of optimisation techniques to tackle this ‘inverse folding’ problem, and what distinguishes it is a detailed examination of how seemingly minor choices within the computational method can dramatically affect the outcome.

The real advance here isn’t simply achieving better designs, but understanding why certain computational approaches work better than others. For years, researchers have treated the encoding of genetic information into numbers as a necessary, but largely unexamined, step in these calculations. Findings reveal that the way nucleotides are represented numerically, and then converted into binary code, shapes the entire optimisation process.

Specifically, one-hot and domain-wall encodings proved superior, and assigning guanine and cytosine to the edges of the numerical range encouraged the formation of stable stem structures. While performance improved across benchmark tests, the method still requires substantial computing power, and the gains observed may not translate equally well to all RNA structures.

The study focused on relatively short RNA sequences; scaling up to longer, more complex molecules presents a new set of challenges. Once these hurdles are addressed, the implications extend beyond basic research. At a practical level, improved RNA design tools could accelerate the development of new therapeutics, diagnostics, and even biomaterials. Instead of relying on chance or laborious trial-and-error, scientists could precisely engineer RNA molecules to perform specific functions. Further work might explore combining this optimisation framework with machine learning approaches, allowing the system to learn from previous designs and further refine its predictive capabilities.

👉 More information
🗞 Factorization Machine with Quadratic-Optimization Annealing for RNA Inverse Folding and Evaluation of Binary-Integer Encoding and Nucleotide Assignment
🧠 ArXiv: https://arxiv.org/abs/2602.16643

Rohail T.
