Chemistry-enhanced Diffusion Generates Large Molecular Conformations from Small Molecules, Enabling Rapid Structure Prediction

Predicting the three-dimensional structures of complex molecules remains a significant challenge in chemistry and often demands substantial computational resources. Yet accurate molecular conformations are crucial for understanding chemical behaviour and designing new materials. Now, Yifei Zhu, Jiahui Zhang, Jiawei Peng, and colleagues at South China Normal University present StoL, a diffusion model that rapidly generates realistic structures for large molecules using information learned from smaller ones. The framework builds molecules from chemically valid fragments, much like assembling a LEGO structure, and requires no prior knowledge of large-molecule structures during training. The team demonstrates that this fragment-based strategy not only accelerates conformation generation but also yields chemically sound and diverse molecular arrangements, as validated by comparison with established quantum mechanical calculations.

Generative Diffusion Models for Molecular Discovery

Computational chemistry and drug discovery are undergoing a transformation, driven by advanced machine learning techniques. Researchers are increasingly employing data-driven approaches, particularly generative models, to design and understand molecules more effectively. This field focuses on improving molecular modeling, predicting chemical properties, and accelerating the drug design process. A central theme is the use of generative models, such as diffusion models, to create novel molecules with desired characteristics. Key areas of investigation include physics-informed machine learning, which integrates fundamental physical principles into models to enhance accuracy and reliability.

Combining quantum chemical calculations with machine learning further refines predictions of molecular properties. Researchers are also focusing on generating accurate three-dimensional structures of molecules, known as conformers, and improving methods for molecular docking and dynamics simulations. Bioactivity databases, such as ChEMBL, serve as valuable resources for training and validating machine learning models, with applications ranging from identifying histone deacetylase inhibitors to optimizing lead compounds for drug development. Together, the emphasis on generative models, physical constraints, and three-dimensional molecular structures signals a move towards more efficient methods for designing and understanding molecules.

StoL Generates Molecular Conformations From Fragments

Scientists have developed StoL, a novel framework for generating diverse and high-quality three-dimensional conformations of large molecules. This addresses a longstanding challenge in computational chemistry, where accurately predicting molecular structures is often computationally demanding. StoL employs a “small-to-large” strategy, assembling structures from smaller, chemically valid fragments and thereby circumventing the need for extensive databases of large molecules. The framework accepts a SMILES string as input and directly outputs Cartesian coordinates for multiple conformations, which streamlines operation and makes it more accessible to researchers.

The methodology mirrors construction with building blocks: molecules are fragmented into smaller components. A chemically enhanced diffusion model then generates plausible three-dimensional configurations for each fragment, guaranteeing structural diversity and chemical validity. This diffusion model is trained on small molecules, enabling the generation of diverse fragment conformations without requiring data from large molecules. The fragments are then assembled into a complete three-dimensional molecular structure, with chemistry-constrained filtering eliminating unphysical geometries. By explicitly incorporating chemical principles into key stages, StoL enhances predictive accuracy and generates chemically plausible and thermodynamically meaningful conformations.
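The article does not spell out which geometric criteria StoL's filtering applies. As a minimal sketch of one plausible check, the Python below rejects conformers in which two non-bonded atoms sit unrealistically close together. The function names, the 1.0 Å threshold, and the toy coordinates are all assumptions made for illustration, not details from the paper.

```python
import math
from itertools import combinations

# Hypothetical threshold (angstroms): non-bonded atoms closer than this clash.
MIN_NONBONDED_DISTANCE = 1.0

def has_steric_clash(coords, bonds, min_dist=MIN_NONBONDED_DISTANCE):
    """Return True if any pair of non-bonded atoms is closer than min_dist."""
    bonded = {frozenset(bond) for bond in bonds}
    for i, j in combinations(range(len(coords)), 2):
        if frozenset((i, j)) in bonded:
            continue  # bonded atoms are expected to be close together
        if math.dist(coords[i], coords[j]) < min_dist:
            return True
    return False

def filter_conformers(conformers, bonds):
    """Keep only the conformers that show no steric clash."""
    return [c for c in conformers if not has_steric_clash(c, bonds)]

# Toy three-atom chain 0-1-2 with one sensible and one collapsed geometry.
bonds = [(0, 1), (1, 2)]
good = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0)]
bad = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.3, 0.2, 0.0)]  # atom 2 sits on atom 0
kept = filter_conformers([good, bad], bonds)
print(len(kept))
```

A production filter would presumably use element-specific distance limits and further checks such as bond lengths and angles; the single cutoff here only conveys the idea.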

StoL Generates Diverse Molecular Conformations Efficiently

StoL is a machine learning framework built to generate diverse, high-quality three-dimensional conformations for large polyatomic molecules efficiently, a task that otherwise requires substantial computational effort. Because its “small-to-large” design assembles molecules from smaller fragments, it needs no extensive database of large-molecule structures. The process begins with a SMILES string, which is systematically decomposed into chemically valid fragments using predefined rules.
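The predefined fragmentation rules are not listed in the article, so the following is only a sketch of the general idea: once a rule set has chosen which bonds to cut, the fragments are the connected components of the remaining molecular graph. The function name and the toy atom indices are hypothetical. SMILES parsing is left out, and the molecular graph is supplied directly as a bond list; in practice a cheminformatics toolkit would normally handle that step.

```python
from collections import defaultdict

def fragment_by_cut_bonds(num_atoms, bonds, cut_bonds):
    """Split a molecular graph into fragments by removing the given bonds.

    bonds and cut_bonds are lists of (atom_index, atom_index) pairs.
    Returns a list of fragments, each a sorted list of atom indices.
    """
    cut = {frozenset(bond) for bond in cut_bonds}
    adjacency = defaultdict(list)
    for a, b in bonds:
        if frozenset((a, b)) not in cut:
            adjacency[a].append(b)
            adjacency[b].append(a)

    seen = set()
    fragments = []
    for start in range(num_atoms):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            atom = stack.pop()
            component.append(atom)
            for neighbour in adjacency[atom]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        fragments.append(sorted(component))
    return fragments

# Toy six-atom chain 0-1-2-3-4-5, cut at the central bond.
chain_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
fragments = fragment_by_cut_bonds(6, chain_bonds, cut_bonds=[(2, 3)])
print(fragments)  # [[0, 1, 2], [3, 4, 5]]
```

Keeping the choice of cut bonds separate from the graph traversal means that different rule sets can be swapped in without touching the splitting logic.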

A chemistry-enhanced diffusion model then generates multiple plausible 3D conformations for each fragment. These conformations undergo rapid geometric filtering, guided by chemical principles, to eliminate unrealistic structures, so that the retained fragments are structurally diverse and chemically valid. The team demonstrates that this fragment-based strategy circumvents a limitation of traditional methods, which require large-molecule databases for training. During training, a two-step strategy incorporates the Sinkhorn and Gumbel-softmax algorithms, alongside planarity checks, guiding the model to learn chemically meaningful patterns.
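The article does not explain how StoL wires these two algorithms into its training loop, so the sketch below shows only their textbook forms. Sinkhorn normalisation alternately rescales the rows and columns of a positive matrix until it is approximately doubly stochastic. Gumbel-softmax draws a relaxed, near-one-hot sample by adding Gumbel noise to logits before a temperature-scaled softmax. The function names, iteration count, and temperature are illustrative assumptions.

```python
import math
import random

def sinkhorn(scores, n_iters=50):
    """Turn a square score matrix into an approximately doubly stochastic one."""
    matrix = [[math.exp(s) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalisation, then column normalisation.
        matrix = [[v / sum(row) for v in row] for row in matrix]
        col_sums = [sum(col) for col in zip(*matrix)]
        matrix = [[v / col_sums[j] for j, v in enumerate(row)] for row in matrix]
    return matrix

def gumbel_softmax(logits, tau=0.5, rng=random):
    """Draw a relaxed categorical sample; smaller tau gives sharper samples."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # avoid log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

scores = [[1.0, 0.2, 0.1], [0.3, 2.0, 0.5], [0.2, 0.4, 1.5]]
soft_assignment = sinkhorn(scores)
sample = gumbel_softmax([2.0, 0.5, 0.1], tau=0.5, rng=random.Random(0))
print([round(sum(row), 6) for row in soft_assignment])
print(round(sum(sample), 6))
```

Both operations are differentiable relaxations of discrete choices (a permutation-like assignment and a categorical pick, respectively), which is the usual reason for using them inside gradient-based training.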

This chemistry-enhanced approach improves training efficiency and overall performance. The final assembly stage integrates the fragments into complete molecular structures, applying chemistry-constrained validation to ensure structural integrity and thermodynamic stability. Consequently, StoL efficiently generates conformational isomers across a broader conformational space, akin to the extensive exploration achieved through molecular dynamics, as validated by quantum mechanical calculations.
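A planarity check is one example of a chemistry-based test that is easy to make concrete. The sketch below measures how far the atoms of a ring deviate from their best-fit plane, using Newell's method for the plane normal. It assumes the ring atoms are listed in order around the ring. The function names and the 0.1 Å tolerance are hypothetical, and the article does not describe StoL's own check at this level of detail.

```python
import math

def max_plane_deviation(ring_coords):
    """Largest distance (angstroms) of any ring atom from the ring's mean plane.

    ring_coords must list the atoms in order around the ring.
    """
    n = len(ring_coords)
    cx = sum(p[0] for p in ring_coords) / n
    cy = sum(p[1] for p in ring_coords) / n
    cz = sum(p[2] for p in ring_coords) / n

    # Newell's method: accumulate the polygon normal edge by edge.
    nx = ny = nz = 0.0
    for i in range(n):
        x1, y1, z1 = ring_coords[i]
        x2, y2, z2 = ring_coords[(i + 1) % n]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    if norm == 0.0:
        raise ValueError("degenerate ring: cannot define a plane")

    return max(
        abs((x - cx) * nx + (y - cy) * ny + (z - cz) * nz) / norm
        for x, y, z in ring_coords
    )

def is_planar(ring_coords, tolerance=0.1):
    """Hypothetical criterion: every atom lies within tolerance of the plane."""
    return max_plane_deviation(ring_coords) <= tolerance

# Flat hexagon versus a puckered one with alternating +/-0.25 angstrom offsets.
angles = [math.radians(60 * k) for k in range(6)]
flat_ring = [(1.39 * math.cos(a), 1.39 * math.sin(a), 0.0) for a in angles]
puckered_ring = [
    (1.39 * math.cos(a), 1.39 * math.sin(a), 0.25 if k % 2 == 0 else -0.25)
    for k, a in enumerate(angles)
]
print(is_planar(flat_ring), is_planar(puckered_ring))  # True False
```

A check like this would let a pipeline discard, for example, an aromatic ring that was generated with a chair-like pucker.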

Fragment Diffusion Generates Molecular Conformations

StoL represents a new approach to generating three-dimensional conformations of large molecules, achieving this efficiently from simple SMILES notation inputs without requiring extensive training data of large molecules themselves. The framework operates by decomposing molecules into chemically valid fragments, generating the 3D structures of these fragments using a diffusion model trained on smaller molecules, and then reassembling them into complete molecular conformations. This modular design allows for scalable and transferable conformation generation, effectively shifting the data requirement from large molecules to smaller, chemically meaningful fragments. Evaluations demonstrate that StoL significantly expands the coverage of conformational space compared to traditional rule-based methods, and, when combined with density functional theory refinement, identifies lower-energy minima than existing approaches. The team confirms that generated structures are not only geometrically plausible but also chemically valid, bridging a gap between deep generative modelling and chemical realism.
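The article does not say which metric the team used to quantify coverage of conformational space. One definition common in conformer-generation benchmarks is the fraction of reference conformers that have at least one generated conformer within an RMSD threshold. The sketch below implements that definition under two simplifying assumptions: all structures share the same atom ordering, and they are already superimposed, so no alignment step is performed. The names and the 0.5 Å threshold are illustrative.

```python
import math

def rmsd(conf_a, conf_b):
    """Root-mean-square deviation between two pre-aligned conformers."""
    if len(conf_a) != len(conf_b):
        raise ValueError("conformers must have the same number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(conf_a, conf_b))
    return math.sqrt(squared / len(conf_a))

def coverage(generated, references, threshold=0.5):
    """Fraction of reference conformers matched by some generated conformer."""
    if not references:
        raise ValueError("at least one reference conformer is required")
    covered = sum(
        1
        for ref in references
        if any(rmsd(gen, ref) < threshold for gen in generated)
    )
    return covered / len(references)

# Two toy reference geometries; the single generated one matches only the first.
ref_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
ref_b = [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
generated = [[(0.0, 0.0, 0.0), (1.6, 0.0, 0.0)]]
print(coverage(generated, [ref_a, ref_b]))  # 0.5
```

A realistic evaluation would first superimpose each pair of structures, for instance with the Kabsch algorithm, before computing the RMSD.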

👉 More information
🗞 Chemistry-Enhanced Diffusion-Based Framework for Small-to-Large Molecular Conformation Generation
🧠 ArXiv: https://arxiv.org/abs/2511.12182

Rohail T.
