Solvaformer: Graph Transformer Predicts Small Molecule Solubility with Geometry-Aware SE(3) Equivariance

Predicting how well a small molecule dissolves is a fundamental challenge in chemistry, crucial for efficient drug discovery and materials design, yet traditional methods are often expensive and lack clear explanations. Jonathan Broadbent, Michael Bailey, Mingxuan Li, and colleagues now present Solvaformer, a new approach that accurately forecasts solubility by modelling solutions as collections of molecules, respecting their three-dimensional geometry. This geometry-aware system learns from both computer simulations and experimental data, achieving state-of-the-art performance while also revealing the key molecular interactions that drive solubility, such as hydrogen bonding patterns. Solvaformer represents a significant advance by combining geometric understanding with a diverse training strategy, offering a scalable and interpretable tool for predicting solution behaviour.

Solubility Prediction Using Machine Learning Models

Researchers are tackling the challenging problem of accurately predicting how well organic compounds dissolve in various solvents, a critical need in chemistry, drug discovery, and materials science. This work focuses on leveraging the power of machine learning to overcome limitations in traditional methods. The team utilized the extensive BigSolDB 2.0 dataset, a comprehensive collection of solubility measurements, and supplemented it with data from CombiSolv-QM, a dataset generated through quantum mechanical calculations. Careful cleaning of the BigSolDB 2.0 dataset removed duplicate measurements, eliminating 6,591 redundant entries.

The research demonstrates the potential of machine learning, particularly transformers and equivariant neural networks, to accurately predict solubility. Combining experimental data with quantum chemical calculations significantly improves model performance. Rigorous data cleaning and quality control are crucial for building reliable predictive models, and careful hyperparameter optimization is essential for maximizing accuracy. Data is publicly available through established repositories, ensuring reproducibility and wider access for the scientific community.
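The duplicate removal mentioned above can be pictured with a small sketch. The article does not say how duplicates were identified, so the record fields (`solute`, `solvent`, `temperature_k`, `log_s`) and the rule "an exact repeat of all four fields is a duplicate" are assumptions made purely for illustration, not the authors' cleaning procedure.

```python
def drop_duplicates(records):
    """Keep the first occurrence of each measurement and count the rest.

    The key below (all four fields must match exactly) is an assumed
    definition of "duplicate"; the real curation rule is not given.
    """
    seen = set()
    kept = []
    for rec in records:
        key = (rec["solute"], rec["solvent"], rec["temperature_k"], rec["log_s"])
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept, len(records) - len(kept)


# Made-up records, not values from BigSolDB 2.0.
records = [
    {"solute": "A", "solvent": "water", "temperature_k": 298.15, "log_s": -1.2},
    {"solute": "A", "solvent": "water", "temperature_k": 298.15, "log_s": -1.2},
    {"solute": "A", "solvent": "ethanol", "temperature_k": 298.15, "log_s": -0.4},
    {"solute": "B", "solvent": "water", "temperature_k": 310.15, "log_s": -2.7},
]

cleaned, n_removed = drop_duplicates(records)
print(len(cleaned), n_removed)
```

Keeping the first occurrence is only one option; averaging repeated measurements would be another reasonable choice, and the article does not indicate which was used.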

Geometry-Aware Graph Transformer for Solubility Prediction

Scientists have developed Solvaformer, a novel computational model that accurately predicts the solubility of small molecules and their solvation free energy. This innovative model addresses limitations in existing methods by treating solutions as collections of molecules and incorporating three-dimensional geometry. Solvaformer utilizes SE(3) symmetries, encompassing both rotation and translation, to model molecular interactions with enhanced precision. The architecture combines intramolecular attention, focusing on relationships within individual molecules, with intermolecular scalar attention, enabling effective cross-molecular interaction modeling.
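To make the SE(3) idea concrete, the following minimal sketch checks that interatomic distances do not change when a molecule is rotated and translated. It illustrates the symmetry itself, not Solvaformer's layers; the coordinates are rough, illustrative values for a three-atom molecule, and the helper names are hypothetical.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def translate(point, shift):
    """Shift a 3D point by the vector `shift`."""
    return tuple(a + b for a, b in zip(point, shift))


def pairwise_distances(coords):
    """Return the distances between every unordered pair of atoms."""
    return [math.dist(a, b) for i, a in enumerate(coords) for b in coords[i + 1:]]


# Illustrative coordinates only (roughly water-like, in angstroms).
molecule = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
moved = [translate(rotate_z(p, 1.2), (3.0, -1.0, 2.5)) for p in molecule]

before = pairwise_distances(molecule)
after = pairwise_distances(moved)
invariant = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after))
print(invariant)
```

A model whose scalar inputs are built from quantities like these gives the same prediction however each molecule happens to be oriented, which is the property the article attributes to Solvaformer's geometry-aware design.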

To train Solvaformer, the team employed a multi-task learning approach, simultaneously predicting both solubility and solvation free energy using an alternating-batch regimen. This combined training strategy unites complementary computational and experimental data, enhancing the model’s generalization capability and predictive power. Rigorous evaluation demonstrates that Solvaformer achieves state-of-the-art overall performance, approaching the accuracy of DFT-assisted methods while surpassing existing alternatives. Furthermore, the study pioneered the use of token-level attention mechanisms to produce chemically coherent attributions, revealing key intra- and inter-molecular hydrogen-bonding patterns that govern solubility differences in positional isomers. This interpretability is a significant advancement, providing insights into the underlying chemical principles driving solubility and enabling a deeper understanding of solution-phase properties.
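An alternating-batch regimen can be sketched as a scheduler that hands the training loop one batch from each task in turn. The article does not give the exact schedule, so the strict one-to-one alternation, the task names, and the function name here are assumptions for illustration.

```python
from itertools import cycle


def alternating_batches(solubility_batches, solvation_batches, steps):
    """Yield (task_name, batch) pairs, switching task on every step.

    Each batch list is cycled, so the smaller dataset is simply revisited
    more often. Both lists must be non-empty.
    """
    streams = {
        "solubility": cycle(solubility_batches),
        "solvation_free_energy": cycle(solvation_batches),
    }
    task_order = cycle(streams)  # iterates over the task names in order
    for _, task in zip(range(steps), task_order):
        yield task, next(streams[task])


# Placeholder batch labels stand in for real tensors.
schedule = list(
    alternating_batches(["exp_0", "exp_1"], ["qm_0", "qm_1", "qm_2"], steps=6)
)
for task, batch in schedule:
    print(task, batch)
```

In a real training loop each yielded pair would select the matching prediction head and loss before the optimizer step, so that a shared model is updated by both data sources.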

Solvaformer Accurately Predicts Molecular Solubility and Interactions

Scientists have developed Solvaformer, a new computational model for predicting how well small molecules dissolve, trained on both experimental data and quantum mechanical calculations. This work addresses a critical need for efficient solubility prediction, which is essential for accelerating chemical synthesis and process optimization. Solvaformer models solutions as collections of molecules, accounting for their independent three-dimensional orientations, and combines intramolecular and intermolecular attention mechanisms. The team trained Solvaformer using a multi-task approach, leveraging 82,758 solute-solvent measurements from the BigSolDB 2.0 experimental dataset and approximately one million solvent-solute pairs generated by quantum mechanical calculations.

This combined training strategy enabled the model to achieve state-of-the-art performance in predicting solubility, approaching the accuracy of a gradient-boosting baseline assisted by density functional theory. The model’s performance was tested on a diverse set of 9,250 measurements to assess generalization to new chemical structures. Furthermore, Solvaformer provides interpretable results through its attention weights, revealing the specific molecular interactions driving solubility predictions.

Case studies demonstrate the model’s ability to recover known hydrogen-bonding patterns between molecules, providing insights into the underlying chemical principles governing solubility. The BigSolDB 2.0 dataset, comprising 103,944 experimentally measured solubility values for 1,448 solutes in 213 solvents, was carefully curated and filtered, resulting in a dataset with a high level of precision in solubility measurements.

Geometry-Aware Prediction of Molecular Solubility and Free Energy

Solvaformer represents a significant advance in predicting the solubility of small molecules, a crucial capability for accelerating chemical synthesis and process optimisation. Researchers developed a geometry-aware graph transformer that estimates both solubility and solvation free energy by modelling solutions as collections of molecules, each treated with respect to its own three-dimensional symmetry. Training combined computationally derived data with experimental measurements, allowing the model to approach the accuracy of DFT-assisted methods while remaining geometry-aware. The architecture also distinguishes itself through its interpretability, exposing the chemical interactions behind its solubility predictions.

Attention mechanisms within the model successfully identify intra- and inter-molecular hydrogen bonding patterns, demonstrating that Solvaformer learns physically meaningful relationships rather than simply correlating features. This interpretability, combined with its accuracy, offers a powerful tool for understanding and predicting solution behaviour. Future research will focus on curating datasets enriched with isomeric pairs exhibiting distinct solubilities, aiming to further refine the model’s predictive capabilities and enhance its performance in solubility optimisation tasks.

👉 More information
🗞 Solvaformer: an SE(3)-equivariant graph transformer for small molecule solubility prediction
🧠 ArXiv: https://arxiv.org/abs/2511.09774

Rohail T.
