Sigma-profiles are crucial molecular descriptors for solvent selection, thermodynamic modelling and molecular design, but the publicly available collections have long been limited in size and consistency. Dominik Gond, Justus Arweiler, and Thomas Specht, from the Laboratory of Engineering Thermodynamics at RPTU Kaiserslautern, alongside Hans Hasse and Fabian Jirasek, now address this challenge with CHAOS, a comprehensive database containing sigma-profiles for over 53,000 molecules. The database expands the publicly available data more than tenfold relative to previous libraries and systematically connects the profiles with a wide range of other molecular descriptors, including geometries, spectra and thermodynamic properties. By employing a standardised computational workflow, the team delivers a consistent resource intended to support both physics-based modelling and machine-learning applications in fields from fundamental chemistry to advanced materials science.
Large Sigma Profile Database for COSMO Models
This research details the creation and validation of a large database of sigma-profiles for use in COSMO-based thermodynamic models. Researchers performed large-scale quantum-chemical calculations on over 53,000 molecules, benchmarking different computational methods to balance accuracy and efficiency. These calculations generated sigma-profiles, which describe the distribution of screening charge density over the molecular surface and serve as the key input for COSMO-RS predictions, and formed the basis of a comprehensive database incorporating molecular structures and relevant metadata. By releasing the database under an open license, the researchers aim to promote wider adoption and collaboration within the scientific community, addressing a need in chemical engineering and computational chemistry for reliable thermodynamic modeling tools.
Standardized Molecular Descriptors From Quantum Calculations
Scientists developed CHAOS, a comprehensive database containing sigma-profiles and related molecular descriptors for 53,091 molecules, addressing a critical need for consistent and extensive data in molecular modeling. The study applies a standardized quantum-chemical workflow at the ωB97X-D/def2-TZVP level of theory to ensure internal consistency across all calculated properties. This approach avoids the inconsistencies that arise when data computed with different quantum-chemical methods are combined, a common limitation in existing databases. The research team implemented a multi-stage conformational search to identify representative molecular structures, beginning with an initial optimization using the UFF force field, followed by a semi-empirical conformer search using GFN2-xTB, and culminating in refinement with the high-accuracy ωB97X-D/def2-TZVP method.
This staged process is designed so that the calculated properties refer to a low-energy conformer close to the molecule’s true energy minimum, even for complex systems. The database encompasses molecules with molar masses up to 400 amu and dipole moments up to 15 D, expanding the scope of available data significantly beyond previous repositories. Scientists calculated a wide range of molecular descriptors, including structural parameters, electronic properties, vibrational frequencies, NMR shielding constants, and solvation characteristics, all derived consistently from the standardized workflow. The resulting database is freely available on Zenodo under an open license, providing a robust foundation for both physics-based and data-driven modeling approaches across chemistry, chemical engineering, and materials science.
Standardized Molecular Properties Database for 53,091 Molecules
Scientists have created CHAOS, a comprehensive database containing computed molecular properties for 53,091 molecules, significantly expanding the availability of crucial data for chemical research. Researchers generated all data using a rigorous workflow based on ωB97X-D/def2-TZVP density functional theory, ensuring direct comparability across all molecules within the database. The study meticulously calculated gas-phase geometries for each molecule, employing a multi-step procedure that began with distance-geometry embedding and progressed through conformer generation and refinement using both molecular mechanics and semi-empirical methods. Up to 300 conformers were generated per molecule, with redundant or high-energy structures discarded to focus on low-energy, thermally accessible candidates.
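The pruning step described above, in which redundant or high-energy conformers are discarded, can be illustrated with a minimal sketch. It is not the authors' implementation: the energy window of 25 kJ/mol, the tolerance, and the `fingerprint` field (a stand-in for whatever geometric similarity measure is actually used, such as an RMSD) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Conformer:
    name: str
    energy_kj_mol: float   # energy in kJ/mol (absolute or relative)
    fingerprint: tuple     # hypothetical geometry descriptor, e.g. torsion angles


def prune_conformers(conformers, energy_window=25.0, fp_tol=1.0):
    """Return low-energy, non-redundant conformers sorted by energy.

    energy_window and fp_tol are illustrative assumptions, not values
    taken from the CHAOS workflow.
    """
    if not conformers:
        return []
    ranked = sorted(conformers, key=lambda c: c.energy_kj_mol)
    e_min = ranked[0].energy_kj_mol
    kept = []
    for conf in ranked:
        # Discard everything above the energy window (list is sorted).
        if conf.energy_kj_mol - e_min > energy_window:
            break
        # Treat a conformer as redundant if its descriptor matches a kept one.
        redundant = any(
            len(k.fingerprint) == len(conf.fingerprint)
            and all(abs(a - b) <= fp_tol
                    for a, b in zip(k.fingerprint, conf.fingerprint))
            for k in kept
        )
        if not redundant:
            kept.append(conf)
    return kept


candidates = [
    Conformer("a", 0.0, (60.0,)),
    Conformer("b", 0.5, (60.0,)),    # same geometry as "a"
    Conformer("c", 3.0, (180.0,)),
    Conformer("d", 80.0, (300.0,)),  # far above the energy window
]
survivors = prune_conformers(candidates)
print([c.name for c in survivors])
```

Sorting by energy first means the lowest-energy member of each group of near-identical structures is the one that survives.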
Harmonic frequency analysis was used to check that the stationary points are true minima, and any remaining imaginary modes are transparently flagged within the data release. The data include scalar and vectorial dipole moments, rotational constants A, B, and C, and external symmetry numbers, obtained alongside the harmonic frequency analysis. Scientists also computed gauge-including atomic orbital (GIAO) NMR shielding tensors, providing per-atom isotropic shielding and anisotropy data, as well as magnetic susceptibilities. Single-point conductor-like polarizable continuum model (C-PCM) calculations yielded tessellated COSMO surfaces and per-segment data, including positions, areas, surface charge densities, and potentials, enabling the construction of sigma-profiles.
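To make the link between per-segment data and a sigma-profile concrete, the sketch below bins segment areas by their surface charge density, which is the basic idea of a sigma-profile as an area-weighted histogram. The grid limits, the number of bins, and the function name are assumptions for illustration; production COSMO-RS and COSMO-SAC implementations typically also average the charge densities over neighbouring segments before binning, which is omitted here.

```python
def sigma_profile(areas, sigmas, sigma_min=-0.025, sigma_max=0.025, n_bins=50):
    """Area-weighted histogram of segment charge densities.

    areas  : segment areas (e.g. in square angstrom)
    sigmas : segment surface charge densities (e.g. in e per square angstrom)
    Returns (bin_centers, profile), where profile[i] is the total surface
    area whose charge density falls into bin i. The default grid is an
    assumption, not the grid used in CHAOS.
    """
    if len(areas) != len(sigmas):
        raise ValueError("areas and sigmas must have the same length")
    width = (sigma_max - sigma_min) / n_bins
    centers = [sigma_min + (i + 0.5) * width for i in range(n_bins)]
    profile = [0.0] * n_bins
    for area, sigma in zip(areas, sigmas):
        index = int((sigma - sigma_min) / width)
        # Segments outside the grid are assigned to the nearest edge bin.
        index = min(max(index, 0), n_bins - 1)
        profile[index] += area
    return centers, profile


# Toy data: five surface segments on a coarse four-bin grid.
areas = [1, 2, 3, 4, 5]
sigmas = [-0.015, -0.005, 0.005, 0.005, 0.03]
centers, profile = sigma_profile(areas, sigmas,
                                 sigma_min=-0.02, sigma_max=0.02, n_bins=4)
print(profile)
```

Because every segment lands in exactly one bin, the profile sums to the total surface area, which is a convenient sanity check when reading segment data from a file.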
Comprehensive Molecular Data for Property Prediction
The researchers have introduced CHAOS, a comprehensive database containing computed molecular properties for 53,091 molecules. This resource provides sigma-profiles, alongside a range of additional data including gas-phase geometries, vibrational spectra, NMR tensors, and solvation energies, all calculated using a standardized and rigorous quantum-chemical protocol. By systematically generating this data at a high level of theory, the team has created a consistent and reliable dataset for molecular modeling and property prediction. CHAOS significantly expands the availability of sigma-profiles, increasing the publicly accessible data by more than tenfold compared to existing collections.
Importantly, it combines vibrational and solvation data with electronic descriptors, offering a more complete picture of molecular behavior than previously available resources. The broad structural diversity of the dataset and its consistent methodology make CHAOS particularly well-suited for benchmarking computational methods and training data-driven thermodynamic models across chemistry, chemical engineering, and materials science. The authors note that future work will focus on extending the database to include condensed-phase systems, reactive molecules, and additional solvation models, further enhancing its utility for the wider scientific community. The database is freely available under an open license, encouraging collaborative research and innovation in data-driven molecular design.
👉 More information
🗞 CHAOS – A Consistent Large-scale Database for Sigma-Profiles and Other Molecular Descriptors
🧠 ArXiv: https://arxiv.org/abs/2511.19002
