QuanDB: A Quantum Chemical Database Enhancing 3D Molecular Learning for Drug and Material Design

A team of researchers has developed QuanDB, a high-quality quantum chemical (QC) property database. The database, which currently houses 154,610 compounds, is intended to enhance 3D molecular representation learning, which matters because a molecule's 3D geometry and electronic structure largely determine its key properties and intermolecular interactions. For each molecule, QuanDB provides 53 global and 5 local QC properties along with the most stable 3D conformation. The authors expect it to become a benchmark tool for the training and optimization of machine learning models, advancing the development of novel drugs and materials. QuanDB is freely available without registration.

What is QuanDB?

QuanDB is a high-quality quantum chemical (QC) property database developed by Zhejiang Yang, Tengxin Huang, Li Pan, Jingjing Wang, Liangliang Wang, Junjie Ding, and Junhua Xiao. The database contains structurally diverse molecular entities and features a user-friendly interface. It currently houses 154,610 compounds sourced from public databases and scientific literature, spanning 10,125 scaffolds. The compounds are composed of nine elements: hydrogen (H), carbon (C), oxygen (O), nitrogen (N), phosphorus (P), sulfur (S), fluorine (F), chlorine (Cl), and bromine (Br).
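As an illustration of what the nine-element restriction means in practice, the following minimal sketch checks whether a simple molecular formula falls inside that element set. The function names and the regex-based parsing are assumptions made for this example. They are not part of QuanDB or its curation pipeline, and the parser only handles plain formulas such as C2H6O (no charges, brackets, or isotopes).

```python
import re

# The nine elements the article lists for QuanDB compounds.
QUANDB_ELEMENTS = {"H", "C", "O", "N", "P", "S", "F", "Cl", "Br"}


def elements_in_formula(formula: str) -> set[str]:
    """Return the element symbols found in a plain molecular formula.

    Hypothetical helper: a symbol is one uppercase letter optionally
    followed by one lowercase letter, so 'Cl' and 'Br' stay whole.
    """
    return set(re.findall(r"[A-Z][a-z]?", formula))


def within_quandb_elements(formula: str) -> bool:
    """True if the formula is non-empty and uses only the nine listed elements."""
    elements = elements_in_formula(formula)
    return bool(elements) and elements <= QUANDB_ELEMENTS


if __name__ == "__main__":
    for formula in ["C2H6O", "C6H5Br", "C2H5I", "SiH4"]:
        print(formula, within_quandb_elements(formula))
```

Ethanol (C2H6O) and bromobenzene (C6H5Br) pass the check, while iodoethane (C2H5I) and silane (SiH4) do not, because iodine and silicon are outside the listed set.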

The significance of QuanDB lies in its potential to enhance 3D molecular representation learning. Previous studies have shown that the three-dimensional (3D) geometric and electronic structures of molecules play a crucial role in determining their key properties and intermolecular interactions. It is therefore necessary to establish a quantum chemical (QC) property database containing the most stable 3D geometric conformations and electronic structures of molecules.

QuanDB provides 53 global and 5 local QC properties and the most stable 3D conformation for each molecule. These properties are divided into three categories: geometric structure, electronic structure, and thermodynamics.

How Does QuanDB Contribute to Machine Learning and Molecular Design?

QuanDB provides high-value geometric and electronic structure information for use in molecular representation models, which are critical for machine-learning-based molecular design. This contributes to a comprehensive description of the chemical compound space. As a new high-quality dataset for QC properties, QuanDB is expected to become a benchmark tool for the training and optimization of machine learning models, thus further advancing the development of novel drugs and materials.

The fundamental assumption of AI-assisted drug or material design is that structurally similar molecules have similar properties. A comprehensive molecular representation is therefore crucial for discovering novel molecules. Traditional molecular descriptors require manual feature engineering, which makes it difficult to represent molecules comprehensively without expert knowledge. Consequently, data-driven representation models are increasingly used to extract unbiased features from molecules.
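The similarity assumption is often made concrete with a fingerprint-based score. The sketch below computes the Tanimoto (Jaccard) coefficient between two molecules, each represented as a set of "on" fingerprint bit indices. The article does not name a similarity measure, so this choice is an assumption, and the bit sets are invented toy values rather than real fingerprints.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits present in either set.

    Returns 0.0 when both sets are empty (a convention chosen for this sketch).
    """
    union = bits_a | bits_b
    if not union:
        return 0.0
    return len(bits_a & bits_b) / len(union)


if __name__ == "__main__":
    # Toy fingerprints (hypothetical bit indices, not derived from real molecules).
    molecule_a = {1, 2, 3, 4}
    molecule_b = {3, 4, 5, 6}
    print(round(tanimoto(molecule_a, molecule_b), 3))
```

A score near 1 means the two fingerprints overlap almost completely, and a score of 0 means they share no bits. Hand-crafted fingerprints of this kind are an example of the manual feature engineering that learned representations aim to replace.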

By utilizing the quantum chemical properties provided by QuanDB, relevant three-dimensional (3D) electronic structural information can be included in comprehensive molecular representation models to facilitate drug and material design. This enhances the performance of downstream tasks such as predicting molecular properties.

How Does QuanDB Compare to Other Similar Databases?

Compared to other similar databases, QuanDB covers a broader space of chemical compounds, adopts a higher level of theoretical calculations, and offers a user-friendly interface. As the relationships between molecular structure and physicochemical (PC) properties, reactivity, and bioactivity become better understood, researchers are gradually incorporating features that capture the three-dimensional (3D) conformation of molecules into representation models.

The electronic and structural parameters of stable 3D conformations are of particular interest because they critically affect several crucial properties of molecules in 3D space, such as their reactivity, strong electrostatic interactions, and chemical adsorption. Density functional theory (DFT) remains the most reliable and accurate method for obtaining the electronic structure information of the most stable 3D molecular conformations, which can be reflected by quantum chemical (QC) properties.

By incorporating QC properties into the training phase of the molecular representation models, their ability to represent the electronic structural space can be effectively enhanced. Therefore, the construction of a DFT-based QC property database for small organic molecules is of great importance.
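One common way to bring extra properties into training is an auxiliary objective: the model predicts QC properties alongside its main target, and the two errors are combined with a weight. The sketch below shows that idea with a plain mean squared error. It is an assumption made for illustration, not the training scheme described by the QuanDB authors, and the names `combined_loss` and `qc_weight` are hypothetical.

```python
from typing import Sequence


def mse(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def combined_loss(
    task_pred: Sequence[float],
    task_true: Sequence[float],
    qc_pred: Sequence[float],
    qc_true: Sequence[float],
    qc_weight: float = 0.1,
) -> float:
    """Main-task error plus a weighted QC-property error (auxiliary term)."""
    return mse(task_pred, task_true) + qc_weight * mse(qc_pred, qc_true)


if __name__ == "__main__":
    # Toy numbers only: one main-task value and two QC-property values.
    loss = combined_loss([1.0], [2.0], [0.0, 1.0], [2.0, 1.0], qc_weight=0.5)
    print(loss)
```

The weight controls how strongly the QC targets shape the learned representation relative to the main task. In practice it would be tuned, and properties on different scales would usually be normalized first.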

What is the Future of QuanDB?

QuanDB is freely available without registration at https://quandb.cmdrg.com. The authors of the study envision QuanDB as a benchmark tool for the training and optimization of machine learning models, thus further advancing the development of novel drugs and materials.

The database is expected to contribute significantly to cheminformatics and machine learning. By providing comprehensive quantum chemical properties of diverse organic molecular entities, all rigorously pretreated and manually cleaned to ensure high accuracy, QuanDB is positioned to become a valuable resource for researchers working in these areas.

The authors also highlight the importance of open access to the database. The article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the original authors and the source. This commitment to open access underscores the potential of QuanDB to facilitate collaborative research and innovation in the field of molecular design and machine learning.

Publication details: “QuanDB: a quantum chemical property database towards enhancing 3D molecular representation learning”
Publication Date: 2024-04-29
Authors: Zhiyuan Yang, Tao Huang, Li Pan, Jingjing Wang, et al.
Source: Journal of Cheminformatics
DOI: https://doi.org/10.1186/s13321-024-00843-y