Researchers Develop MEMS for Improved Predictions in Chemical Machine Learning

Molecular representation is crucial in chemical machine learning because it influences how complex a model must be and how much training data it needs. Researchers Tonglei Li, Nicholas J. Huls, Shan Lu, and Peng Hou have developed a lower-dimensional representation of a molecular manifold, the Manifold Embedding of Molecular Surface (MEMS), to better understand and predict molecular interactions. MEMS, which embodies surface electronic quantities, can be used as input for chemical learning. The team’s solubility prediction with MEMS demonstrated its potential for both shallow and deep learning by neural networks. The method also addresses challenges in data-driven chemical learning, such as the complexity of molecular structure and the high dimensionality of molecular descriptors.

What is the Importance of Molecular Representation in Chemical Machine Learning?

Molecular representation plays a pivotal role in chemical machine learning. It largely determines how complex the model must be and whether the available training data are sufficient, with the goal of avoiding both overfitting and underfitting. The electronic structures and associated attributes of molecules are the root cause of molecular interactions and the properties those interactions produce. Understanding these structures and attributes is therefore crucial for predicting molecular interactions.

The researchers, Tonglei Li, Nicholas J. Huls, Shan Lu, and Peng Hou, set out to examine local electronic information on a molecular manifold in order to understand and predict molecular interactions. Their work led to the development of a lower-dimensional representation of a molecular manifold, known as the Manifold Embedding of Molecular Surface (MEMS). This representation embodies surface electronic quantities and retains the chemical intuition of molecular interactions.
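
This summary does not describe the exact embedding procedure behind MEMS. As a rough illustration of the general idea (flattening 3D surface points into 2D while keeping each point's electronic value attached), the sketch below uses classical multidimensional scaling (MDS), a standard dimensionality-reduction technique swapped in here; it is not the authors' MEMS algorithm. The function name `classical_mds_2d` and its parameters are hypothetical.

```python
import math
import random

def classical_mds_2d(points, iters=500, seed=0):
    """Embed points into 2D with classical MDS (a stand-in, NOT the MEMS algorithm).

    Power iteration with deflation finds the two leading eigenvectors of the
    double-centered squared-distance matrix.
    """
    n = len(points)
    # Squared Euclidean distances between all pairs of surface points
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    # Double centering: B = -1/2 * J D2 J (D2 is symmetric, so row means = column means)
    mean = [sum(row) / n for row in d2]
    grand = sum(mean) / n
    bmat = [[-0.5 * (d2[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
            for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0, 0.0] for _ in range(n)]
    for k in range(2):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(bmat[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of the converged eigenvector
        bv = [sum(bmat[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(vi * bvi for vi, bvi in zip(v, bv))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass converges to the next eigenvector
        bmat = [[bmat[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

To keep the surface's electronic information, each 2D coordinate can be paired with the value at the original point, e.g. `surface_map = [(u, v, esp) for (u, v), esp in zip(coords, esp_values)]`, where `esp_values` is a hypothetical list of electrostatic potentials. For a curved molecular surface, manifold-learning methods typically work with distances measured along the surface rather than the straight-line distances used here.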

MEMS can be further featurized as input for chemical learning. The researchers’ solubility prediction with MEMS demonstrated the feasibility of both shallow and deep learning by neural networks. This suggests that MEMS is expressive and robust against dimensionality reduction.
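
The summary does not say how MEMS is featurized. One generic way to turn a flattened surface map, which has a different number of points for each molecule, into the fixed-length input a feed-forward neural network expects is to bin the map onto a grid and average the surface values in each cell. The sketch below does that; the `(u, v, value)` input format, the function name `grid_features`, and the binning scheme are all hypothetical and are not the authors' featurization.

```python
def grid_features(surface_map, bins=4):
    """Convert a variable-length list of (u, v, value) points into a
    fixed-length vector of bins * bins cells (hypothetical featurization).

    Each cell holds the mean value of the points falling in it, or 0.0 if empty.
    """
    us = [p[0] for p in surface_map]
    vs = [p[1] for p in surface_map]
    u_min, u_span = min(us), (max(us) - min(us)) or 1.0
    v_min, v_span = min(vs), (max(vs) - min(vs)) or 1.0
    sums = [0.0] * (bins * bins)
    counts = [0] * (bins * bins)
    for u, v, val in surface_map:
        # Normalize to [0, 1], scale to a cell index; points on the max edge go to the last cell
        iu = min(int((u - u_min) / u_span * bins), bins - 1)
        iv = min(int((v - v_min) / v_span * bins), bins - 1)
        idx = iu * bins + iv
        sums[idx] += val
        counts[idx] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

Whatever the number of surface points, the output always has `bins * bins` entries, so molecules of different sizes can share one input layer.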

How is a Molecule Encoded in Chemical Learning?

In chemical learning, a molecule is encoded in a computable format so that a machine learning model can develop quantitative structure-activity or structure-property relationships (QSAR or QSPR). A molecule is often represented as a set of numerical descriptors such as molecular weight, dipole moment, and number of single bonds.
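
As a minimal sketch of the descriptor idea, the code below computes a small descriptor vector from a hypothetical atom-and-bond list with explicit hydrogens. The input format and the function name `descriptor_vector` are assumptions for illustration; real work would typically use a cheminformatics toolkit such as RDKit.

```python
# Standard atomic weights (g/mol) for the few elements used here
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def descriptor_vector(atoms, bonds):
    """Return [molecular weight, single bonds, double bonds, heavy atoms].

    atoms: element symbols with explicit hydrogens (hypothetical input format)
    bonds: (i, j, order) tuples indexing into atoms
    """
    mol_weight = sum(ATOMIC_WEIGHTS[a] for a in atoms)
    n_single = sum(1 for _, _, order in bonds if order == 1)
    n_double = sum(1 for _, _, order in bonds if order == 2)
    n_heavy = sum(1 for a in atoms if a != "H")
    return [mol_weight, n_single, n_double, n_heavy]

# Ethanol (CH3-CH2-OH) with explicit hydrogens at indices 3..8
ethanol_atoms = ["C", "C", "O"] + ["H"] * 6
ethanol_bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 1),  # C-C, C-O, O-H
                 (0, 4, 1), (0, 5, 1), (0, 6, 1),  # CH3 hydrogens
                 (1, 7, 1), (1, 8, 1)]             # CH2 hydrogens
```

For ethanol this gives a molecular weight of about 46.07 g/mol and eight single bonds. A dipole moment would additionally require 3D coordinates and partial charges, which this toy format does not carry.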

The conventional depiction of a molecule as a graph, with nodes for atoms and edges for bonds, has inspired various description and fingerprinting schemes, such as the SMILES line notation and extended-connectivity fingerprints (ECFP). A descriptor generally captures a 1D, 2D, or 3D feature of a molecule, and the elemental composition and chemical connectivity may also be encoded as a fingerprint or an alphanumeric string.
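
To illustrate how graph connectivity can become a fixed-length bit vector, here is a deliberately simplified fingerprint in the spirit of circular fingerprints such as ECFP. It is not the ECFP algorithm itself (real ECFP uses richer atom invariants, bond information, and duplicate-substructure removal); the function names and the `radius` and `n_bits` parameters are illustrative assumptions.

```python
import hashlib

def _stable_hash(text):
    # Deterministic integer hash (Python's built-in hash() is salted per process for strings)
    return int(hashlib.sha256(text.encode()).hexdigest()[:16], 16)

def toy_circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Simplified circular fingerprint of a hydrogen-suppressed molecular graph.

    atoms: element symbols; bonds: (i, j) index pairs (hypothetical input format)
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # Radius 0: identifier from element symbol and number of bonded neighbors
    ids = [_stable_hash(f"{atoms[i]}|{len(neighbors[i])}") for i in range(len(atoms))]
    seen = set(ids)
    for _ in range(radius):
        # New identifier combines the atom's previous identifier with its neighbors'
        # previous identifiers (sorted, so atom ordering does not matter)
        ids = [
            _stable_hash(f"{ids[i]}|{sorted(ids[j] for j in neighbors[i])}")
            for i in range(len(atoms))
        ]
        seen.update(ids)
    # Fold every identifier encountered into a fixed-length bit vector
    bits = [0] * n_bits
    for ident in seen:
        bits[ident % n_bits] = 1
    return bits
```

Because identifiers depend only on the graph, renumbering the atoms of the same molecule yields the same fingerprint, while folding into `n_bits` positions means unrelated substructures can occasionally share a bit.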

Benchmarking studies have compared representations, often reporting that one outperforms another. In principle, however, any set of descriptors, graph representation, or fingerprint that fully differentiates the molecules in a dataset makes the molecular property of interest a function of that representation, and machine learning can then approximate that function.
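
The "fully differentiate" condition can be checked directly: if two distinct molecules map to the same representation, no model built on it can assign them different property values. The sketch below groups molecules with identical representations; the function name `find_collisions` is hypothetical. Element counts, for example, cannot separate the structural isomers ethanol and dimethyl ether, which share the formula C2H6O.

```python
from collections import defaultdict

def find_collisions(names, representations):
    """Return groups of molecules whose representations are identical.

    Any group with more than one member means the representation cannot
    distinguish those molecules.
    """
    groups = defaultdict(list)
    for name, rep in zip(names, representations):
        groups[tuple(rep)].append(name)
    return [members for members in groups.values() if len(members) > 1]

# Element counts (C, H, O) as a deliberately weak representation
names = ["ethanol", "dimethyl ether", "methanol"]
element_counts = [(2, 6, 1), (2, 6, 1), (1, 4, 1)]
```

Here `find_collisions(names, element_counts)` flags ethanol and dimethyl ether, whose boiling points differ markedly even though this representation treats them as the same input.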

What are the Challenges in Data-Driven Chemical Learning?

Two intertwined challenges remain when applying a molecular description in data-driven chemical learning. The first stems from the complexity of molecular structure and the second from the high dimensionality of molecular descriptors.

The complexity of the molecular structure refers to the fact that molecules are not simple objects. They are composed of atoms that are connected in a specific way, forming a unique structure. This structure determines the properties of the molecule, and therefore, it is crucial to accurately represent it in the machine learning model.

The high dimensionality of the molecular descriptors refers to the large number of features that are used to describe a molecule. These features can include various physical and chemical properties, such as molecular weight, dipole moment, and number of single bonds. The high dimensionality can lead to overfitting, where the model learns the training data too well and performs poorly on new, unseen data.
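
To make the overfitting point concrete, here is a minimal pure-Python sketch with synthetic data (not from the paper). When a linear model has as many descriptors, and therefore weights, as there are training molecules, it can reproduce even random targets exactly, yet that fit carries no information for new molecules. The function names and sizes are illustrative assumptions.

```python
import random

def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def mse(X, y, w):
    """Mean squared error of the linear model y ~ X w."""
    return sum((sum(xi * wi for xi, wi in zip(row, w)) - t) ** 2
               for row, t in zip(X, y)) / len(y)

def overfit_demo(n_samples=20, seed=1):
    rng = random.Random(seed)
    # As many random "descriptors" as molecules; targets are pure noise
    X = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_samples)]
    y = [rng.gauss(0, 1) for _ in range(n_samples)]
    w = solve(X, y)  # exact interpolation of the training set
    X_new = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_samples)]
    y_new = [rng.gauss(0, 1) for _ in range(n_samples)]
    return mse(X, y, w), mse(X_new, y_new, w)
```

The training error from `overfit_demo()` is essentially zero, while the error on fresh samples is not; this is the regime that lower-dimensional representations are meant to help avoid.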

To overcome these challenges, the researchers developed the Manifold Embedding of Molecular Surface (MEMS), a lower-dimensional representation of a molecular manifold. This method embodies surface electronic quantities and retains the chemical intuition of molecular interactions, making it a promising approach for data-driven chemical learning.

Publication details: “Unsupervised manifold embedding to encode molecular quantum information for supervised learning of chemical data”
Publication Date: 2024-06-11
Authors: Tonglei Li, Nicholas J. Huls, Shan Lu, Peng Hou, et al.
Source: Communications Chemistry
DOI: https://doi.org/10.1038/s42004-024-01217-z

Quantum News