Researchers Calibrate Force Uncertainties While Cutting Training Time by 96%

Scientists are increasingly reliant on accurate uncertainty quantification in interatomic potentials to model material behaviour effectively. Moritz Schäfer, Matthias Kellner, and Johannes Kästner, in collaboration with Michele Ceriotti and colleagues from the Laboratory of Computational Science and Modeling at the Ecole Polytechnique Fédérale de Lausanne, Switzerland, and the Institute for Theoretical Chemistry at the University of Stuttgart, Germany, present a systematic investigation into training strategies for shallow ensembles. Their research addresses a critical need to balance calibration performance with computational cost, demonstrating that optimising a negative log-likelihood loss improves calibration over random ensembles or Laplace approximations. Crucially, the team reveal that explicitly modelling force uncertainties is essential for reliable results. They also validate an efficient fine-tuning protocol that reduces training time by up to 96% without compromising calibration quality across diverse materials, including carbon, liquids, and peptides, establishing practical guidelines for robust atomistic simulations.

Accurate simulation of materials relies on knowing not just what will happen, but how confident we are in that prediction. A new training approach dramatically speeds up the process of building reliable models, cutting training time by as much as 96% and offering a practical route to better materials design and discovery. The researchers investigate training strategies for shallow ensembles that balance calibration performance with computational cost in machine learning interatomic potentials.

Shallow ensembles are computationally efficient because the different ensemble members share a large part of the model weights. The researchers demonstrate that explicit optimisation of a negative log-likelihood (NLL) loss improves calibration relative to ensembles of randomly initialised models or a last-layer Laplace approximation. Models trained solely on energy objectives yield miscalibrated force estimates, and explicitly modelling force uncertainties via an NLL objective is essential for reliable calibration.

Calibration gains from negative log-likelihood optimisation and efficient fine-tuning of interatomic potentials

Negative log-likelihood (NLL) optimisation improves calibration performance in shallow ensembles of interatomic potentials compared to randomly initialised models or last-layer Laplace approximations. Initial results revealed that models trained exclusively on energy objectives produce miscalibrated force estimates, highlighting the necessity of explicitly modelling force uncertainties.

Incorporating an NLL objective for force uncertainties is essential for reliable calibration, though this initially presented a considerable computational burden. A full-model fine-tuning protocol, starting from a shallow ensemble initially trained with a probabilistic energy loss or sampled from the Laplace posterior, reduced training time by up to 96% without substantially compromising calibration quality.

Achieving this efficiency required careful consideration of training strategies, and the research validated that fine-tuning a pre-trained ensemble offers a practical alternative to training from scratch, maintaining calibration while drastically reducing computational demands. Across diverse materials, including amorphous carbon, ionic liquids (BMIM), liquid water (H2O), barium titanate (BaTiO3), and a model tetrapeptide (Ac-Ala3-NHMe), this protocol consistently delivered reliable uncertainty quantification.

The ability to propagate predicted model uncertainties to derived quantities is now more readily achievable, providing a pathway toward trustworthy interpretation of ML-accelerated simulations, which relies on both accurate point estimates and well-calibrated uncertainty estimates. The DPOSE scheme, a direct propagation of shallow ensembles, strikes a balance between accuracy and evaluation cost, with low implementation complexity.

Calibrated uncertainties are essential for production simulations, where they can flag unreliable results or be propagated through downstream workflows. Unlike traditional full ensembles, DPOSE shares model weights, increasing computational efficiency. Within this framework, the probabilistic Gaussian negative log-likelihood loss function integrates uncertainty awareness into the training procedure, ensuring well-calibrated uncertainty estimates.

Efficient Uncertainty Quantification via Fine-tuning of Probabilistic Machine Learning Potentials

Scientists typically incur a significant computational overhead when performing uncertainty quantification. An efficient protocol involves full-model fine-tuning of a shallow ensemble originally trained with a probabilistic energy loss, or one sampled from the Laplace posterior, resulting in negligible reduction in calibration quality compared to training from scratch, while reducing training time by up to 96%.
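
To make the protocol concrete, here is a toy, runnable sketch of the two-stage idea in JAX. Everything in it is an illustration under stated assumptions rather than the paper's actual implementation: a one-dimensional potential stands in for real reference data, a tanh feature for a real backbone, and plain gradient descent for the optimiser.

```python
import jax
import jax.numpy as jnp

# Toy sketch of the two-stage protocol (an assumed illustration, not the
# actual apax/IPSuite API). A 1D "potential energy surface" stands in for
# real training data, plain gradient descent for the real optimiser.

def model(params, x):
    h = jnp.tanh(x * params["w1"])        # shared backbone feature
    return h * params["W2"]               # (N,) energies, one per member

def nll(pred, target):
    # Gaussian NLL of one target under the ensemble's mean and variance
    mu, var = jnp.mean(pred), jnp.var(pred) + 1e-6
    return 0.5 * ((target - mu) ** 2 / var + jnp.log(2 * jnp.pi * var))

def loss_energy(params, x, e):            # stage 1: energies only
    return jnp.sum(jax.vmap(lambda xi, ei: nll(model(params, xi), ei))(x, e))

def loss_energy_force(params, x, e, f):   # stage 2: energies and forces
    def per_sample(xi, ei, fi):
        pred_f = -jax.jacobian(model, argnums=1)(params, xi)  # member forces
        return nll(model(params, xi), ei) + nll(pred_f, fi)
    return jnp.sum(jax.vmap(per_sample)(x, e, f))

def sgd(loss, params, data, steps, lr=1e-3):
    for _ in range(steps):
        grads = jax.grad(loss)(params, *data)
        params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params

x = jnp.linspace(-1.0, 1.0, 32)
e, f = x**2, -2.0 * x                     # toy PES and its forces
params = {"w1": jnp.array(1.0),
          "W2": 0.1 * jax.random.normal(jax.random.PRNGKey(0), (8,))}
params = sgd(loss_energy, params, (x, e), steps=200)           # cheap stage 1
params = sgd(loss_energy_force, params, (x, e, f), steps=50)   # short stage 2
```

Stage 1 is the inexpensive probabilistic energy pre-training; the short stage 2 fine-tune with the energy-and-force NLL is what restores force calibration at a fraction of the from-scratch cost.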

This protocol was evaluated across a diverse range of materials, including amorphous carbon, ionic liquids (BMIM), liquid water (H2O), barium titanate (BaTiO3), and a model tetrapeptide (Ac-Ala3-NHMe), establishing practical guidelines for reliable uncertainty quantification in atomistic machine learning. More broadly, introducing machine learning (ML) surrogate models and machine learning interatomic potentials (MLIPs) into first-principles atomistic modelling workflows should be approached with care.

Most ML models originate in statistical learning frameworks and therefore introduce additional uncertainty in their predictions, stemming from limited knowledge, irreducible noise in the training data, and the inability of the chosen ML architecture to capture complex atomistic environments. These uncertainties can impact the accuracy of atomistic modelling.

Accurate interatomic potentials are central to atomistic modelling, enabling the connection between microscopic structure and macroscopic observables via statistical physics. MLIPs trained on quantum chemical reference data describe potential energy surfaces with high accuracy while significantly reducing computational cost. Unlike the underlying electronic structure methods (e.g., DFT), MLIPs typically achieve linear scaling with system size by decomposing the total potential energy E(A, θ) into atomic contributions ε(Ai, θ): E(A, θ) = Σi ε(Ai, θ), with the sum running over all Natoms atoms.
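
As a minimal sketch of this decomposition (with a plain linear map standing in for a learned model, and all names illustrative), the total energy is just a sum of per-atom terms, which is what yields linear scaling in the number of atoms:

```python
import jax.numpy as jnp

# E(A, θ) = Σi ε(Ai, θ): per-atom energies from per-atom descriptors h_i,
# summed into a total. The linear map (w, b) stands in for the parameters θ.

def total_energy(features, w, b):
    """features: (Natoms, D) per-atom descriptors -> scalar total energy."""
    atomic_energies = features @ w + b    # ε(Ai, θ) for every atom
    return jnp.sum(atomic_energies)       # E(A, θ) = Σi ε(Ai, θ)

h = jnp.ones((5, 4))                      # 5 atoms, 4 features each
print(total_energy(h, jnp.ones(4), 0.0))  # 20.0; doubling atoms doubles work
```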

These atomic contributions depend on the local atomic environment Ai and the learnable model parameters θ. The atomic forces Fi required for molecular dynamics are obtained analytically as the negative gradient of this potential energy surface with respect to atomic positions ri: Fi(A, θ) = −∇ri E(A, θ). While architectures differ in how they encode Ai, the models used in this work share some common architectural motifs.
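
Because the model is differentiable end to end, automatic differentiation returns exactly these analytic forces. A toy example, with a Gaussian pair energy standing in for a real MLIP:

```python
import jax
import jax.numpy as jnp

# Fi = −∇ri E follows from any differentiable energy model via autodiff.
# The Gaussian pair "energy" below is a toy stand-in, just to show the pattern.

def energy(positions, w):
    diff = positions[:, None, :] - positions[None, :, :]   # pairwise vectors
    r2 = jnp.sum(diff**2, axis=-1)                         # squared distances
    return w * jnp.sum(jnp.exp(-r2))                       # scalar E(A, θ)

positions = jnp.array([[0.0, 0.0, 0.0],
                       [0.9, 0.0, 0.0],
                       [0.0, 1.1, 0.0]])
forces = -jax.grad(energy, argnums=0)(positions, 0.5)      # Fi = −∇ri E, (3, 3)
```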

First, interatomic distances are expanded within a spherical cutoff using radial basis functions. The radial two-body information may be augmented by angular and many-body descriptors and (non-)linear transformations to enhance the representational fidelity needed to capture complex atomistic environments. Crucially for the uncertainty quantification methods discussed later, the final atomic energy is predicted by mapping this learned feature representation hi through a linear readout layer: ε(Ai, θ) = w⊤hi + b.
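
In a shallow ensemble this readout is simply widened: stacking the member weight vectors as columns of a matrix W (a sketch with assumed shapes) lets one forward pass through the shared backbone return one energy per member.

```python
import jax.numpy as jnp

# DPOSE-style shallow ensemble: all members share the learned features h_i
# and differ only in the final linear readout, so one pass yields N energies.

def ensemble_energies(features, W, b):
    """features: (Natoms, D); W: (D, N) one readout per member; b: (N,)."""
    atomic = features @ W + b         # (Natoms, N) atomic energies per member
    return jnp.sum(atomic, axis=0)    # (N,) total energies, one per member
```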

In this work, Gaussian Moment Neural Networks (GMNN) were used through their efficient implementation in the apax package, with some additional experiments using So3krates and a NequIP-style potential, referred to as EquivMP, on selected datasets to test the transferability of the findings to different model architectures. A more detailed description of the three architectures can be found in SI Section S1.

All training runs and model evaluations were performed with IPSuite. To quantify the uncertainty of a model’s predictions, the approach moves from single-point estimates to a probabilistic framework. In a rigorous Bayesian treatment, the model parameters are random variables, and the predictive distribution for a target y is obtained by marginalizing over these parameters: p(y|A, D) = ∫ p(y|A, θ)p(θ|D)dθ.

Most practical applications rely on approximations to this integral. Ensemble methods provide a Monte Carlo estimate of the posterior predictive distribution. Ensembles consisting of independently trained models are a common approach, with the predicted mean given by ȳ = (1/N) Σi yi and the variance by σ² = (1/N) Σi (yi − ȳ)². We recently introduced the “direct propagation of shallow ensembles” (DPOSE) scheme, an ensemble-based uncertainty quantification (UQ) approach that strikes a good balance between accuracy and evaluation cost, can be applied to any architecture, and has low implementation complexity.
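
In code, this Monte Carlo estimate is just a mean and variance over the member outputs (the numbers below are illustrative):

```python
import jax.numpy as jnp

# Predictive mean and spread from N ensemble member outputs y_i.
y = jnp.array([-12.31, -12.28, -12.35, -12.30])   # N = 4 member predictions
y_bar = jnp.mean(y)                               # ȳ = (1/N) Σi yi
sigma2 = jnp.mean((y - y_bar) ** 2)               # σ² = (1/N) Σi (yi − ȳ)²
```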

DPOSE reduces the overhead of traditional full ensembles by sharing the model backbone and ensembling only the last layer, training all ensemble members jointly with a probabilistic Gaussian negative log-likelihood (NLL) loss function, NLL(∆y, σ) = ½ [∆y²/σ² + ln(2πσ²)], which is summed over the training set and depends on the empirical error ∆y and the predicted ensemble variance σ² of each sample.
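
A direct transcription of this loss is short; the variance floor eps is our practical safeguard against collapsing ensemble spread, not something prescribed by the paper:

```python
import jax.numpy as jnp

# Gaussian NLL from the text: ½[Δy²/σ² + ln(2πσ²)], summed over the
# training set. delta_y are per-sample errors, sigma2 the predicted
# ensemble variances; eps is an assumed numerical floor.

def gaussian_nll(delta_y, sigma2, eps=1e-8):
    sigma2 = jnp.maximum(sigma2, eps)
    return 0.5 * jnp.sum(delta_y**2 / sigma2 + jnp.log(2 * jnp.pi * sigma2))
```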

In this formulation, all parameters are jointly optimised, allowing uncertainty information to propagate into the deeper layers during training. For forward properties like energy, the computational overhead is simply that of a larger last-layer readout. As an analytic alternative, the LLPR framework applies a Laplace approximation to the posterior of the last layer based on a single model trained with an MSE loss.

MSE(∆y) = (1/N) Σi (∆yi)². The parameters at the loss minimum correspond to the maximum a posteriori (MAP) estimate θMAP. Assuming that θMAP is a local minimum of the loss function, a second-order Taylor expansion of the posterior distribution p(θ|D) around θMAP yields p(θ|D) ≈ N(θ | θMAP, ΣLL). The covariance ΣLL is the inverse Hessian of the loss with respect to the parameters at the MAP solution. For the last-layer weights w, the covariance is approximated as ΣLL ≈ α²(Hw + η²I)⁻¹, where Hw is the Hessian restricted to the last layer.
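
A sketch of how such last-layer Laplace sampling could look, assuming a Gauss-Newton Hessian Hw ≈ Φ⊤Φ assembled from last-layer features Φ and treating α and η as given scales (the helper name and shapes are illustrative):

```python
import jax
import jax.numpy as jnp

# Draw last-layer ensemble members from the Laplace posterior
# N(θMAP, ΣLL) with ΣLL ≈ α²(Hw + η²I)⁻¹. For a linear readout trained
# with MSE, the Gauss-Newton Hessian is Hw ≈ Φ⊤Φ (an assumption here).

def sample_last_layer(key, w_map, phi, alpha, eta, n_samples):
    """w_map: (D,) MAP weights; phi: (Nsamples, D) last-layer features."""
    H = phi.T @ phi                                        # (D, D) Hessian
    cov = alpha**2 * jnp.linalg.inv(H + eta**2 * jnp.eye(H.shape[0]))
    return jax.random.multivariate_normal(key, w_map, cov, shape=(n_samples,))
```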

👉 More information
🗞 How to Train a Shallow Ensemble
🧠 ArXiv: https://arxiv.org/abs/2602.15747

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
