Surrogates are becoming indispensable tools in engineering for accelerating computationally expensive simulations, but their performance often suffers when discrepancies arise between training and real-world deployment conditions. Anna Zimmel, Paul Setinek, and Gianluca Galletti, working with colleagues at the ELLIS Unit, LIT AI Lab, and Institute for Machine Learning, JKU Linz, Austria, have developed a novel Test-Time Adaptation (TTA) framework to address this critical issue. Their research introduces a method based on storing maximally informative D-optimal statistics, enabling stable adaptation and parameter selection for high-dimensional, unstructured regression problems, a significant advancement over existing TTA techniques, which are primarily designed for lower-dimensional classification. Demonstrating up to 7% improvement in out-of-distribution performance with minimal computational overhead on benchmarks such as SIMSHIFT and EngiBench, the work represents the first systematic demonstration of effective TTA for high-dimensional simulation regression and generative design optimisation.
Scientists are tackling a major hurdle in engineering design: ensuring computer simulations remain accurate when faced with unexpected scenarios. Accurate and swift simulations rely on ‘surrogates’, but these can falter with new data. A clever adaptation technique now promises to keep these vital tools reliable, even when conditions change during use.
Scientists are increasingly deploying machine learning surrogates to speed up complex engineering simulations, yet a significant challenge arises when these models encounter conditions differing from their original training data. These distribution shifts, such as unseen geometries or configurations, can lead to substantial performance drops, hindering the reliability of predictions.
Test-Time Adaptation (TTA) offers a potential solution by allowing models to adjust during use, but current TTA methods are largely designed for simpler tasks with clear visual patterns and structured outputs. This limitation creates instability when applied to the high-dimensional, unstructured regression problems frequently found in engineering simulations.
Researchers have now developed a new TTA framework that addresses this instability by storing and utilising maximally informative statistics, specifically employing a D-optimal approach to select the most relevant data. This method enables stable adaptation and allows for automated selection of optimal parameters during testing. When integrated with pre-trained simulation surrogates, the work yields improvements of up to 7% in out-of-distribution performance with minimal added computational cost.
This represents, to the best of the authors’ knowledge, the first systematic demonstration of effective TTA for high-dimensional simulation regression and generative design optimisation, validated using the SIMSHIFT and EngiBench benchmarks.

Neural surrogates have become essential tools for accelerating Partial Differential Equation (PDE) simulations across numerous scientific and engineering disciplines.
While these surrogates perform well when test conditions align with the training data, their accuracy often diminishes when faced with unseen configurations, variations in geometry, material properties, or structural dimensions. This issue becomes particularly acute in industrial settings where design optimisation processes generate configurations that extend beyond the initial training ranges.
Access to the original training data is often restricted due to portability or proprietary concerns, necessitating model- and task-agnostic approaches for zero-shot adaptation and automated model selection. Addressing distribution shifts is a central theme in several research areas, including domain adaptation, domain generalisation, meta-learning, and active learning.
Test-Time Adaptation (TTA) stands out as a particularly suitable approach for engineering tasks requiring rapid adaptation and where target domain distributions are unknown beforehand, as it adapts models during inference without needing source data or incurring significant computational overhead. Although TTA has proven effective in fields like medical imaging and object detection, its application to high-dimensional regression problems remains largely unexplored.
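The basic mechanism of test-time adaptation, adjusting a model's internal statistics from the incoming data alone with no labels and no source data, can be pictured with a toy normalization example. Everything below is an illustrative assumption, not the authors' method: the "surrogate" is a fixed linear map over standardized inputs, and adaptation simply refreshes the stored normalization statistics from the test batch.

```python
import numpy as np

# Toy surrogate: standardize inputs with stored training statistics, then
# apply fixed linear weights. "Adaptation" here means refreshing those
# statistics from the incoming test batch -- a common TTA strategy.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 1))                      # frozen surrogate weights

train_x = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
train_mu, train_sigma = train_x.mean(0), train_x.std(0)

def predict(x, mu, sigma):
    return ((x - mu) / sigma) @ W

# Shifted test distribution: the stored training statistics no longer
# match, so predictions drift. The "ground truth" is defined as the
# surrogate's output under correct normalization, so all error here
# comes from stale statistics.
test_x = rng.normal(loc=2.0, scale=1.5, size=(50, 3))
target = ((test_x - test_x.mean(0)) / test_x.std(0)) @ W

err_source = np.abs(predict(test_x, train_mu, train_sigma) - target).mean()

# Test-time adaptation: replace the stored statistics with the batch's own.
adapt_mu, adapt_sigma = test_x.mean(0), test_x.std(0)
err_adapted = np.abs(predict(test_x, adapt_mu, adapt_sigma) - target).mean()
```

In this caricature the adapted error collapses to zero; the hard part in real high-dimensional regression, and the focus of the paper, is that such naive statistic updates can be unstable.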
Performance gains of Stable Adaptation at Test-Time for Simulation across diverse engineering simulation datasets
Across all simulation datasets, Stable Adaptation at Test-Time for Simulation (SATTS) consistently outperforms existing Test-Time Adaptation (TTA) methods, yielding improvements of up to 7% in out-of-distribution performance with minimal computational overhead. Specifically, on the rolling model dataset, SATTS achieves a Root Mean Squared Error (RMSE) of 0.545±0.019, compared to 0.566±0.020 for SSA and 1.825±0.002 for Tent, demonstrating a clear advantage in prediction accuracy.
On the motor model, SATTS matches the Oracle performance at 0.109±0.003 RMSE, on par with the source model at 0.109±0.001, while Tent significantly underperforms at 1.132±0.032. For the forming model, SATTS attains an RMSE of 0.157±0.001, a slight improvement over the source model’s 0.161±0.001 and notably better than SSA’s 0.215±0.005.
The heatsink model presents a more nuanced picture, with SATTS achieving 0.738±0.004 RMSE, a small reduction from the source model’s 0.747±0.001, but still a marked improvement over Tent’s 0.876±0.001. These results, averaged across 20 TTA runs, establish SATTS as a new baseline for adapting simulation surrogates to unseen conditions. Visual inspection of Equivalent Plastic Strain (PEEQ) predictions on a hot rolling sample further confirms SATTS’s effectiveness, successfully correcting systematic under-predictions in deformation zones, indicating improved physical consistency with ground truth data.
Furthermore, analysis on the EngiBench generative design optimisation tasks reveals comparable success. On the Beams2D model, SATTS achieves a COMP score of 118.8±12.409, slightly better than the source model’s 123.7±17.854 and SSA’s 119.4±4.586, while the HeatConduction2D model shows SATTS attaining a COMP score of 0.537±0.491, again outperforming the source model at 0.577±0.561 and SSA at 0.712±0.615. Proxy A-Distance (PAD) values, quantifying the discrepancy between source and target domains, correlate with performance gains; datasets with higher PAD values consistently demonstrate improvements from adaptation, reinforcing the method’s ability to address significant distribution shifts.
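Proxy A-Distance itself has a simple recipe: train a classifier to distinguish source samples from target samples and set PAD = 2(1 − 2ε), where ε is that classifier's error rate; well-separated domains give ε near zero and PAD near 2. The sketch below uses a nearest-centroid domain classifier on synthetic data as a stand-in (the classifier choice and data are assumptions, not details from the paper):

```python
import numpy as np

# Proxy A-Distance (PAD): PAD = 2 * (1 - 2 * err), where err is the error
# rate of a classifier trained to tell source from target samples.
rng = np.random.default_rng(1)
source = rng.normal(0.0, 1.0, size=(500, 4))
target = rng.normal(3.0, 1.0, size=(500, 4))   # strongly shifted domain

X = np.vstack([source, target])
y = np.array([0] * 500 + [1] * 500)

# Nearest-centroid domain classifier: label by the closer domain mean.
c0, c1 = source.mean(0), target.mean(0)
pred = (np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)).astype(int)
err = (pred != y).mean()

pad = 2.0 * (1.0 - 2.0 * err)   # near 2 for this well-separated pair
```

Overlapping domains would push ε toward 0.5 and PAD toward 0, which is why low-PAD datasets in the study see little benefit from adaptation.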
D-optimal adaptation of simulation surrogates for out-of-distribution generalisation
A D-optimal experimental design strategy underpinned the methodology used to adapt pretrained simulation surrogates to unseen data distributions. This technique, borrowed from statistics, selects a minimal set of data points that maximise information gain about the surrogate model’s behaviour. Rather than randomly choosing samples for adaptation, the research team carefully curated a subset that best constrains the model’s parameters, improving stability and accuracy when faced with out-of-distribution inputs.
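The selection principle can be sketched with a greedy log-determinant criterion for a linear model: repeatedly add the candidate point that most increases the determinant of the information matrix, i.e. the point that best constrains the parameters. This is an illustrative simplification; the paper's exact statistics, model, and selection routine may differ.

```python
import numpy as np

def d_optimal_subset(pool, k, eps=1e-6):
    """Greedily pick k rows of `pool` that maximize
    log det(X_S^T X_S + eps*I), the D-optimality criterion."""
    d = pool.shape[1]
    chosen = []
    M = eps * np.eye(d)          # running (regularized) information matrix
    for _ in range(k):
        best_i, best_logdet = None, -np.inf
        for i in range(len(pool)):
            if i in chosen:
                continue
            x = pool[i:i + 1]    # candidate row as a (1, d) matrix
            _, logdet = np.linalg.slogdet(M + x.T @ x)
            if logdet > best_logdet:
                best_i, best_logdet = i, logdet
        chosen.append(best_i)
        x = pool[best_i:best_i + 1]
        M = M + x.T @ x          # fold the winner into the matrix
    return chosen

rng = np.random.default_rng(2)
pool = rng.normal(size=(100, 5))
idx = d_optimal_subset(pool, k=10)   # 10 maximally informative points
```

The greedy loop trades optimality for speed: evaluating one rank-one update per candidate is far cheaper than searching all size-k subsets.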
This contrasts with typical test-time adaptation methods, which often struggle with the high dimensionality and complex relationships inherent in engineering simulations. Initially, the work involved establishing baseline performance for several existing surrogates trained on benchmark datasets, namely SIMSHIFT and EngiBench. These benchmarks represent a range of engineering problems, allowing for a thorough evaluation of the adaptation framework.
Once baseline performance was quantified, the D-optimal statistics were computed from a representative set of training data. These statistics capture the essential characteristics of the surrogate’s input-output mapping, providing a concise summary of its behaviour. Subsequently, during test time, incoming data points were compared to the stored D-optimal statistics.
This comparison guided the selection of appropriate adaptation parameters, effectively shifting the surrogate’s predictions to better match the new data distribution. The process avoids full retraining, offering a computationally efficient solution for adapting to changing conditions. By focusing on informative statistics, the method circumvents instabilities common in high-dimensional regression problems, where small perturbations can lead to large errors.
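One way to picture this parameter-selection step: score each candidate adaptation strength by how well the adapted features reproduce the stored statistics, then keep the best, all without any labels. The sketch below is a deliberately simplified stand-in; the shift rule, the candidate grid, and the scoring norm are all assumptions, not the paper's procedure.

```python
import numpy as np

# Stored summary statistics from a representative set of source data.
rng = np.random.default_rng(3)
source_feats = rng.normal(0.0, 1.0, size=(400, 8))
stored_mu = source_feats.mean(0)

# A shifted batch arrives at test time.
test_feats = rng.normal(1.0, 1.0, size=(64, 8))

def adapt(feats, alpha):
    # Move features a fraction alpha of the way toward the stored mean.
    return feats + alpha * (stored_mu - feats.mean(0))

# Score each candidate strength by the mismatch between the adapted
# batch statistics and the stored ones; pick the minimizer automatically.
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
scores = [np.linalg.norm(adapt(test_feats, a).mean(0) - stored_mu)
          for a in candidates]
best_alpha = candidates[int(np.argmin(scores))]
```

Because the score needs only the stored statistics and the incoming batch, neither source data nor target labels are required at adaptation time, which matches the zero-shot setting described above.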
Stabilising surrogate models against unforeseen changes in engineering design optimisation
Scientists increasingly rely on computer simulations to design everything from aircraft wings to heat sinks, yet these simulations are often slow and computationally expensive. To speed things up, engineers build ‘surrogate’ models, essentially fast approximations of the full simulation. However, these surrogates falter when faced with scenarios slightly different from those used during their training, a problem known as distribution shift.
This mismatch can render the surrogate useless, negating any time saved. Adapting these surrogates to new conditions has proven surprisingly difficult. Existing methods, designed for simpler tasks like image classification, struggle with the high-dimensional and often unpredictable outputs of engineering simulations. Now, a new approach demonstrates a way to stabilise this adaptation process by intelligently storing and applying key statistical information.
By focusing on the most informative data points, the system can adjust to unseen conditions with minimal computational overhead, improving performance by up to seven percent in benchmark tests. This work represents the first systematic demonstration of effective ‘test-time adaptation’ for complex simulation regression, opening doors for generative design optimisation where algorithms automatically explore countless possibilities.
While the current implementation relies on specific network architectures, the underlying principle, using carefully selected statistics to guide adaptation, could be applied to a wider range of surrogate models. Further research must address the limits of this approach when faced with extreme distribution shifts, and explore how to automate the selection of these ‘informative’ statistics. Once these challenges are met, we can anticipate a future where simulations are not just faster, but also far more adaptable and reliable.
👉 More information
🗞 Stabilizing Test-Time Adaptation of High-Dimensional Simulation Surrogates via D-Optimal Statistics
🧠 ArXiv: https://arxiv.org/abs/2602.15820
