Scientists are increasingly reliant on Simulation-Based Inference for complex data analysis, yet profiling systematic uncertainties remains a substantial computational challenge. Davide Valsecchi, Mauro Donegà, and Rainer Wallny, all of the Department of Physics (D-PHYS) at ETH Zurich, address this limitation with a framework for efficiently profiling nuisance parameters while simultaneously measuring multivariate distributions of interest (DoI). Their research introduces Factorizable Normalizing Flows to model systematic variations, offering a tractable approach to parameter estimation that avoids the combinatorial complexity of traditional methods. The work is significant because its amortized training captures the conditional dependence of the DoI on the nuisance parameters within a single optimisation process, ultimately enabling unbinned, functional measurements in fields such as high-energy physics.
This approach promises more precise measurements at experiments such as those conducted at the Large Hadron Collider. The distributions of interest are defined as learnable, invertible transformations of the data, a formulation that keeps the analysis computationally tractable and avoids the combinatorial explosion often encountered when many systematic uncertainties must be profiled together.
Crucially, an amortized training strategy learns the conditional dependence of the distribution of interest on the nuisance parameters in a single optimisation process, removing the need to retrain the model for every nuisance-parameter value, a significant improvement over previous methods. The resulting framework simultaneously extracts the underlying distributions and robustly profiles the nuisance parameters.
The method was validated on a synthetic dataset designed to emulate a high-energy physics measurement with multiple sources of systematic uncertainty. The results confirm its ability to profile nuisance parameters while measuring multivariate distributions of interest, supporting unbinned, functional measurements that go beyond the estimation of scalar parameters alone.
This strategy also offers a substantial improvement in computational efficiency for unbinned likelihood fits, a critical bottleneck in high-energy physics data analysis.
By amortizing the training process and modelling the systematic variations with a factorizable flow, the approach avoids the combinatorial explosion of computational cost usually associated with profiling numerous nuisance parameters. This design keeps the method tractable for the high-dimensional systematic uncertainties commonly encountered in high-energy physics.
To address the challenge of estimating multivariate distributions of interest, the team defines the DoI as a learnable, invertible transformation of the feature space. This formulation lets the method model the relevant distribution directly, rather than inferring it indirectly from complex simulations. Crucially, the DoI is not treated as a fixed quantity: it is allowed to vary with the nuisance parameters that represent the systematic uncertainties.
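To make this concrete, the sketch below shows one way such a conditional, invertible transformation could look. It is a minimal illustration in PyTorch and not the authors' implementation: the affine form, the network architecture, and all names are assumptions.

```python
# Minimal sketch (assumption, not the authors' code): a learnable, invertible
# transformation of the feature space whose parameters depend on the nuisance
# parameters nu. The affine form and the architecture are illustrative choices.
import torch
import torch.nn as nn

class ConditionalAffineDoI(nn.Module):
    """Affine bijection x -> z whose scale and shift are predicted from nu."""
    def __init__(self, n_features: int, n_nuisances: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_nuisances, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * n_features),  # per-feature log-scale and shift
        )

    def forward(self, x, nu):
        log_scale, shift = self.net(nu).chunk(2, dim=-1)
        z = x * torch.exp(log_scale) + shift     # invertible for any log_scale
        log_det = log_scale.sum(dim=-1)          # log |det Jacobian|
        return z, log_det

    def inverse(self, z, nu):
        log_scale, shift = self.net(nu).chunk(2, dim=-1)
        return (z - shift) * torch.exp(-log_scale)
```

In a factorized design, one such conditional bijector could be composed per systematic source, so the number of learned components grows with the number of sources rather than with their combinations; this reading of "factorizable" is an interpretation consistent with the summary above, not a statement of the paper's exact architecture.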
This conditional dependence is central to the amortized training strategy: it is learned in a single optimisation process, circumventing the repetitive training typically required when profiling nuisance parameters. Traditional methods often re-train the model for each value of the nuisance parameter visited during a likelihood scan, which quickly becomes computationally prohibitive.
The amortized approach instead learns a mapping from the nuisance parameters to the parameters of the DoI, effectively spreading the cost of training over the entire parameter space. Concretely, a neural network predicts the DoI's parameters from the values of the nuisance parameters, so a subsequent likelihood scan requires only forward evaluations of the trained model; a minimal sketch of such a training loop follows below.
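The following sketch illustrates the amortization idea under stated assumptions: nuisance values are drawn from an assumed standard-normal constraint each batch, the flow is the hypothetical ConditionalAffineDoI from the previous sketch, the base density is a standard normal, and `data_loader` stands in for training data with three features. It is not the paper's code.

```python
# Hedged sketch of amortized conditional density estimation.
import math
import torch

def nll(flow, x, nu):
    """Negative log-likelihood of x under the flow, conditioned on nu."""
    z, log_det = flow(x, nu)
    log_base = -0.5 * (z ** 2).sum(-1) - 0.5 * z.shape[-1] * math.log(2 * math.pi)
    return -(log_base + log_det).mean()

flow = ConditionalAffineDoI(n_features=3, n_nuisances=2)
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)

for x_batch in data_loader:                    # training data (assumed available)
    nu = torch.randn(x_batch.shape[0], 2)      # sample nuisance values each batch
    loss = nll(flow, x_batch, nu)              # fit p(x | nu) for all nu at once
    opt.zero_grad(); loss.backward(); opt.step()
```

After training, profiling reduces to evaluating `nll(flow, observed_data, nu)` over a grid of nuisance-parameter values, or inside a minimiser, with no retraining per value.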
The synthetic dataset allowed controlled testing of the framework's ability to simultaneously extract the underlying distribution and robustly profile nuisance parameters, demonstrating its potential for unbinned, functional measurements in complex analyses. The data generation incorporated realistic features of particle-physics experiments, including detector effects and background processes, keeping the validation relevant to real-world applications.
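As an illustration only, a toy generator of this kind might look as follows. The signal and background shapes, the two systematic effects (an energy-scale shift and a resolution smearing), and every number in it are assumptions made for demonstration, not the paper's dataset.

```python
# Illustrative toy only: a Gaussian "signal" peak on an exponential
# "background", with two nuisance parameters acting as detector systematics.
import numpy as np

def generate_toy(n_events, nu_scale=0.0, nu_res=0.0, signal_frac=0.3, seed=None):
    rng = np.random.default_rng(seed)
    n_sig = rng.binomial(n_events, signal_frac)
    sig = rng.normal(loc=125.0, scale=2.0, size=n_sig)         # signal peak
    bkg = rng.exponential(scale=50.0, size=n_events - n_sig)   # falling background
    x = np.concatenate([sig, bkg])
    x = x * (1.0 + 0.01 * nu_scale)                            # energy-scale systematic
    x = x + rng.normal(0.0, 0.5 * abs(nu_res), size=x.size)    # resolution systematic
    return rng.permutation(x)

data = generate_toy(10_000, nu_scale=1.0, nu_res=0.5, seed=0)
```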
The Bigger Picture
Scientists have devised a new computational framework that promises to unlock greater precision in the analysis of data from particle colliders and beyond. For years, the sheer complexity of modern experiments, particularly those at facilities like the Large Hadron Collider, has presented a significant hurdle. Extracting meaningful signals from the noise requires accounting for a multitude of uncertainties, a process that rapidly becomes computationally intractable as the number of variables increases.
This work offers a way to efficiently map and manage those uncertainties, easing a critical bottleneck in high-energy physics. The framework allows researchers to simultaneously determine the underlying distributions of interest and accurately profile nuisance parameters, those variables that introduce uncertainty but aren’t the primary focus of the analysis.
Previous methods often required repetitive, time-consuming retraining for each adjustment of these nuisance parameters. This new approach represents a substantial leap forward in efficiency. However, the path to widespread adoption isn’t without its challenges. While demonstrated on a synthetic dataset mirroring a high-energy physics measurement, the true test will be its performance with real, complex data.
Scaling these methods to encompass the full dimensionality of collider events remains a considerable undertaking. Furthermore, the reliance on machine learning introduces questions about robustness and potential biases. Future work will likely focus on validating the framework against diverse datasets and exploring ways to mitigate these risks. Ultimately, this development isn’t just about faster data analysis; it’s about pushing the boundaries of what’s measurable, potentially revealing subtle signals that would otherwise remain hidden within the statistical noise.
👉 More information
🗞 Profiling systematic uncertainties in Simulation-Based Inference with Factorizable Normalizing Flows
🧠 ArXiv: https://arxiv.org/abs/2602.13184
