Scientists are increasingly focused on understanding why neural networks make the decisions they do, and a new study tackles the challenge of exploring the 'Rashomon set': the multitude of models that achieve similar accuracy yet behave differently. Gilles Eerlings, Brent Zoomers, and Jori Liesenborgs, from the Digital Future Lab at Flanders Make UHasselt, together with Gustavo Rovelo Ruiz and colleagues, present DIVERSE, a novel framework that efficiently uncovers these diverse yet high-performing models without computationally expensive retraining. By augmenting existing models with Feature-wise Linear Modulation (FiLM) and employing an evolutionary search strategy, DIVERSE generates functionally distinct variants, offering a competitive and cost-effective method for building robust, well-balanced model sets, a significant step towards more transparent and reliable artificial intelligence systems.
Exploring Rashomon Sets via FiLM and CMA-ES
Scientists have unveiled DIVERSE, a novel framework designed to systematically explore the Rashomon set of deep neural networks, a collection of models achieving comparable accuracy but exhibiting differing predictive behaviours. This approach allows for the creation of multiple high-performing, yet functionally distinct, models, offering a significant advancement in understanding model multiplicity. Experiments conducted across the MNIST, PneumoniaMNIST, and CIFAR-10 datasets demonstrate that DIVERSE provides a competitive and efficient method for exploring the Rashomon set, facilitating the construction of diverse sets that maintain both robustness and performance while supporting well-balanced model multiplicity.
By modulating pre-activations with a latent vector, the framework enables fine-grained adjustments to internal representations without altering the original model weights, allowing for a controlled search of the local hypothesis space. The team successfully delineated the Rashomon set by optimising candidate vectors to induce predictive disagreement with the pre-trained model, all while maintaining accuracy within a predefined margin. This breakthrough establishes a gradient-free method for Rashomon set exploration, combining FiLM-based modulation with CMA-ES, and crucially, requiring no computationally expensive retraining. The study reveals that DIVERSE achieves comparable diversity to traditional retraining methods, but at a significantly reduced computational cost, addressing a key limitation of previous approaches.
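The FiLM-style modulation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the linear map from the latent vector z to the per-feature scale and shift parameters, and the choice to modulate around the identity, are assumptions for the example.

```python
import numpy as np

def film_modulate(pre_activations, gamma, beta):
    """FiLM: feature-wise scale and shift of a layer's pre-activations."""
    return gamma * pre_activations + beta

# Hypothetical setup: a fixed linear map turns the latent vector z into
# per-feature (gamma, beta) pairs; the base model's weights stay frozen.
rng = np.random.default_rng(0)
n_features = 8
z = rng.normal(size=4)                     # latent modulation vector
W = rng.normal(size=(4, 2 * n_features))   # assumed map: z -> (gamma, beta)
gamma, beta = np.split(z @ W, 2)
gamma = 1.0 + 0.1 * gamma                  # modulate around the identity
beta = 0.1 * beta

h = rng.normal(size=n_features)            # pre-activations of one layer
h_mod = film_modulate(h, gamma, beta)      # modulated pre-activations
```

Because the base weights are never touched, setting gamma to ones and beta to zeros recovers the original network exactly, which is what makes the search local and reversible.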
Furthermore, the research highlights the potential of this framework to address practical challenges in high-stakes applications, where model multiplicity can raise concerns about fairness and trust in machine learning systems. The work introduces a robust approach to defining empirical Rashomon sets, denoted R_m^ε, representing a finite collection of m models whose performance lies within a tolerance ε of a reference model. By employing CMA-ES to search the modulation space, DIVERSE efficiently explores the hypothesis space, identifying functionally distinct variants within this tolerance. This methodology offers a valuable tool for assessing model uncertainty, promoting fairness, improving interpretability, and enhancing model flexibility, opening new avenues for research and development in the field of machine learning.
FiLM Modulation and CMA-ES for Network Diversity
Scientists developed DIVERSE, a novel framework for systematically exploring the Rashomon set of neural networks, which represents the collection of models achieving comparable accuracy but exhibiting differing predictive behaviours. Researchers then harnessed the power of Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to navigate a latent modulation space, generating diverse model variants by optimising a latent vector z. Experiments employed datasets including MNIST, PneumoniaMNIST, and CIFAR-10 to rigorously evaluate DIVERSE’s performance.
The team implemented FiLM layers to modulate pre-activations using the latent vector z, allowing for fine-grained adjustments to internal representations while preserving the original model weights, a crucial design choice for computational efficiency. CMA-ES was then utilised to optimise z, inducing predictive disagreement with the pre-trained model while maintaining accuracy within a predefined margin, effectively delineating the Rashomon set. This approach enables the construction of diverse model sets that balance robustness, performance, and multiplicity. The study pioneered a gradient-free method, contrasting with retraining-based approaches which are computationally expensive, and Adversarial Weight Perturbation (AWP) which can struggle with scalability.
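The gradient-free search loop can be illustrated with a deliberately simplified evolution strategy. The sketch below is an isotropic (mu, lambda)-style stand-in for CMA-ES (no covariance adaptation), and the toy fitness function, which rewards distance from a reference point while penalising candidates that leave a feasible region, only mimics the paper's disagreement-versus-accuracy trade-off; all names and constants here are illustrative assumptions.

```python
import numpy as np

def es_search(fitness, d, sigma=0.3, pop=8, iters=50, seed=0):
    """Minimal evolution strategy: sample a population around the current
    mean, keep the best half, recentre. A simplified stand-in for CMA-ES."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(d)
    for _ in range(iters):
        cands = mean + sigma * rng.normal(size=(pop, d))
        scores = np.array([fitness(c) for c in cands])
        elite = cands[np.argsort(scores)[-pop // 2:]]  # maximise fitness
        mean = elite.mean(axis=0)
    return mean

# Toy fitness standing in for DIVERSE's objective: reward "disagreement"
# (distance from a reference point) but heavily penalise leaving the
# "accuracy tolerance" region (here, the unit ball).
ref = np.zeros(4)
best_z = es_search(
    lambda z: np.linalg.norm(z - ref) - 10.0 * max(0.0, np.linalg.norm(z) - 1.0),
    d=4,
)
```

The real framework replaces the toy fitness with a score computed by running the FiLM-modulated network on validation data, but the ask-score-recentre structure of the loop is the same.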
DIVERSE achieves comparable diversity to retraining, but at a significantly reduced computational cost, offering a practical solution for exploring the Rashomon set. Measurements focused on quantifying both the diversity and accuracy of the generated model sets, with the aim of constructing well-balanced and robust ensembles. Furthermore, the research defined an empirical Rashomon set, R_m^ε, as the subset of m models whose empirical risk lies within an ε-tolerance of the reference model's performance, providing a quantifiable measure of Rashomon set size. The Rashomon Ratio, measuring the fraction of candidates meeting this criterion, served as a scalar metric for assessing diversity. This methodology facilitates a deeper understanding of the hidden flexibility within deep neural networks and addresses the challenges of exploring their vast hypothesis space while maintaining optimal performance.
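The membership test and the Rashomon Ratio described above are straightforward to compute. The sketch below uses made-up loss values purely for illustration; the function name and the additive form of the tolerance are assumptions consistent with the definition in the text.

```python
import numpy as np

def rashomon_ratio(candidate_losses, ref_loss, eps):
    """Fraction of candidate models whose empirical risk lies within an
    eps-tolerance of the reference model's risk (membership in R_m^eps)."""
    losses = np.asarray(candidate_losses)
    in_set = losses <= ref_loss + eps
    return in_set.mean(), in_set

# Hypothetical losses for five candidate models.
ref_loss = 0.10
cand = [0.11, 0.12, 0.25, 0.105, 0.30]
ratio, members = rashomon_ratio(cand, ref_loss, eps=0.03)
# Three of the five candidates fall within the 0.03 tolerance.
```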
Diverse Neural Networks via FiLM and CMA-ES
Scientists have developed DIVERSE, a novel framework for systematically exploring the Rashomon set of neural networks, the collection of models that achieve comparable accuracy but exhibit differing predictive behaviours. The research details a method for uncovering multiple high-performing, yet functionally distinct, models without requiring computationally expensive retraining or gradient access. Experiments conducted across MNIST, PneumoniaMNIST, and CIFAR-10 demonstrate DIVERSE's efficiency in constructing diverse model sets that maintain both robustness and performance, alongside well-balanced model multiplicity. The fitness function weights its two components with a parameter λ, set to 0.5 to balance soft and hard disagreement metrics. The resulting diversity score, Div_λ(z), combines total variation distance and disagreement measures, steering CMA-ES towards models within the Rashomon set while maximising functional diversity. Measurements confirm that this fitness design effectively captures both decision-level and probability-level variations. Tests on the MNIST dataset, utilising a 3-layer MLP, revealed that DIVERSE successfully identified diverse models even beyond convolutional architectures.
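A diversity score of this shape, a λ-weighted mix of hard (decision-level) disagreement and soft (probability-level) total variation distance, can be sketched as follows. The function name and the exact convex combination are assumptions; only the two ingredients and the λ = 0.5 weighting come from the text.

```python
import numpy as np

def diversity_score(p_ref, p_cand, lam=0.5):
    """Sketch of a Div_lambda-style score: lam weights hard disagreement
    (argmax mismatch rate) against soft total variation distance."""
    hard = np.mean(p_ref.argmax(axis=1) != p_cand.argmax(axis=1))
    soft = 0.5 * np.abs(p_ref - p_cand).sum(axis=1).mean()  # mean TV distance
    return lam * hard + (1.0 - lam) * soft

# Toy predicted class probabilities for three samples, two classes.
p_ref = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
p_cand = np.array([[0.4, 0.6], [0.1, 0.9], [0.7, 0.3]])
score = diversity_score(p_ref, p_cand, lam=0.5)
```

Here only the first sample flips its predicted class, so the hard term is 1/3, while the soft term also credits the smaller probability shifts on the other two samples that never change a decision.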
On PneumoniaMNIST, employing an ImageNet-pretrained ResNet-50, the framework achieved comparable diversity to retraining methods, but at a significantly reduced computational cost. Furthermore, experiments on CIFAR-10, using an ImageNet-pretrained VGG-16, demonstrated the applicability of DIVERSE to moderate-scale vision tasks. Reference models achieved accuracies of 98% on the validation/test split for MNIST, 91% test accuracy on PneumoniaMNIST, and 75% test accuracy on CIFAR-10, defining the Rashomon thresholds. Researchers explored latent vector dimensions (d) ranging from 2 to 64, balancing search reach with CMA-ES scaling.
The CMA-ES search protocol used the standard population size of 4 + ⌊3 ln d⌋ and allocated a budget of 80 evaluations per latent dimension, resulting in a target budget of 80d total evaluations. This configuration ensured linear scaling of computational effort with the size of the search space. The resulting Rashomon sets were evaluated using metrics including Rashomon Ratio, ambiguity, discrepancy, Variable Prediction Range, and Rashomon Capacity, providing a comprehensive characterisation of model multiplicity and diversity.
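The scaling rules above are easy to check concretely. A small sketch (the function name is invented for illustration; the population formula is the standard CMA-ES default, and the 80-evaluations-per-dimension budget comes from the text):

```python
import math

def cma_es_budget(d, evals_per_dim=80):
    """Population size 4 + floor(3 ln d) and total budget 80 * d,
    as described for the search protocol."""
    pop = 4 + math.floor(3 * math.log(d))
    total = evals_per_dim * d
    return pop, total

# Across the explored range d = 2 .. 64, the population grows only
# logarithmically while the total budget grows linearly.
for d in (2, 16, 64):
    pop, total = cma_es_budget(d)
```

For d = 2 this gives a population of 6 and 160 evaluations; for d = 64, a population of 16 and 5120 evaluations, so the cost of widening the search space stays predictable.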
👉 More information
🗞 DIVERSE: Disagreement-Inducing Vector Evolution for Rashomon Set Exploration
🧠 arXiv: https://arxiv.org/abs/2601.20627
