Bayesian Transformers Achieve Diverse Intelligence with Sampling from a Single Model

The pursuit of artificial intelligence often focuses on creating single, definitive models, but recent work suggests intelligence can arise from the collective of many minds. Diji Yang and Yi Zhang of the University of California, Santa Cruz address this idea by introducing Population Bayesian Transformers (B-Trans), a novel approach that generates diverse yet coherent behaviours from a single set of pre-trained weights. Rather than treating the transformer as one fixed function, the method treats key parameters as probabilistic variables, effectively creating a ‘population’ of intelligent agents within one system. The team demonstrates that sampling from this population enhances both exploration and performance across a range of tasks, including zero-shot generation and reinforcement learning, a significant step towards more robust and adaptable artificial intelligence.

B-Trans introduces a Bayesian-motivated posterior proxy, treating the bias-like offsets in normalization layers as stochastic variables with a Gaussian variational approximation. This induces a distribution over model behaviour without the cost of training full Bayesian neural networks, and preserves coherence within each generation by freezing the sampled noise at the sequence level.
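The mechanism can be sketched in a few lines. The module below is a minimal numpy illustration, not the paper's implementation: it normalizes its input as a standard layer norm would, but reparameterizes the bias as beta + sigma·eps, drawing eps once per sequence via `resample()` and reusing it for every token so the generation stays coherent. All names and hyperparameters (e.g. the initial value of `log_sigma`) are illustrative assumptions.

```python
import numpy as np

class StochasticLayerNorm:
    """Layer norm whose bias carries a Gaussian perturbation, sampled once
    per sequence and then frozen (a toy sketch of the B-Trans idea; the
    paper's actual parameterization may differ)."""

    def __init__(self, dim, rng=None):
        self.gamma = np.ones(dim)            # scale (deterministic)
        self.beta = np.zeros(dim)            # bias mean (learned)
        self.log_sigma = np.full(dim, -2.0)  # bias std-dev, log-parameterized
        self.rng = rng or np.random.default_rng(0)
        self.eps_frozen = None               # per-sequence noise sample

    def resample(self):
        """Draw one model 'instance': sample eps ~ N(0, I) and freeze it."""
        self.eps_frozen = self.rng.standard_normal(self.beta.shape)

    def __call__(self, x):
        # Standard layer normalization over the feature dimension.
        mu = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        x_hat = (x - mu) / np.sqrt(var + 1e-5)
        # Reparameterized stochastic bias: beta + sigma * eps.
        bias = self.beta + np.exp(self.log_sigma) * self.eps_frozen
        return self.gamma * x_hat + bias
```

Calling `resample()` before each generation yields a new instance; within a generation the frozen eps makes the layer deterministic, which is what keeps each sampled model's output coherent.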

Uncertainty via Noisy Bias Parameters

This paper proposes a novel approach to improve the reasoning capabilities of Large Language Models (LLMs) by introducing a simple, computationally efficient method for representing uncertainty. Instead of full Bayesian inference, the authors focus on adding noise to the bias terms of normalization layers within the LLM, creating a local proxy for model uncertainty. By sampling different noise configurations, the model effectively behaves as an ensemble, leading to more robust and reliable reasoning. This method significantly improves performance in reinforcement learning scenarios with sparse rewards, suggesting the uncertainty representation helps the model explore more effectively.

The authors add noise to the bias terms of normalization layers during both training and inference, introducing controlled variation in the model’s activations and inducing a distribution over possible behaviours. Sampling a noise configuration yields a model instance; different instances follow different reasoning paths while remaining individually competent, effectively creating a “wisdom of crowds” within one model. The approach is computationally inexpensive and easy to implement, making it practical for large-scale LLMs, and controlled experiments confirm that even this simplified uncertainty representation pays off. Aggregating predictions across sampled instances significantly enhances exploration, particularly in challenging environments with sparse rewards, where a single deterministic model tends to commit to one path too early.
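The ensemble effect can be illustrated with a toy simulation. Here, Gaussian noise added to a base logit vector stands in for resampling the stochastic bias terms (an assumption made purely for illustration, not the paper's setup), and the population's answers are aggregated by majority vote:

```python
import numpy as np

def sample_population_predictions(logits, k, sigma=0.5, rng=None):
    """Simulate k sampled model instances by perturbing a base logit
    vector with Gaussian noise (a toy stand-in for resampling the
    stochastic bias terms), then aggregate by majority vote."""
    rng = rng or np.random.default_rng(0)
    votes = []
    for _ in range(k):
        noisy = logits + sigma * rng.standard_normal(logits.shape)
        votes.append(int(np.argmax(noisy)))          # each instance answers
    # Wisdom of crowds: the modal answer across instances.
    values, counts = np.unique(votes, return_counts=True)
    return int(values[np.argmax(counts)]), votes
```

Individual instances occasionally deviate from the base model's answer, which is exactly what drives exploration, yet the aggregated vote remains stable.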

Measurements confirm that B-Trans traverses sparse-reward landscapes effectively, achieving deeper exploration than traditional action-space baselines. In a label-free Test-Time Reinforcement Learning setting, the implicit population leverages the wisdom of crowds, outperforming deterministic baselines even without ground-truth supervision. The framework thus treats the model not as a single entity but as a population of diverse instances drawn from one set of pre-trained weights, simulating the benefits of collective intelligence without the substantial computational cost of fully Bayesian neural networks. Future research could explore adaptive inference, where the level of injected uncertainty is dynamically adjusted to the complexity of the input, and ultimately a shift towards delivering language models as probabilistic distributions capable of adapting to the specific demands of each query.
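One common way to derive a label-free training signal of this kind is to use the population's majority answer as a pseudo-label and reward the instances that agree with it. The helper below is an illustrative sketch under that assumption; the paper's exact reward scheme in its Test-Time RL setting may differ.

```python
import numpy as np

def pseudo_rewards(answers):
    """Label-free reward in the spirit of Test-Time RL: with no ground
    truth available, take the majority answer across sampled instances
    as a pseudo-label and reward agreement with it.
    (Illustrative sketch; not the paper's exact reward scheme.)"""
    values, counts = np.unique(answers, return_counts=True)
    pseudo_label = values[np.argmax(counts)]         # the crowd's answer
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return rewards, pseudo_label
```

The signal is only as good as the crowd: it helps precisely when diverse instances are individually competent enough that the majority answer is usually right, which is the property the B-Trans sampling scheme is designed to preserve.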

👉 More information
🗞 Many Minds from One Model: Bayesian Transformers for Population Intelligence
🧠 ArXiv: https://arxiv.org/abs/2512.25063

Rohail T.

A quantum scientist exploring the frontiers of physics and technology, my work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.

Latest Posts by Rohail T.:

AI Achieves 99% Accuracy in Hierarchical Classification of Benign Laryngeal Voice Disorders

January 8, 2026
Diffusion Language Models Achieve Optimal Parallel Sampling with Polynomial-Length Chains

January 8, 2026
Multi-bandit Best Arm Identification Achieves Efficient Partner Selection for Sequential Support Network Learning

January 8, 2026