Exploring vast and complex search spaces presents a significant challenge for optimisation algorithms, particularly when traditional gradient-based methods fail. Nicolas Menet, Aleksandar Terzić, and Andreas Krause from ETH Zürich, together with Abbas Rahimi from IBM Research Zurich, now present a novel approach to Thompson sampling that overcomes this limitation. Their work introduces a scalable method, Thompson Sampling via Fine-Tuning, which avoids the computationally expensive optimisation of acquisition functions by directly learning to predict the probability that a candidate solution is optimal. The technique leverages the prior knowledge of large language models, adapting them incrementally to reflect the evolving understanding of the search space. Importantly, the team demonstrates that the method retains strong theoretical performance guarantees while improving efficiency in practical applications ranging from refining question-answering systems to designing thermally stable proteins and quantum circuits.
Bayesian Optimisation and Reinforcement Learning Frameworks
This document details research into Bayesian optimisation, a method for efficiently finding the best solution to expensive black-box problems, and reinforcement learning, a framework for training agents to make optimal decisions. The work focuses on optimising functions whose internal workings are unknown, using statistical modelling and algorithms to improve performance. The researchers explore techniques to reduce the uncertainty of predictions and to make the optimisation process more sample efficient, particularly in reinforcement learning settings. Central to the research is the exploration-exploitation trade-off, a fundamental challenge in optimisation: algorithms must balance trying new possibilities against refining solutions that already look promising. Techniques such as leave-one-out estimation and policy-gradient methods are employed to refine strategies and improve decision-making; a small sketch of the leave-one-out idea follows below. The work places a strong emphasis on statistical rigour, with detailed derivations of key statistical quantities and a focus on the properties of the estimators and models involved.
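To make the leave-one-out idea concrete, the sketch below (illustrative only, not code from the paper) computes policy-gradient advantages in which each sampled candidate is compared against the mean reward of the other samples in its batch; the reward values are hypothetical.

```python
import numpy as np

def leave_one_out_advantages(rewards):
    """Advantage of each sampled candidate relative to a leave-one-out
    baseline: the mean reward of the *other* samples in the batch.
    Because each baseline is independent of its own sample, this reduces
    the variance of a policy-gradient estimate without introducing bias."""
    rewards = np.asarray(rewards, dtype=float)
    n = len(rewards)
    baselines = (rewards.sum() - rewards) / (n - 1)
    return rewards - baselines

# Toy batch of four sampled candidates with hypothetical rewards.
print(leave_one_out_advantages([1.0, 0.2, 0.7, 0.1]))
```

These advantages then weight the gradient of each candidate's log-probability under the policy, giving the familiar REINFORCE update with a variance-reducing baseline.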
Thompson Sampling via Language Model Fine-Tuning
Researchers have developed Thompson Sampling via Fine-Tuning (ToSFiT), a novel approach to Bayesian optimisation that addresses the computational challenges of optimising over large, unstructured spaces. The method leverages large language models to explore candidate solutions efficiently, turning the optimisation problem into a learning task for the model. ToSFiT builds on existing techniques by initialising the optimisation process with a pre-trained language model, which provides a strong starting point and accelerates learning. The team derived a theoretical result showing how the cumulative regret, the total shortfall accumulated by not always querying the optimal candidate, scales with the complexity of the search space, highlighting the importance of both pre-training and careful fine-tuning. Experiments across diverse tasks, including refining FAQ responses, designing thermally stable proteins, and creating quantum circuits, validate the effectiveness of ToSFiT, demonstrating significant improvements in sample efficiency with minimal impact on computational cost. The approach effectively scales Bayesian optimisation to complex, high-dimensional domains where traditional methods are impractical.
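For context, the toy sketch below shows the classic Thompson sampling loop over a small discrete candidate set with a Gaussian reward model (all numbers are made up, and this is not the paper's implementation). ToSFiT's contribution is to make the "sample a plausible optimum" step tractable when the candidates form a huge space of sequences, by drawing proposals from a fine-tuned language model rather than enumerating and maximising over the whole space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: five candidates with unknown mean rewards (made-up values).
true_means = np.array([0.1, 0.4, 0.35, 0.8, 0.2])
noise_var = 0.01  # assumed known observation noise

# Independent Gaussian posterior over each candidate's mean reward.
post_mean = np.zeros(5)
post_var = np.ones(5)

for t in range(50):
    # Thompson sampling: draw one plausible reward vector from the posterior
    # and query the candidate that looks best under that single draw.
    draw = rng.normal(post_mean, np.sqrt(post_var))
    x = int(np.argmax(draw))
    reward = true_means[x] + np.sqrt(noise_var) * rng.normal()

    # Conjugate Gaussian update for the queried candidate.
    precision = 1.0 / post_var[x] + 1.0 / noise_var
    post_mean[x] = (post_mean[x] / post_var[x] + reward / noise_var) / precision
    post_var[x] = 1.0 / precision

print("estimated best candidate:", int(np.argmax(post_mean)))
```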
Bayesian Optimisation via Large Language Models
Researchers have developed Thompson Sampling via Fine-Tuning (ToSFiT), a new method for Bayesian optimisation in complex, unstructured spaces. The approach sidesteps the computational cost of maximising acquisition functions by directly parameterising the probability that a candidate is optimal, leveraging the strong prior knowledge embedded in large language models and incrementally adapting them towards the most promising solutions. Experiments demonstrate that ToSFiT scales effectively, with computation and memory requirements that remain independent of the number of trials. The team derived explicit equations describing the optimisation process, providing new insight into its behaviour and enabling efficient fine-tuning of the language model. Techniques borrowed from reinforcement learning were used to stabilise the learning process and ensure consistent improvement. Validation across diverse tasks, including refining FAQ responses, designing thermally stable proteins, and creating quantum circuits, demonstrates the effectiveness of ToSFiT and confirms that it matches the strong guarantees of standard Thompson sampling.
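The idea of directly parameterising the probability of optimality can be illustrated with a simplified sketch (an assumption-laden toy, not the paper's method): posterior draws vote for their argmax to estimate how likely each candidate is to be the best, and a small softmax "policy", standing in for the language model, is then fitted to that distribution by minimising a cross-entropy loss, loosely mirroring the fine-tuning step.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy posterior over five candidates' rewards (independent Gaussians,
# made-up means and uncertainties).
post_mean = np.array([0.1, 0.4, 0.35, 0.8, 0.2])
post_std = np.array([0.3, 0.2, 0.4, 0.1, 0.5])

# Monte Carlo estimate of the probability that each candidate is optimal:
# every posterior draw casts one vote for its argmax.
draws = rng.normal(post_mean, post_std, size=(10_000, 5))
p_opt = np.bincount(draws.argmax(axis=1), minlength=5) / len(draws)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Fit a softmax "policy" (a stand-in for the fine-tuned language model)
# to p_opt by gradient descent on the cross-entropy loss.
logits = np.zeros(5)
for _ in range(500):
    logits -= 0.5 * (softmax(logits) - p_opt)  # CE gradient w.r.t. logits

print("target probability of optimality:", np.round(p_opt, 3))
print("fitted policy:                   ", np.round(softmax(logits), 3))
```

Sampling from the fitted policy then approximates a Thompson sampling draw without ever maximising an acquisition function explicitly.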
Scalable Thompson Sampling with Generative Policies
This work demonstrates a scalable approach to Thompson sampling, enabling efficient optimisation in large, unstructured spaces. By parameterising the probability that a candidate is optimal with a generative model, the researchers avoid the computational burden of maximising acquisition functions. Theoretical analysis confirms that the method achieves performance guarantees comparable to standard Thompson sampling and other established techniques. Empirical validation across diverse tasks, including refining FAQ responses, designing thermally stable proteins, and creating quantum circuits, shows that the new method consistently outperforms baseline approaches. The technique avoids getting stuck in suboptimal solutions and extends naturally to evaluating batches of candidates in parallel, as sketched below, underscoring its practical applicability to complex problems. Future research will focus on jointly learning task-adaptive embeddings, exploring more expressive reward models, and restricting updates to only a portion of the generative model to reduce computational cost. Together, these directions highlight the potential of combining foundation models with principled Bayesian optimisation for tackling challenging discrete search problems.
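As a rough illustration of the batch extension mentioned above (again a toy sketch with made-up numbers, not the paper's code), batched Thompson sampling simply takes several independent posterior draws per round and queries the argmax of each, so the candidates within a batch are diverse by construction.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy posterior over five candidates' rewards (made-up values).
post_mean = np.array([0.1, 0.4, 0.35, 0.8, 0.2])
post_std = np.array([0.3, 0.2, 0.4, 0.1, 0.5])

def thompson_batch(batch_size):
    """One round of batched Thompson sampling: each independent posterior
    draw nominates its own argmax, so a batch of queries is obtained by
    repeating the single-draw procedure several times in parallel."""
    draws = rng.normal(post_mean, post_std, size=(batch_size, len(post_mean)))
    return draws.argmax(axis=1)

print(thompson_batch(4))  # candidate indices to evaluate simultaneously
```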
👉 More information
🗞 Thompson Sampling via Fine-Tuning of LLMs
🧠 ArXiv: https://arxiv.org/abs/2510.13328
