Diffusion Large Language Models Achieve Black-Box Optimisation with Limited Labeled Data

Offline black-box optimisation presents a significant challenge in fields ranging from DNA sequence design to robotics, requiring the identification of optimal solutions from pre-existing datasets. Ye Yuan, Can Chen (from MILA – Quebec AI Institute), and Zipeng Sun (from McGill and MILA – Quebec AI Institute), alongside Dinghuai Zhang and Christopher Pal (from Polytechnique Montreal, Canada CIFAR AI Chair), demonstrate a novel approach utilising diffusion large language models (dLLMs) to overcome limitations of current methods. Their research addresses the difficulty traditional techniques have in capturing bidirectional dependencies within complex designs, instead harnessing the iterative refinement and bidirectional modelling capabilities of diffusion LLMs. By introducing an in-context denoising module and a masked diffusion tree search, the team conditions the LLM on the available data to generate improved designs, achieving state-of-the-art performance on the design-bench benchmark and paving the way for more efficient and robust optimisation strategies.

LLMs optimise designs from limited offline data

Scientists have demonstrated a novel approach to offline black-box optimization (BBO) by harnessing the power of diffusion large language models (LLMs). This breakthrough research addresses the challenge of finding optimal designs when only a limited offline dataset of designs and their corresponding labels is available, a common scenario in fields like DNA sequence design and robotics. The team achieved significant advancements by moving beyond traditional methods reliant on task-specific proxy or generative models, instead leveraging the in-context learning capabilities inherent in pre-trained LLMs. The study reveals a method that directly generates improved designs from natural language prompts containing task descriptions and offline data, overcoming limitations of earlier autoregressive LLM adaptations which struggled with bidirectional dependencies crucial in many design problems.
This work introduces an in-context denoising module, conditioning a diffusion LLM on the task description and the offline dataset, both formatted as natural language prompts. The diffusion LLM is then prompted to iteratively denoise masked designs, transforming them into improved candidate solutions. Experiments show that this approach effectively leverages the bidirectional modeling and iterative refinement capabilities of diffusion LLMs, allowing the model to capture complex dependencies within the design space that left-to-right autoregressive models often miss. The researchers specifically focused on improving performance in scenarios where only a few labeled data points are accessible, a significant hurdle in many real-world optimization problems.
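The paper's exact prompt template is not reproduced here, but a minimal sketch of how a task description and offline dataset could be serialised into a single natural language prompt, ending in a fully masked design for the model to denoise, is shown below. The mask token, template wording, and function names are illustrative assumptions rather than the authors' implementation.

# Hypothetical sketch (Python): serialise the task description and offline
# dataset into a natural language prompt ending in a fully masked design.
# The mask token and template are assumptions, not the paper's exact format.
MASK = "[MASK]"

def build_prompt(task_description, offline_data, design_length):
    # offline_data: list of (design_string, score) pairs from the offline dataset
    lines = [f"Task: {task_description}", "Offline examples (design -> score):"]
    for design, score in offline_data:
        lines.append(f"{design} -> {score:.3f}")
    lines.append("Complete the masked design so that it scores higher than the examples:")
    lines.append(" ".join([MASK] * design_length))
    return "\n".join(lines)

prompt = build_prompt(
    "Design an 8-base DNA sequence with high binding affinity.",
    [("A C G T T G C A", 0.41), ("G G C T A A C T", 0.58)],
    design_length=8,
)
print(prompt)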

To further enhance the generation process and guide it towards high-performing designs, the team developed masked diffusion tree search. This innovative module casts the denoising process as a step-wise Monte Carlo Tree Search, dynamically balancing exploration and exploitation to efficiently navigate the design space. Each node in the search tree represents a partially masked design, with each denoising step constituting an action. Candidates are rigorously evaluated using expected improvement, calculated under a Gaussian Process trained on the offline dataset, ensuring a robust and data-driven assessment of performance.
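The expected improvement criterion itself is standard Bayesian optimisation machinery; a minimal sketch of how it can be computed under a Gaussian Process surrogate fitted to the offline dataset follows. The design featurisation, kernel, and hyperparameters here are assumptions, not the authors' configuration.

# Minimal sketch (Python): expected improvement under a GP surrogate fitted to
# the offline dataset. Featurisation and kernel choice are assumptions.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X_offline = np.random.rand(32, 8)   # stand-in numeric features for offline designs
y_offline = np.random.rand(32)      # stand-in scores for those designs

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gp.fit(X_offline, y_offline)

def expected_improvement(candidates, best_y, xi=0.01):
    # EI = (mu - best - xi) * Phi(z) + sigma * phi(z), with z = (mu - best - xi) / sigma
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # guard against zero predictive variance
    z = (mu - best_y - xi) / sigma
    return (mu - best_y - xi) * norm.cdf(z) + sigma * norm.pdf(z)

print(expected_improvement(np.random.rand(5, 8), best_y=y_offline.max()))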

The research establishes that their method, dubbed dLLM, achieves state-of-the-art results in few-shot settings on the design-bench benchmark. This signifies a substantial leap forward in BBO performance, particularly in data-scarce environments. The work opens up exciting possibilities for automating the design of complex systems, from optimizing DNA sequences for specific binding affinities to developing more effective robotic control strategies, all without requiring costly and time-consuming online evaluations. The combination of diffusion LLMs and masked tree search provides a powerful framework for tackling challenging optimization problems across a wide range of scientific and engineering disciplines.

Diffusion LLMs for offline black-box optimisation

Scientists pioneered a novel approach to offline black-box optimization (BBO) by harnessing the power of diffusion large language models (LLMs). This work addresses the challenge of finding optimal designs, such as DNA sequences or robotic configurations, when only a limited offline dataset of designs and their performance labels is available. Rather than relying on task-specific proxy models or generative models, the researchers leveraged the in-context learning capabilities of pre-trained LLMs to directly generate improved designs from the existing data. The study recognised that autoregressive LLMs struggle with the bidirectional dependencies inherent in many design problems, motivating the exploration of diffusion LLMs and their bidirectional modelling and iterative refinement capabilities.

The core of their method, termed dLLM, centres around an in-context denoising module. The team formatted both task descriptions and the offline dataset as natural language prompts, providing this contextual information to the diffusion LLM alongside an instruction to generate improved designs. Crucially, the researchers prompted the LLM to denoise masked designs, iteratively refining them into promising candidates. This technique allows the model to consider dependencies across the entire design space, overcoming the limitations of left-to-right autoregressive models.
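A minimal sketch of that iterative unmasking loop is shown below; the dllm_fill function is a placeholder for the actual diffusion LLM call, since the paper's model interface is not reproduced here, and the unmasking schedule is an assumption.

# Hypothetical sketch (Python): start from a fully masked design and let the
# diffusion LLM fill a few mask positions per step, conditioned on the prompt.
import random

MASK = "[MASK]"
VOCAB = ["A", "C", "G", "T"]  # illustrative token vocabulary (DNA bases)

def dllm_fill(prompt, tokens, positions):
    # Placeholder: a real diffusion LLM would predict tokens for `positions`
    # given the prompt and the current partially unmasked design.
    return {p: random.choice(VOCAB) for p in positions}

def denoise(prompt, length, tokens_per_step=2):
    tokens = [MASK] * length
    masked = list(range(length))
    while masked:
        step = random.sample(masked, k=min(tokens_per_step, len(masked)))
        for pos, tok in dllm_fill(prompt, tokens, step).items():
            tokens[pos] = tok
        masked = [p for p in masked if tokens[p] == MASK]
    return " ".join(tokens)

print(denoise("Task: design an 8-base DNA sequence...", length=8))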

To further enhance the design generation process, scientists introduced masked diffusion tree search. This method casts the denoising process as a step-wise Monte Carlo Tree Search, dynamically balancing exploration of new design possibilities with exploitation of promising regions. Each node in the search tree represents a partially masked design, with each denoising step constituting an action. Candidate designs are then evaluated using expected improvement, calculated via a Gaussian Process trained on the initial offline dataset. This Gaussian Process provides a predictive model of design performance, enabling the search algorithm to efficiently identify and prioritize high-performing candidates.
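The authors' search implementation is not reproduced here, but a minimal sketch of how such a step-wise tree search could represent nodes over partially masked designs, and select among them with the standard UCT rule, follows; the node fields and exploration constant are assumptions.

# Minimal sketch (Python): tree nodes over partially masked designs with UCT
# selection. Node fields and the exploration constant are illustrative.
import math

class Node:
    def __init__(self, masked_design, parent=None):
        self.masked_design = masked_design   # e.g. ["A", "[MASK]", "G", ...]
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0              # accumulated expected improvement

    def uct_score(self, c=1.4):
        if self.visits == 0:
            return float("inf")              # visit unexplored children first
        exploit = self.total_reward / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def select(node):
    # Descend the tree by repeatedly picking the child with the highest UCT score.
    while node.children:
        node = max(node.children, key=lambda child: child.uct_score())
    return node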

Experiments employed the design-bench platform to demonstrate dLLM’s performance. The team achieved state-of-the-art results in few-shot settings, showcasing the effectiveness of their approach in scenarios with limited labelled data. This method achieves significant improvements by effectively combining the strengths of diffusion LLMs with a sophisticated tree search algorithm, enabling the discovery of optimal designs in challenging offline optimization problems.

dLLM excels in few-shot black-box optimisation, achieving state-of-the-art results

Scientists achieved state-of-the-art results in offline black-box optimization (BBO) using diffusion large language models (LLMs). The team developed a novel method, dLLM, which leverages the bidirectional modeling and iterative refinement capabilities of diffusion LLMs to discover optimal designs from offline datasets. Experiments revealed that dLLM significantly outperforms existing techniques in few-shot settings on the design-bench benchmark. This breakthrough delivers a new approach to tackling optimization problems in domains like DNA sequence design and robotics, where obtaining labeled data is expensive.

The research focused on addressing the limitations of traditional BBO methods, which often rely on task-specific proxy or generative models. These methods struggle with limited data and fail to fully utilize the in-context learning abilities of pre-trained LLMs. Scientists introduced an in-context denoising module, conditioning the diffusion LLM on task descriptions and offline datasets formatted as natural language prompts. The diffusion LLM then iteratively denoises masked designs, transforming them into improved candidates. Measurements confirm that this process effectively captures bidirectional dependencies crucial in complex design spaces, unlike autoregressive LLMs which operate left-to-right.

To further enhance performance, the team implemented masked diffusion tree search, framing the denoising process as a step-wise Monte Carlo Tree Search. Each node in the search tree represents a partially masked design, with each denoising step constituting an action. Candidates are evaluated using expected improvement, calculated via a Gaussian Process trained on the offline dataset. Tests show that this dynamic balancing of exploration and exploitation leads to superior design discovery. The method systematically explores the design space, prioritizing promising candidates while still considering less-explored options.

Results demonstrate that dLLM effectively learns from limited offline data, generating high-performing designs without requiring online evaluations. The team measured performance using expected improvement as a reward signal, propagating this information back up the tree to refine the search strategy. Data shows that the combination of in-context denoising and masked diffusion tree search unlocks the full potential of diffusion LLMs for BBO. This work opens up exciting possibilities for automating design processes across various scientific and engineering disciplines, potentially accelerating innovation in areas like drug discovery and materials science.
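A sketch of that backpropagation step, reusing the hypothetical Node fields from the tree-search sketch above, could look like the following, with the reward taken as the expected improvement of the fully denoised candidate.

# Sketch (Python): propagate the expected-improvement reward of a completed
# candidate back up the tree, updating visit counts and accumulated reward.
def backpropagate(node, reward):
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent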

👉 More information
🗞 Diffusion Large Language Models for Black-Box Optimization
🧠 ArXiv: https://arxiv.org/abs/2601.14446

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.

Latest Posts by Rohail T.:

Superluminal Transformations and Finite Limits Incompatible, New No-Go Theorem Achieves Proof
January 24, 2026

Fine-Tuned LLMs Achieve 99% Plausible Counterfactuals for Health Interventions
January 24, 2026

Dara Achieves Few-Shot Budget Allocation with RL-Finetuned LLMs for Online Advertising
January 24, 2026