Researchers are tackling a critical challenge in black-box optimisation: designing effective reward functions for algorithms that learn to optimise. Zechuan Huang, Zhiguang Cao, and Hongshu Guo, from the South China University of Technology and Singapore Management University, together with Yue-Jiao Gong and Zeyuan Ma, present a novel approach that uses large language models to automate this reward discovery process. Their work addresses the inherent biases and potential vulnerabilities of human-designed rewards, instead employing an evolutionary process guided by an LLM to continuously refine reward functions. Significantly, the team introduces a multi-task evolution architecture that enables parallel reward discovery and accelerates convergence, ultimately demonstrating substantial performance gains for existing meta-optimisation algorithms and highlighting the crucial role of reward design in this field.
LLMs automate reward design for MetaBBO, improving optimization
Scientists have unveiled a novel framework, READY, which leverages large language models (LLMs) to automate the design of reward functions for Meta-Black-Box Optimization (MetaBBO). This breakthrough addresses a critical limitation in the field, where current methods rely heavily on human expertise to craft reward signals, introducing potential biases and hindering scalability. The research establishes a new paradigm where LLMs function as intelligent evolutionary operators, autonomously synthesising reward functions tailored for diverse MetaBBO approaches. Specifically, the team introduced a multi-task evolution architecture, enabling parallel reward discovery and accelerating convergence through knowledge sharing across different MetaBBO tasks.
The study’s core innovation lies in its tailored evolution paradigm within an iterative LLM-based program search process, ensuring continuous improvement in reward function design. Researchers developed five reflection-based code-level evolutionary operators to promote diverse program search behaviours, moving beyond simple heuristic approaches. Furthermore, an explicit knowledge transfer scheme was implemented to enhance the sharing of insights between different MetaBBO tasks, significantly boosting the efficiency of the reward discovery process. This multi-faceted approach allows READY to automatically design effective reward functions, not only improving the performance of existing MetaBBO algorithms but also providing interpretable design insights.
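The paper's exact search loop is not reproduced in this summary, but the iterative, operator-driven evolution it describes can be pictured roughly as follows. This is a minimal sketch under our own assumptions: the operator names and the `llm_generate`, `evaluate_reward`, and `build_prompt` helpers are hypothetical placeholders, not READY's actual interfaces.

```python
import random

# Hypothetical names for five reflection-based, code-level operators;
# the operator set described in the paper differs in detail.
OPERATORS = ["crossover", "mutation", "simplify", "repair", "reflect_and_rewrite"]

def llm_generate(prompt: str) -> str:
    """Placeholder: query an LLM and return a candidate reward program as source code."""
    raise NotImplementedError

def evaluate_reward(program_src: str) -> float:
    """Placeholder: run the target MetaBBO method with this reward and return its meta-objective."""
    raise NotImplementedError

def build_prompt(op: str, parents) -> str:
    """Placeholder: assemble an operator-specific prompt from parent programs and textual reflections."""
    return f"Apply {op} to the following reward programs:\n\n" + "\n\n".join(src for src, _ in parents)

def evolve_rewards(seed_programs, generations: int = 20, pop_size: int = 8):
    # Population of (program_source, fitness) pairs.
    population = [(src, evaluate_reward(src)) for src in seed_programs]
    for _ in range(generations):
        parents = sorted(population, key=lambda p: p[1], reverse=True)[: pop_size // 2]
        op = random.choice(OPERATORS)              # vary the search behaviour across generations
        child_src = llm_generate(build_prompt(op, parents))
        population.append((child_src, evaluate_reward(child_src)))
        population = sorted(population, key=lambda p: p[1], reverse=True)[:pop_size]
    return population[0]                           # best (program, fitness) found
```

The point of choosing among several operators each generation is simply to keep the program search diverse rather than repeatedly asking the LLM for the same kind of edit.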
Experiments demonstrate that READY's discovered rewards consistently outperform both the original human-designed rewards of existing MetaBBO methods and state-of-the-art baselines such as Eureka, EoH, and ReEvo. The discovered reward functions exhibit a surprising degree of generalisability, proving capable of boosting the performance of unseen MetaBBO algorithms and highlighting the potential for broader application. Empirical results show that READY consistently improves the optimisation performance of diverse MetaBBO methods and provides clear, interpretable design insights that co-evolve with the reward program. This work opens new avenues for automated algorithm design and optimisation, potentially transforming how complex systems are optimised when gradient information is unavailable. By removing the reliance on manual reward engineering, READY promises to accelerate progress in MetaBBO and unlock the full potential of various optimisation architectures. The READY project is available at https://anonymous.4open.science/r/ICML_READY-747F, providing a resource for further research and development in this rapidly evolving field.
LLM-driven reward design for MetaBBO optimisation offers promising results
Scientists developed READY, a novel framework leveraging Large Language Models (LLMs) for automated reward discovery in Meta-Black-Box Optimization (MetaBBO). The study addresses the limitations of manually designed reward functions, which introduce bias and hinder scalability in MetaBBO approaches. Researchers engineered a system where an LLM autonomously synthesises reward functions, moving beyond human expertise to improve optimisation performance. This work pioneers the application of LLMs not just for program search, but specifically for designing the reward signals that drive MetaBBO algorithms.
The team implemented a multitask program evolution paradigm, deploying multiple reward program populations across diverse MetaBBO tasks. This parallel process facilitates knowledge sharing and accelerates convergence, enabling the LLM to learn more efficiently. Five reflection-based code-level evolutionary operators were proposed to ensure diverse program search behaviour, promoting exploration of a wider range of potential reward functions. An explicit knowledge transfer scheme was then integrated to further enhance knowledge sharing between different MetaBBO tasks, allowing the LLM to generalise insights and improve reward design.
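To illustrate the multitask idea only (this is not READY's implementation), one can picture one population of reward programs per MetaBBO task, with elite programs periodically copied across tasks and re-scored there. The `Task` dataclass and `migrate_elites` helper below are invented names for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """One MetaBBO method/task with its own population of candidate reward programs."""
    name: str
    population: list = field(default_factory=list)   # (program_source, fitness) pairs

def migrate_elites(tasks: list, k: int = 2) -> None:
    """Hypothetical knowledge-transfer step: copy each task's top-k reward
    programs into every other task's population; their fitness must then be
    re-evaluated under the receiving task before the next generation."""
    elites = {
        t.name: sorted(t.population, key=lambda p: p[1], reverse=True)[:k]
        for t in tasks
    }
    for t in tasks:
        for source_name, programs in elites.items():
            if source_name != t.name:
                t.population.extend((src, None) for src, _ in programs)

# Usage sketch: evolve each task's population in parallel, then share knowledge.
# for generation in range(num_generations):
#     for task in tasks:
#         evolve_one_generation(task)   # e.g. the loop sketched earlier
#     migrate_elites(tasks)
```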
Experiments employed a benchmark suite of MetaBBO algorithms, evaluating the performance gains achieved with READY-discovered reward functions. The system delivers consistent improvements across diverse MetaBBO methods, surpassing the performance of existing baselines like Eureka, EoH, and ReEvo. Researchers measured performance gains by accumulating meta-objective rewards over T optimisation steps, assessing the effectiveness of the automated reward design. Furthermore, the study revealed that reward functions designed for one MetaBBO method could be directly adapted to boost the performance of unseen methods, demonstrating generalisability.
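The text only says that performance is measured by accumulating meta-objective rewards over T optimisation steps. One plausible reading of that scoring, with `candidate_reward`, `optimizer`, and `problem` as assumed stand-ins rather than READY's real interfaces, is sketched below.

```python
def score_reward_function(candidate_reward, optimizer, problem, T: int = 100) -> float:
    """Accumulate the meta-objective over T low-level optimisation steps.
    This is an illustrative reading of the paper's description, not its
    exact evaluation protocol."""
    state = optimizer.reset(problem)           # initialise the black-box optimiser on one problem
    total = 0.0
    for _ in range(T):
        state = optimizer.step(state)          # one step of the MetaBBO-controlled solver
        total += candidate_reward(state)       # per-step reward, e.g. derived from best-so-far fitness
    return total                               # higher accumulated reward indicates a better reward design
```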
The approach yields interpretable design insights that co-evolve with the reward program, providing a clear view of the LLM’s decision-making process. This transparency is crucial for validating the rationality of the discovered rewards and building trust in the automated design process. The READY project is available at https://anonymous.4open.science/r/ICML_READY-747F, providing access to the code and enabling further research in this emerging field.
READY autonomously evolves reward functions for MetaBBO, enabling generalisable performance gains
Scientists have developed READY, a novel framework leveraging Large Language Models (LLMs) to autonomously design reward functions for Meta-Black-Box Optimization (MetaBBO). This research addresses a critical limitation in the field, where current methods rely heavily on human-designed heuristics, hindering scalability and potentially obscuring true performance capabilities. The team introduced a multitask evolution paradigm, deploying multiple reward program populations across diverse MetaBBO tasks to accelerate the search process and facilitate knowledge sharing. Experiments demonstrate READY consistently improves the optimization performance of various MetaBBO approaches, surpassing existing baselines like Eureka, EoH, and ReEvo.
Results demonstrate READY’s effectiveness through several key achievements. The framework consistently boosted performance across diverse MetaBBO tasks, showcasing an advantage over both original reward designs and state-of-the-art alternatives. Specifically, the team observed a significant improvement in optimization efficiency and generalizability, indicating the LLM-generated rewards effectively guide the BBO optimizers. Furthermore, the study revealed that reward functions discovered by READY for one MetaBBO method could be successfully adapted to enhance the performance of unseen methods, highlighting their broad applicability.
The researchers implemented five reflection-based code-level evolutionary operators to ensure diverse behaviour in the LLM-driven program search. An explicit knowledge transfer scheme was also incorporated to further accelerate convergence by enhancing knowledge sharing between different MetaBBO tasks. The results show that this multitask paradigm can automatically design effective reward functions, advancing the performance frontier of MetaBBO methods. The work provides clear and interpretable design insights that co-evolve with the reward program, offering a deeper understanding of the optimisation process.
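Purely as an illustration of the kind of artefact being evolved, and not one of READY's actual outputs, a compact, interpretable per-step reward based on the normalised improvement of the best-so-far objective value (for minimisation) might look like this:

```python
def reward(prev_best: float, curr_best: float, initial_best: float, eps: float = 1e-12) -> float:
    """Illustrative reward: normalised improvement of the best-so-far cost at
    one optimisation step (minimisation). Invented example, not from the paper."""
    improvement = max(prev_best - curr_best, 0.0)    # only reward genuine progress
    scale = abs(initial_best) + eps                  # normalise by the starting cost
    return improvement / scale
```

A reward of this shape is easy to read off directly from the code, which is the sense in which evolved reward programs can carry interpretable design insights.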
This breakthrough delivers a fully automated reward discovery framework, a first in the field of LLM-based reward design for MetaBBO. Measurements confirm the superiority of READY’s design capabilities compared to representative baselines, with the framework providing valuable, interpretable insights into reward function design. The READY project is available at https://anonymous.4open.science/r/ICML_READY-747F, enabling further research and development in this promising area.
READY automates MetaBBO reward function discovery for improved performance
Scientists have developed READY, a novel framework leveraging large language models (LLMs) to automate reward discovery in Meta-Black-Box Optimization (MetaBBO). Existing MetaBBO systems typically rely on human-designed reward functions, which can introduce bias and limitations; READY addresses this by utilising an LLM to generate these rewards autonomously. The approach incorporates an evolution-based paradigm within the LLM’s program search process, ensuring continuous improvement, and a multi-task evolution architecture to enable parallel reward discovery for various MetaBBO methods, benefiting from knowledge sharing and accelerated convergence. Empirical results demonstrate that the reward functions discovered by READY enhance the performance of existing MetaBBO algorithms, highlighting the critical role of reward design in this field.
The framework exhibits consistent performance across different LLM backbones, including DeepSeek-V3.2, Qwen-3-Max, and Gemini-3-Flash, and suggests a positive correlation between the reasoning capability of the underlying LLM and the effectiveness of the generated rewards. The authors acknowledge that the framework’s performance is dependent on the quality of the foundation models used, and future work could explore methods to further improve the interpretability and robustness of the discovered rewards. This research establishes a significant step towards fully autonomous optimisation pipelines by capturing universal search heuristics. The ability to automate reward design has implications for the development of next-generation self-organised learning agents and could boost MetaBBO performance in real-world applications. While the current study focuses on the technical aspects of reward discovery, the authors suggest that this work could contribute to more adaptable and efficient optimisation systems in the future.
👉 More information
🗞 READY: Reward Discovery for Meta-Black-Box Optimization
🧠 ArXiv: https://arxiv.org/abs/2601.21847
