ThetaEvolve Simplifies Test-time Learning, Extending AlphaEvolve with a Single LLM to Continually Improve Solutions to Open Optimization Problems

The pursuit of new mathematical discoveries is receiving a boost from artificial intelligence: researchers have demonstrated a system that evolves programs to improve solutions to longstanding open problems. Yiping Wang, Shao-Rong Su, and Zhiyuan Zeng, alongside Eva Xu, Liliang Ren, and Xinyu Yang, present ThetaEvolve, an open-source framework that builds on earlier closed-source systems such as AlphaEvolve. ThetaEvolve simplifies test-time learning and efficiently scales both in-context learning and reinforcement learning, allowing the system to continually refine its strategies and reach new best-known results on challenging problems such as circle packing and the first auto-correlation inequality. The result is notable because it lets a relatively small, open-source language model surpass the previous best-known results, suggesting that this kind of mathematical discovery does not necessarily require massive computational resources or proprietary technology.

Evolving Programs via Reinforcement Learning and Archives

This research introduces ThetaEvolve, an approach to evolving and optimising programs that combines reinforcement learning with a carefully designed program database. The system aims to improve program performance on complex tasks by exploring a diverse range of candidate solutions and reusing successful components. Reinforcement learning trains the model that generates and refines programs, and the program database that guides it is managed with MAP-Elites. MAP-Elites partitions programs into cells according to chosen features and keeps the best program found in each cell, so the archive holds not only the highest-scoring programs but also programs that excel in different ways. This keeps the search from settling prematurely on suboptimal solutions.
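To make the archive idea concrete, here is a minimal MAP-Elites sketch in Python. It is not ThetaEvolve's implementation: the `Program` record, the two behaviour features (code length and runtime), and the bin bounds are illustrative assumptions. Each program is mapped to a grid cell from its features, and a newcomer replaces the incumbent only if it scores higher in that same cell, so weaker programs occupying otherwise empty regions survive.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Program:
    code: str
    score: float
    # Hypothetical behaviour features: (code length in characters, runtime in seconds)
    features: Tuple[float, ...]


class MapElitesArchive:
    """Minimal MAP-Elites grid: keeps the highest-scoring program in each feature cell."""

    def __init__(self, bins_per_dim: int, bounds: List[Tuple[float, float]]):
        self.bins = bins_per_dim
        self.bounds = bounds  # assumed (low, high) range for each feature
        self.cells: Dict[Tuple[int, ...], Program] = {}

    def _cell(self, features: Tuple[float, ...]) -> Tuple[int, ...]:
        # Discretise each feature into one of `bins` equal-width bins, clamping outliers.
        idx = []
        for x, (lo, hi) in zip(features, self.bounds):
            frac = (x - lo) / (hi - lo)
            idx.append(min(self.bins - 1, max(0, int(frac * self.bins))))
        return tuple(idx)

    def add(self, prog: Program) -> bool:
        # Insert only if the cell is empty or the newcomer beats the current elite.
        key = self._cell(prog.features)
        incumbent = self.cells.get(key)
        if incumbent is None or prog.score > incumbent.score:
            self.cells[key] = prog
            return True
        return False

    def sample_parent(self, rng: random.Random) -> Program:
        # Uniform over occupied cells, so niche programs are picked as often as top scorers.
        return rng.choice(list(self.cells.values()))

    def best(self) -> Program:
        return max(self.cells.values(), key=lambda p: p.score)


archive = MapElitesArchive(bins_per_dim=4, bounds=[(0, 1000), (0, 10)])
added = [
    archive.add(Program("a", 1.0, (100, 1.0))),
    archive.add(Program("b", 0.5, (900, 9.0))),  # lower score, different cell: kept
    archive.add(Program("c", 0.8, (120, 1.2))),  # same cell as "a" but worse: rejected
]
print(added, archive.best().code)
```

Sampling parents uniformly over occupied cells, rather than in proportion to score, is the design choice that lets an unusual but mediocre program seed later generations.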

To further encourage exploration, ThetaEvolve uses an island-based evolutionary strategy: several sub-populations of programs evolve independently, which keeps the population as a whole from becoming stuck in a single local optimum and preserves diversity. Programs are periodically exchanged between islands, so good ideas can spread without the islands collapsing into one population. Together, MAP-Elites and the island model form a robust program database that supports efficient exploration, reuse, and refinement of programs. Ablation studies confirmed that both MAP-Elites and the island model are important for effective program database management.
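The island model can be sketched in the same spirit. In this hypothetical version each island is a list of `(source, score)` pairs, `toy_step` stands in for one round of LLM-driven mutation, and migration follows a ring topology in which every island sends its best program to the next island, overwriting that island's weakest entry. The ring topology, the migration interval, and `k=1` are assumptions chosen for illustration, not values taken from the paper.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical representation: each individual is a (program source, score) pair.
Individual = Tuple[str, float]
Island = List[Individual]


def migrate_ring(islands: List[Island], k: int = 1) -> None:
    """Copy each island's top-k programs to the next island, overwriting its k worst."""
    # Select all emigrants before changing anything, so migration is simultaneous.
    elites = [sorted(isl, key=lambda p: p[1], reverse=True)[:k] for isl in islands]
    for i, isl in enumerate(islands):
        incoming = elites[(i - 1) % len(islands)]  # ring: island i-1 feeds island i
        isl.sort(key=lambda p: p[1])  # weakest first
        isl[: len(incoming)] = incoming


def toy_step(isl: Island, rng: random.Random, capacity: int = 4) -> None:
    """Stand-in for one LLM mutation: perturb a random parent, keep the best `capacity`."""
    src, score = rng.choice(isl)
    isl.append((src + "'", score + rng.uniform(-1.0, 1.0)))
    isl.sort(key=lambda p: p[1], reverse=True)
    del isl[capacity:]


def run_islands(
    islands: List[Island],
    step: Callable[[Island, random.Random], None],
    generations: int,
    migrate_every: int,
    rng: random.Random,
) -> None:
    for gen in range(1, generations + 1):
        for isl in islands:  # islands evolve independently...
            step(isl, rng)
        if gen % migrate_every == 0:  # ...and exchange programs periodically
            migrate_ring(islands)


demo = [[("a", 1.0), ("b", 5.0)], [("c", 2.0), ("d", 3.0)], [("e", 0.5), ("f", 0.1)]]
migrate_ring(demo)
print(demo)

islands = [[("p0", 0.0), ("p1", 0.0)] for _ in range(3)]
run_islands(islands, toy_step, generations=20, migrate_every=5, rng=random.Random(0))
print([max(s for _, s in isl) for isl in islands])
```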

Results show that ThetaEvolve significantly improves performance on challenging tasks, including circle packing, the auto-correlation inequality, and Hadamard matrix construction. The combination of MAP-Elites and islands yields a more diverse and robust population of programs, enabling the discovery of better solutions. Visualisations comparing ThetaEvolve's solutions with those from other methods reveal distinctive characteristics, such as differences in symmetry among circle packing solutions. Replacing the program database with a simplified version led to a noticeable drop in performance, further confirming the importance of these two components.
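An auto-correlation result is ultimately a number produced by an evaluator, and that evaluator is short. Assuming the formulation used in the AlphaEvolve line of work, C1 is the largest constant such that max over -1/2 ≤ t ≤ 1/2 of (f*f)(t) ≥ C1 · (∫ f)² for every nonnegative f supported on [-1/4, 1/4]. Any such f therefore certifies an upper bound on C1, and a smaller ratio is better. The sketch below computes that ratio for a step function with equal-width pieces; the function name and the step-function parameterisation are illustrative assumptions, not the paper's code.

```python
import numpy as np


def autocorrelation_ratio(heights) -> float:
    """Ratio max_t (f*f)(t) / (integral of f)^2 for a nonnegative step function f.

    f equals heights[i] on the i-th of n equal intervals tiling [-1/4, 1/4]
    (width w = 1/(2n)). Two unit-height pieces convolve to a tent of height w
    centred on a grid knot, so f*f is piecewise linear and its maximum is
    w * max(np.convolve(h, h)). With integral w * sum(h), the ratio simplifies
    to max(conv) / (w * sum(h)**2). Any admissible f certifies C1 <= ratio.
    """
    h = np.asarray(heights, dtype=float)
    if h.ndim != 1 or h.size == 0 or np.any(h < 0) or h.sum() <= 0:
        raise ValueError("heights must be a non-empty, nonnegative 1-D array with positive sum")
    w = 0.5 / h.size
    return float(np.convolve(h, h).max() / (w * h.sum() ** 2))


print(autocorrelation_ratio(np.ones(10)))      # constant f: ratio is 2 up to rounding
print(autocorrelation_ratio([1.0, 2.0]))       # uneven steps: ratio is 16/9 up to rounding
```

Because f*f is piecewise linear between grid knots, evaluating it only at the knots is exact, which keeps the check fast enough to run on every candidate an evolved program emits.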

Evolving Programs Discover Improved Mathematical Bounds

ThetaEvolve represents a significant advance in applying large language models to mathematical discovery: an open-source framework that achieves state-of-the-art results on challenging open problems. The system evolves programs that improve bounds on unsolved mathematical questions, and the research team showed that it surpasses existing methods while using a comparatively small open-source language model. Experiments on the circle packing and first auto-correlation inequality problems show that ThetaEvolve consistently outperforms inference-only baselines, indicating that the model learns to evolve better programs at test time. Notably, the circle-packing program discovered by ThetaEvolve reliably finds the best-known solution in just 3 seconds, substantially faster than a comparable program discovered by another system.
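A validity checker shows what "finding the best-known solution" means operationally for circle packing. This sketch assumes the common formulation in which n circles must lie inside the unit square without overlapping and the score is the sum of their radii; the function name and the tolerance are hypothetical, and the paper's own evaluator may differ.

```python
import numpy as np


def packing_score(centers, radii, tol: float = 1e-12) -> float:
    """Return the sum of radii if all circles fit in the unit square without overlap."""
    c = np.asarray(centers, dtype=float)
    r = np.asarray(radii, dtype=float)
    if c.shape != (r.size, 2) or np.any(r < 0):
        raise ValueError("expected centres of shape (n, 2) and nonnegative radii")
    # Containment: every circle must lie inside [0, 1] x [0, 1].
    if np.any(c - r[:, None] < -tol) or np.any(c + r[:, None] > 1 + tol):
        raise ValueError("a circle leaves the unit square")
    # Disjointness: for every pair, centre distance must be at least the sum of radii.
    diff = c[:, None, :] - c[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(r.size, k=1)
    if np.any(dist[i, j] < r[i] + r[j] - tol):
        raise ValueError("two circles overlap")
    return float(r.sum())


grid = np.array([[0.25, 0.25], [0.75, 0.25], [0.25, 0.75], [0.75, 0.75]])
print(packing_score(grid, np.full(4, 0.25)))  # four touching circles: sum of radii 1.0
```

The pairwise check costs O(n²) comparisons, which is negligible for a few dozen circles, so nearly all of a program's running time goes into the search itself rather than into verification.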

The team achieved these results partly by streamlining the pipeline to use a single language model, which improves efficiency. ThetaEvolve also maintains a large program database and uses batch sampling to increase throughput; together these scale test-time compute and improve final performance on both the target tasks used for training and unseen problems. Integrating reinforcement learning into the framework lets the system learn from its own experience, yielding faster progress and better final performance than purely inference-based approaches. The work also shows that effective exploration strategies and “search-on-the-edge” behaviours can be learned by the model itself, opening new avenues for applying language models to complex scientific challenges. These advances position ThetaEvolve as a powerful tool for mathematical discovery and a significant step forward for artificial intelligence.
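The batched, single-model loop can be pictured as follows. This is a hypothetical sketch rather than ThetaEvolve's code: `propose` stands in for one batched call to the language model, a toy numeric problem replaces real programs, and the reward (score improvement over the sampled parent) is an assumed choice made only to show where a reinforcement learning signal could come from. In a single-model setup, the same model that proposes the children would be the policy updated on these rewards.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical representation: (program source, score).
Program = Tuple[str, float]


def batched_generation(
    database: List[Program],
    propose: Callable[[List[Program]], List[str]],  # stand-in for one batched LLM call
    evaluate: Callable[[str], float],
    batch_size: int,
    rng: random.Random,
) -> List[Tuple[str, float, float]]:
    """Sample parents, propose children in one batch, score them, store them,
    and return (child, score, reward) triples for a later RL update."""
    parents = [rng.choice(database) for _ in range(batch_size)]  # sampled before any insert
    children = propose(parents)
    results = []
    for (_, parent_score), child in zip(parents, children):
        score = evaluate(child)
        reward = score - parent_score  # assumed reward: improvement over the parent
        database.append((child, score))
        results.append((child, score, reward))
    return results


rng = random.Random(0)


def propose(parents: List[Program]) -> List[str]:
    # Toy "model": each program is just a number, perturbed slightly.
    return [str(float(src) + rng.uniform(-1.0, 1.0)) for src, _ in parents]


def evaluate(src: str) -> float:
    # Toy objective with its maximum at 3.0.
    return -(float(src) - 3.0) ** 2


db: List[Program] = [("0.0", evaluate("0.0"))]
out = batched_generation(db, propose, evaluate, batch_size=8, rng=rng)
print(len(db), max(r for _, _, r in out))
```

Batching matters because one call can return many candidates at once, so throughput scales with batch size rather than with the number of sequential model calls.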

ThetaEvolve Discovers Improved Mathematical Bounds

ThetaEvolve represents a significant advance in automated mathematical discovery, showing that a single, relatively small open-source language model can achieve new state-of-the-art results on challenging open problems. The researchers designed the framework to scale both in-context learning and reinforcement learning efficiently, so that the system continually improves on optimisation tasks. Notably, ThetaEvolve discovered improved bounds for the circle packing and first auto-correlation inequality problems, on which the previous best results had come from much larger, closed-source systems. Its key features include a program database for broader exploration, batch sampling for higher throughput, and techniques that encourage diverse and effective program evolution. In experiments, ThetaEvolve with reinforcement learning at test time consistently outperformed systems relying solely on inference, indicating a genuine capacity to learn evolution strategies. The learned capabilities also generalise to unseen tasks, suggesting the framework is more broadly applicable.

👉 More information
🗞 ThetaEvolve: Test-time Learning on Open Problems
🧠 ArXiv: https://arxiv.org/abs/2511.23473

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
