AI Achieves State-Of-The-Art Scientific Discovery with Test-Time Training to Discover

Scientists are tackling the challenge of using artificial intelligence not just to solve problems, but to actively discover genuinely novel solutions that surpass existing benchmarks. Mert Yuksekgonul, Daniel Koceja, and Xinhao Li, alongside Federico Bianchi from Together AI, Jed McCaleb from the Astera Institute, and Xiaolong Wang from UC San Diego, present a new approach called Test-Time Training to Discover (TTT-Discover), which allows a large language model to learn during problem-solving rather than relying solely on pre-existing knowledge. The research is significant because it demonstrates state-of-the-art performance across diverse domains: resolving long-standing mathematical problems such as Erdős’ minimum overlap problem, designing faster GPU kernels, excelling in competitive algorithm design, and improving single-cell biology analysis. Crucially, it achieves these results with an open-source model at remarkably low computational cost.

LLMs learn and improve during problem solving through reinforcement learning

Scientists have achieved a breakthrough in artificial intelligence, demonstrating a novel method for discovering state-of-the-art solutions to complex scientific problems at test time. This innovative approach leverages reinforcement learning, enabling the LLM to refine its strategies based on experience gained during the problem-solving process, rather than simply generating multiple attempts using pre-existing knowledge. The study unveils a unique form of continual learning tailored for discovery problems, where the goal is not generalization to diverse tasks but rather the successful resolution of a single, challenging problem. TTT-Discover distinguishes itself by actively training the LLM on problem-specific data generated during the test phase, creating a focused learning distribution that addresses the unique demands of the task at hand.
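To make the idea concrete, the loop can be sketched in a few lines of Python. The snippet below is a toy illustration rather than the authors' implementation: a Gaussian distribution over a single parameter stands in for the LLM's policy, and a made-up scoring function stands in for the problem-specific reward, whereas the real system fine-tunes an LLM with reinforcement learning.

```python
import random

def reward(x: float) -> float:
    # Hypothetical stand-in for the problem's scorer (kernel runtime,
    # competition judge, denoising metric, ...). Peaks at x = 3.
    return -(x - 3.0) ** 2

# A Gaussian over one parameter stands in for the LLM's policy.
mean, std = 0.0, 1.0
best_x, best_r = None, float("-inf")

for step in range(50):                            # test-time training steps
    attempts = [random.gauss(mean, std) for _ in range(32)]
    attempts.sort(key=reward, reverse=True)
    elite = attempts[:8]                          # keep high-reward attempts
    mean = sum(elite) / len(elite)                # "train" on this problem's data
    if reward(attempts[0]) > best_r:
        best_x, best_r = attempts[0], reward(attempts[0])

print(f"best solution {best_x:.3f}, reward {best_r:.5f}")
```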

Furthermore, TTT-Discover outperformed existing solutions in past AtCoder algorithm competitions and achieved improved denoising performance in single-cell analysis, with solutions validated by experts and competition organizers. Remarkably, all results were obtained using the open-source model, OpenAI gpt-oss-120b, and are reproducible with publicly available code, contrasting with prior achievements that depended on closed, proprietary models. The team’s work opens exciting possibilities for automating the process of scientific innovation, allowing AI to not only assist but actively drive breakthroughs in various fields. Figure 1 illustrates how TTT-Discover progressively refines its solutions during the GPUMode TriMul competition, demonstrating a clear improvement in reward distribution as the LLM learns from its attempts, ultimately exceeding the performance of previously established human benchmarks and best-of-N sampling strategies. This achievement highlights the potential of combining the power of LLMs with reinforcement learning to tackle previously intractable scientific challenges and accelerate the pace of discovery.

Reinforcement Learning Refines LLM Problem Solving through iterative feedback

This approach diverges from prior test-time scaling techniques like AlphaEvolve, which rely on prompting a frozen LLM, by enabling the LLM to continually learn from experience specific to the current problem. To validate findings, solutions were subjected to expert review or assessed by competition organizers, ensuring rigorous evaluation of performance gains. The core innovation lies in the implementation of reinforcement learning within a single-problem environment, defining the test problem as the sole source of reward. Scientists harnessed this setup to create a focused learning loop, where the LLM iteratively improves its solutions based on feedback from the problem itself.
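In schematic form, such a single-problem environment might look like the following sketch. The class and method names are illustrative assumptions, not the paper's actual interface; the essential property is that the one test problem supplies both the observation and the reward.

```python
class SingleProblemEnv:
    """One episode = one attempted solution to the single test problem."""

    def __init__(self, problem_statement: str):
        self.problem_statement = problem_statement

    def reset(self) -> str:
        # The observation never changes: it is always the same problem.
        return self.problem_statement

    def step(self, solution: str) -> tuple[float, bool]:
        # The reward is defined solely by evaluating this solution on
        # the test problem; every episode ends after one attempt.
        return self.evaluate(solution), True

    def evaluate(self, solution: str) -> float:
        # Toy scorer so the sketch runs; a real one would compile and
        # time a kernel, run a judge, or compute a denoising metric.
        return -float(len(solution))

env = SingleProblemEnv("Minimize the runtime of a TriMul kernel on H100.")
problem = env.reset()
score, done = env.step("candidate solution text")
print(problem, score, done)
```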

For the GPUMode TriMul competition, the team tracked reward distributions at test-time training steps 0, 9, 24, and 49, generating 512 solutions at each step to visualize the LLM’s progress. This detailed analysis, presented in Figure 1, reveals that the LLM consistently generates improved solutions, ultimately surpassing prior state-of-the-art results achieved by human experts. The team’s approach not only sets new state-of-the-art results but also offers a reproducible methodology using an open-source model, contrasting with previous achievements reliant on closed frontier models.
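A measurement of this kind can be mimicked schematically as below, where `sample_reward` is a hypothetical stand-in for "generate one solution with the checkpoint at a given step and score it"; here it simply shifts a noisy toy distribution upward so the snippet runs end to end.

```python
import random
import statistics

def sample_reward(step: int) -> float:
    # Toy proxy: later checkpoints tend to score higher.
    return random.gauss(step / 49, 0.3)

for step in (0, 9, 24, 49):            # the checkpoints tracked in Figure 1
    rewards = [sample_reward(step) for _ in range(512)]   # 512 solutions/step
    print(f"step {step:2d}: mean={statistics.mean(rewards):+.3f} "
          f"p95={sorted(rewards)[int(0.95 * 512)]:+.3f}")
```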

TTT-Discover surpasses benchmarks via test-time learning, achieving state-of-the-art results

The research team successfully implemented reinforcement learning at test time, allowing the language model, OpenAI gpt-oss-120b, to continually train and refine its solutions specifically for each problem encountered. This differs from previous approaches that relied on frozen language models, enabling the AI to internalize knowledge and generate superior outcomes. Furthermore, tests on the GPUMode kernel competition demonstrated significant speed improvements, with the TTT-Discover solution achieving a runtime of 1161μs on the H100 GPU, outperforming the best human result of 1371μs and a previous best of 5352μs. Data shows a substantial performance gain, with the TTT-Discover solution being up to 2× faster than prior art.
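As a quick sanity check, the speedup ratios implied by the quoted TriMul runtimes can be computed directly:

```python
# Runtimes quoted above for the GPUMode TriMul task (lower is faster).
ttt_us = 1161      # TTT-Discover solution on the H100
human_us = 1371    # best human result
prior_us = 5352    # previous best

print(f"vs best human:    {human_us / ttt_us:.2f}x faster")   # ~1.18x
print(f"vs previous best: {prior_us / ttt_us:.2f}x faster")   # ~4.61x
```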

The team measured success in algorithm design by excelling in past AtCoder competitions, achieving a score of 567,062, exceeding the previous AI best of 558,026. Results also demonstrate a significant advancement in single-cell analysis, where TTT-Discover achieved a denoising score of 0.71, surpassing the best human score of 0.64. Analysis of the GPUMode TriMul competition revealed that as training progressed, the language model generated increasingly effective solutions, as evidenced by the reward distribution at test-time training steps 0, 9, 24, and 49. The final reward distribution at step 49 demonstrably surpassed the performance of the best-of-N sampling approach with the same computational budget.
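That budget-matched comparison can be made concrete with toy numbers: spend the same total number of attempts either on a frozen model, keeping the single best of N draws, or inside the training loop, drawing a final batch from the improved distribution. The two reward distributions below are invented purely for illustration.

```python
import random

random.seed(0)
steps, batch = 50, 512
budget = steps * batch                      # matched total attempts

# Frozen model: every draw comes from the step-0 reward distribution.
best_of_n = max(random.gauss(0.0, 0.3) for _ in range(budget))

# After test-time training: draws come from the step-49 distribution.
final_batch = [random.gauss(1.0, 0.3) for _ in range(batch)]

print(f"best-of-N, frozen model: {best_of_n:.3f}")
print(f"best after TTT:          {max(final_batch):.3f}")
```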

TTT-Discover refines LLMs for specific scientific problems, enhancing discovery

This approach allows the LLM to continually learn from problem-specific experience, aiming for a single, high-quality solution rather than generalization across multiple tasks. The authors acknowledge a limitation of the current implementation: its focus on problems with continuous rewards, which may restrict direct applicability to problems with discrete reward structures. Future research could extend TTT-Discover to handle discrete rewards and investigate its performance on a wider range of scientific challenges. Despite this limitation, the findings represent a significant advance in applying AI to scientific discovery, demonstrating a practical and effective method for reaching state-of-the-art performance on complex problems and offering a pathway toward automated scientific problem-solving.
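The continuous-reward assumption is easy to see in a two-function contrast. Both reward functions below are hypothetical examples, not taken from the paper: a measured quantity such as runtime yields a graded signal the learning loop can climb, whereas a pass/fail check collapses most attempts into identical scores.

```python
def continuous_reward(runtime_us: float, baseline_us: float = 5352.0) -> float:
    # Graded: every microsecond of improvement moves the reward.
    return baseline_us / runtime_us

def discrete_reward(passed: bool) -> float:
    # Sparse: all failing attempts look the same to the learner.
    return 1.0 if passed else 0.0

print(continuous_reward(1161.0))   # ~4.61: how much better, not just whether
print(discrete_reward(False))      # 0.0: no hint of how close the attempt was
```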

👉 More information
🗞 Learning to Discover at Test Time
🧠 ArXiv: https://arxiv.org/abs/2601.16175

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.

Latest Posts by Rohail T.:

Moro Achieves Robust Human Motion Recovery under Occlusions from Monocular Videos (January 27, 2026)

Anything Achieves State-Of-The-Art Perspective-To-360° Image and Video Generation (January 27, 2026)

Tensor Networks Advance Understanding of Flux Strings in D Quantum Electrodynamics (January 27, 2026)