The pursuit of improved reasoning in large language models remains a central challenge, particularly when computational resources are limited. Tianyu Fu, Yichen You, Zekai Chen, Guohao Dai, Huazhong Yang, and Yu Wang address this issue with a novel approach that focuses computational effort where it matters most. Their work identifies a counterintuitive problem: repeatedly processing already-correct predictions can actually introduce errors. To avoid this, they introduce Think-at-Hard (TaH), a dynamic method that selectively revisits only the most challenging tokens in a text. By allocating computation where it is needed, TaH significantly boosts reasoning performance across multiple benchmarks, achieving substantial accuracy gains while maintaining the same overall model size, and represents a key step towards more efficient and reliable language models.
Thought Amplification Boosts Reasoning in LLMs
This research introduces Think-at-Hard (TaH), a novel method for enhancing the reasoning capabilities of large language models without increasing their parameter count. The team addressed a key challenge in improving model performance, particularly on complex tasks requiring multi-step reasoning, by selectively refining only the most challenging parts of a text. TaH identifies tokens likely to be incorrect after an initial processing pass and focuses computational effort on these specific areas, rather than applying the same scrutiny to every part of the input. This dynamic approach significantly improves performance while maintaining computational efficiency.
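To make the idea concrete, here is a minimal sketch of the selective-iteration loop as described above: one latent pass over every token, then a second pass kept only where a lightweight decider flags a token as likely wrong. The names `TinyDecider` and `tau`, the stand-in transformer block, and the one-extra-pass depth are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class TinyDecider(nn.Module):
    """Per-token 'hardness' score from the hidden state (assumed: a linear probe)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, 1)

    def forward(self, h):                        # h: [batch, seq, dim]
        return torch.sigmoid(self.proj(h)).squeeze(-1)  # [batch, seq] in (0, 1)

def think_at_hard(block, decider, h, tau=0.5):
    """First pass over all tokens; refined states kept only where tokens are hard."""
    h1 = block(h)                                # standard pass, every token
    hard = decider(h1) > tau                     # ~6% of tokens per the paper
    h2 = block(h1)                               # refinement pass (LoRA-shifted in TaH)
    # Easy tokens keep their first-pass states; hard tokens take the refined ones.
    # (A real implementation would gather only the hard positions to save compute.)
    return torch.where(hard.unsqueeze(-1), h2, h1)

# Usage with a toy transformer block:
dim = 64
block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
decider = TinyDecider(dim)
out = think_at_hard(block, decider, torch.randn(2, 10, dim))
print(out.shape)                                 # torch.Size([2, 10, 64])
```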
The method employs a lightweight neural decider to determine which tokens require further attention, allowing approximately 94% of tokens to be processed only once and avoiding unnecessary computation. During focused refinement, Low-Rank Adaptation (LoRA) modules shift the model's objective from general next-token prediction to targeted refinement of the difficult tokens. A duo-causal attention mechanism extends standard causal attention across iterations, enabling cross-iteration information flow without sacrificing computational efficiency. Experiments on five challenging benchmarks (GSM8K, MATH500, AMC23, AIME25, and OlympiadBench) demonstrate that TaH consistently outperforms existing approaches.
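The second-pass machinery can be sketched as follows, under our reading of the paper: a LoRA delta that alters a layer's behavior only during refinement, plus a "duo-causal" mask that is causal along both the sequence axis and the iteration axis, so a refined token can also see earlier tokens' first-pass states. All names, shapes, and the exact masking rule here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Base linear layer plus a low-rank delta, enabled only when refining."""
    def __init__(self, dim, rank=8):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.A = nn.Linear(dim, rank, bias=False)   # low-rank down-projection
        self.B = nn.Linear(rank, dim, bias=False)   # low-rank up-projection
        nn.init.zeros_(self.B.weight)               # delta starts at zero

    def forward(self, x, refine=False):
        y = self.base(x)
        return y + self.B(self.A(x)) if refine else y

def duo_causal_mask(seq_len, n_iters=2):
    """True where a query may attend: key position <= query position
    AND key iteration <= query iteration (causal along both axes)."""
    pos = torch.arange(seq_len).repeat(n_iters)            # sequence position
    it = torch.arange(n_iters).repeat_interleave(seq_len)  # iteration index
    return (pos[None, :] <= pos[:, None]) & (it[None, :] <= it[:, None])

print(duo_causal_mask(3).int())
# Rows 0-2 (iteration 1) follow plain causal masking; rows 3-5 (iteration 2)
# additionally attend to iteration-1 states at earlier-or-equal positions.
```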
Compared to strong baseline models, TaH achieves accuracy gains of 4.0-5.4%, and improvements of 8.1-12.6% against other reasoning methods.
Notably, an enhanced variant, TaH+, which adds less than 3% additional parameters, further increases these gains to 5.3% and 5.4% on certain models. Fixed-depth recurrent transformers and query-routing methods showed limited effectiveness in comparison, highlighting the benefits of TaH's dynamic approach. Training dynamics reveal that TaH converges faster than standard language models, achieving lower perplexity on validation data. The neural decider successfully mimics an oracle strategy, accurately predicting which tokens require a further iteration. These results confirm the method's effectiveness and its ability to substantially enhance language model reasoning capabilities.
👉 More information
🗞 Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
🧠 ArXiv: https://arxiv.org/abs/2511.08577
