Endowing artificial intelligence with robust reasoning skills remains a central challenge on the path to truly generalisable machine intelligence. Recent progress with large language models (LLMs) and techniques such as Long Chain-of-Thought reasoning shows promising cross-domain performance, yet the underlying principles that enable this transfer of knowledge are not fully understood. Researchers now propose that the generalisation stems from the identification and application of shared abstract reasoning prototypes: fundamental patterns that underpin problem-solving across diverse fields. This work, detailed in the article ‘ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs’, is a collaborative effort by Feng He, Zijun Chen, Xinnian Liang, Tingting Ma, Yunqi Qiu, Shuangzhi Wu, and Junchi Yan of ByteDance Seed and Shanghai Jiao Tong University. Their framework, ProtoReasoning, leverages formal systems such as Prolog (a logic programming language) and PDDL (the Planning Domain Definition Language) to construct, verify, and scale these reasoning prototypes, demonstrably improving performance across benchmarks spanning logical reasoning, planning, general reasoning, and mathematical problem solving.
Recent advances in large reasoning models (LRMs), trained with long chain-of-thought (Long CoT) reasoning, demonstrate notable cross-domain generalisation capabilities, prompting investigation into the mechanisms driving this transfer. Research suggests that shared abstract reasoning prototypes – fundamental patterns capturing the essence of a problem across different domains – facilitate this generalisation by minimising representational differences and revealing shared reasoning structures beneath seemingly diverse tasks. The introduction of ProtoReasoning, a framework designed to enhance the reasoning ability of large language models (LLMs) through scalable and verifiable prototypical representations, offers a promising pathway towards developing more robust and adaptable artificial intelligence systems.
ProtoReasoning improves the reasoning capabilities of LLMs through scalable and verifiable prototypical representations, using Prolog for logical reasoning and the Planning Domain Definition Language (PDDL) for planning tasks. The framework operates on the premise that cross-domain generalisation in reasoning arises from these shared abstract reasoning prototypes, which minimise the impact of superficial differences between problems. An automated pipeline translates problems into their corresponding prototype representations, enabling efficient processing and analysis, while a verification system built on Prolog and PDDL interpreters provides reliable feedback on the correctness and validity of the reasoning process. Crucially, the framework is scalable: new problems can be synthesised within the prototype space with correctness guaranteed by the interpreters, a significant improvement over existing methods. Experimental results confirm the efficacy of ProtoReasoning, showing a 4.7% improvement in logical reasoning on the Enigmata-Eval benchmark, a 6.3% improvement on planning tasks, a 4.0% improvement in general reasoning as measured by MMLU, and a 1.0% improvement in mathematical problem solving on the AIME24 dataset.
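To make the notion of a prototype representation concrete, the sketch below shows how a simple natural-language deduction puzzle might be encoded in Prolog so that an interpreter such as SWI-Prolog can check the answer automatically. It is an illustrative example only, not material from the paper; the puzzle, predicate names, and facts are all hypothetical.

```prolog
% Hypothetical prototype for the puzzle:
% "Alice is older than Bob, and Bob is older than Carol. Who is the oldest?"

person(alice).
person(bob).
person(carol).

% Facts extracted from the problem statement.
older_direct(alice, bob).
older_direct(bob, carol).

% "Older than" is the transitive closure of the stated relations.
older(X, Y) :- older_direct(X, Y).
older(X, Z) :- older_direct(X, Y), older(Y, Z).

% Someone is the oldest if no other person is older than them.
oldest(X) :- person(X), \+ older(_, X).

% Query checked by the interpreter:
% ?- oldest(Who).
% Who = alice.
```

Because the interpreter either derives the expected answer or fails, problems synthesised in this form carry a built-in correctness check, which is what makes the prototype space straightforward to scale.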
ProtoReasoning employs Prolog, a logic programming language, for logical reasoning and PDDL for planning tasks. Prolog programs consist of declarative facts and rules, from which the system deduces new information through logical inference. PDDL, in turn, defines actions in terms of preconditions (the conditions that must hold before an action can be applied) and effects (the changes that hold once the action has executed), mirroring human planning processes. This declarative approach gives problems a formal representation that can be mechanically verified and analysed.
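As an illustration of the precondition-and-effect structure described above, the following minimal PDDL fragment defines a single blocks-world-style action. Again, this is a hypothetical sketch rather than an excerpt from the paper; the domain, predicate, and action names are invented for the example.

```pddl
;; Hypothetical domain with one action: picking a block up from the table.
(define (domain blocks-mini)
  (:requirements :strips)
  (:predicates (on-table ?x) (clear ?x) (holding ?x) (hand-empty))

  (:action pick-up
    :parameters (?x)
    ;; Preconditions: facts that must hold before the action can be applied.
    :precondition (and (clear ?x) (on-table ?x) (hand-empty))
    ;; Effects: facts that become true, and facts that cease to hold, afterwards.
    :effect (and (holding ?x)
                 (not (clear ?x))
                 (not (on-table ?x))
                 (not (hand-empty)))))
```

A plan in PDDL is a sequence of such actions, and a validator can replay it step by step to confirm that every precondition is satisfied, providing the kind of mechanical feedback that verification in the prototype space relies on.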
Ablation studies corroborate the central hypothesis, confirming that learning within the prototype space yields better generalisation to structurally similar problems than training solely on natural-language representations. This finding supports the proposition that reasoning prototypes serve as a foundational element for generalisable reasoning, providing a robust and transferable basis for problem-solving. The framework therefore offers a promising avenue for developing more robust and adaptable AI systems capable of tackling complex challenges across multiple domains, and it positions formal reasoning systems like Prolog and PDDL not just as tools for artificial intelligence, but as crucial components in understanding and replicating the underlying mechanisms of human cognition.
👉 More information
🗞 ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
🧠 DOI: https://doi.org/10.48550/arXiv.2506.15211
