Ax-prover: Deep Reasoning Agentic Framework Solves Theorems in Mathematics and Quantum Physics with Formal Proofs

Automated theorem proving represents a significant challenge at the intersection of artificial intelligence and formal logic, and researchers are now pushing the boundaries of what machines can achieve in this field. Marco Del Tredici, Jacob McCarran, and Benjamin Breen, all from Axiomatic AI, alongside Javier Aspuru Mijares, Weichen Winston Yin, and Jacob M. Taylor, present Ax-Prover, a novel multi-agent system designed to tackle complex problems in mathematics and quantum physics. This framework combines the reasoning capabilities of large language models with the formal rigor of Lean, a powerful theorem proving tool, allowing it to both autonomously solve problems and collaborate with human experts. The team demonstrates that Ax-Prover achieves competitive results on established benchmarks and surpasses existing systems on newly introduced challenges in abstract algebra and quantum theory, suggesting a generalizable approach to formal verification across diverse scientific domains and offering a powerful new tool for mathematicians and physicists alike.

LLMs and Formal Theorem Proving Progress

This document surveys recent research, primarily from 2019 to 2025, in automated theorem proving and formal verification, with a strong emphasis on the intersection of large language models and formal systems like Lean. The field explores how these technologies can rigorously prove mathematical theorems and verify software correctness, building on established tools and extending their capabilities. Current work uses large language models, such as GPT-4, Gemini, Minerva, NuminaMath, and Goedel-Prover, to assist in theorem proving: predicting proof steps, solving problems, translating statements into Lean code, and breaking complex problems into subgoals. Researchers also employ reinforcement learning and create datasets, like NuminaMath, PutnamBench, and MiniF2F, to evaluate system performance. Taken together, this research shows a rapidly evolving field in which large language models are becoming important tools for mathematical reasoning and formal verification, pairing natural language understanding and pattern recognition with the rigor of formal systems.

LLM-Lean Integration for Automated Theorem Proving

Scientists engineered Ax-Prover, a multi-agent system for automated theorem proving within the Lean formal system, to overcome limitations in both specialized and general-purpose AI approaches. The system equips large language models with Lean tools via the Model Context Protocol, pairing their reasoning capabilities with formal correctness and supporting both autonomous operation and collaboration with human experts. The large language model analyzes unproven theorems, proposes proof strategies, and generates Lean code, while the Lean tools handle inspection, search for relevant results, error detection, and proof verification. To evaluate Ax-Prover’s performance, the team benchmarked the system against state-of-the-art provers and frontier large language models on established datasets, NuminaMath-LEAN and PutnamBench, and introduced two new datasets, AbstractAlgebra and QuantumTheorems, to assess capabilities in advanced domains like algebraic structures and quantum theory. The study highlights that Ax-Prover avoids domain overspecialization and can operate effectively with any recent version of the Mathlib library without retraining, preserving tool-use and conversational abilities and enabling interactive collaboration with human mathematicians, as demonstrated by its successful application in formalizing a complex cryptography theorem.
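The division of labor described above, in which a model proposes Lean code and a checker accepts it or returns errors, can be sketched as a simple propose-and-verify loop. This is a minimal illustration under stated assumptions, not Ax-Prover's actual implementation: `prove`, `CheckResult`, `toy_propose`, and `toy_check` are hypothetical names, and the two toy functions are stand-ins for a language model call and a Lean verification tool respectively. No real Lean or Model Context Protocol calls are made.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    """Outcome of one verification attempt (hypothetical type)."""
    ok: bool
    message: str = ""


def prove(
    statement: str,
    propose: Callable[[str, str], str],
    check: Callable[[str, str], CheckResult],
    max_rounds: int = 5,
) -> Optional[str]:
    """Ask `propose` for a proof, verify it with `check`, and feed any
    error message back into the next proposal. Returns the first accepted
    proof, or None if every round fails."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = propose(statement, feedback)
        result = check(statement, candidate)
        if result.ok:
            return candidate
        feedback = result.message  # the error guides the next attempt
    return None


def toy_propose(statement: str, feedback: str) -> str:
    """Stand-in for a language model: gives a placeholder first,
    then a different tactic once it has seen an error."""
    return "by rfl" if feedback else "by sorry"


def toy_check(statement: str, proof: str) -> CheckResult:
    """Stand-in for a Lean checker: rejects any proof containing `sorry`.
    A real checker would compile the proof against the statement."""
    if "sorry" in proof:
        return CheckResult(False, "proof contains sorry")
    return CheckResult(True)


if __name__ == "__main__":
    print(prove("theorem two_add_two : 2 + 2 = 4", toy_propose, toy_check))
```

With the toy stand-ins, the first round is rejected and the second is accepted, so the script prints `by rfl`. The point of the design is that acceptance is decided only by the checker, never by the proposer.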

Ax-Prover Outperforms Lean Theorem Provers

Scientists developed Ax-Prover, a new system for automated theorem proving in Lean that combines the reasoning abilities of large language models with the formal verification capabilities of Lean tools. It addresses a gap between specialized provers and general-purpose language models, which lack formal reasoning infrastructure. Ax-Prover uses the Model Context Protocol to equip language models with Lean tools, enabling them to analyze theorems, propose proof strategies, and generate Lean code for verification. Experiments demonstrate that Ax-Prover achieves competitive performance on the PutnamBench dataset and outperforms both general-purpose language models and state-of-the-art specialized provers on the newly introduced benchmarks, AbstractAlgebra and QuantumTheorems, which focus on algebraic structures and quantum mechanics. These results highlight Ax-Prover’s potential as a key AI verification tool for mathematically grounded scientific reasoning, and the team demonstrated its assistant capabilities by collaborating with a mathematician to formally verify a complex cryptography theorem.
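For readers unfamiliar with Lean, the snippet below shows the general shape of a formally verified statement about an algebraic structure. It is an illustrative example only, assuming Lean 4 with Mathlib, and is not taken from the AbstractAlgebra benchmark; the theorem name `left_cancel_example` is made up for this sketch, while `mul_left_cancel` is the Mathlib lemma doing the work.

```lean
import Mathlib.Algebra.Group.Basic

-- In any group, a common left factor can be cancelled.
theorem left_cancel_example {G : Type*} [Group G] (a b c : G)
    (h : a * b = a * c) : b = c :=
  mul_left_cancel h
```

Lean accepts the file only if the proof term really establishes the stated claim, which is the guarantee a language model alone cannot provide.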

Novel Theorem Proving Across Scientific Domains

Ax-Prover represents a significant advance in automated theorem proving, demonstrating a novel multi-agent system capable of solving problems across diverse scientific domains, including abstract algebra and quantum theory. The system combines the broad reasoning capabilities of large language models with the formal rigor of Lean’s proof environment, achieving competitive results on established benchmarks like PutnamBench and NuminaMath-LEAN, and outperforming existing methods on newly introduced datasets focused on research-level mathematics and physics. This achievement addresses key limitations of current automated provers, namely their restricted applicability beyond mathematics, difficulty collaborating with human experts, and high maintenance costs. Evaluations demonstrate Ax-Prover’s potential as a deep formal reasoning assistant, capable of both autonomous problem solving and collaborative work with researchers, as evidenced by a case study in cryptography where it facilitated the formalization of a complex theorem. Future work will focus on enhancing the system through parallelized agents and the integration of a long-term memory module, which the team expects to expand its capabilities in extended, collaborative problem solving.

👉 More information
🗞 Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics
🧠 ArXiv: https://arxiv.org/abs/2510.12787

Rohail T.

As a quantum scientist exploring the frontiers of physics and technology, my work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
