The complexities of financial reasoning, such as calculating taxes, present a significant challenge for many: Americans alone spend an average of thirteen hours on this task each year. William Jurayj of Johns Hopkins University, Nils Holzenberger of Télécom Paris, Institut Polytechnique de Paris, and Benjamin Van Durme, along with their colleagues, address this problem by developing a new approach that combines the power of large language models with the precision of symbolic solvers. This system calculates financial obligations with greater accuracy and auditability than current methods, offering a potential way to reduce costly errors and improve equitable access to reliable financial assistance. The team's research not only demonstrates improved performance on a challenging benchmark dataset, but also introduces a method for estimating the economic benefits of such a system, showing that it could significantly reduce costs compared to current real-world averages.
This research proposes an approach integrating LLMs with a symbolic solver to accurately calculate tax obligations. The system’s performance is evaluated using the challenging Statutory Reasoning Assessment (SARA) dataset, and a novel method estimates the cost of deployment based on real-world penalties for tax errors. Furthermore, the research demonstrates how combining up-front translation of plain-text rules into formal logic programs with intelligently retrieved exemplars for formal case representations can dramatically improve performance.
LLMs Evaluate Tax Law Reasoning Faithfully
This research explores how large language models can be effectively applied to the complex domain of legal reasoning, specifically tax law. While LLMs possess potential, achieving reliable and trustworthy results requires careful consideration of several key factors. Simply generating text that sounds legally sound is insufficient; the models must demonstrate genuine reasoning ability and provide faithful explanations for their conclusions. The way legal knowledge is represented to the LLM is crucial for success. Approaches include directly training on legal text, such as tax code and case law, structuring information within knowledge graphs, or representing legal rules as formal logic programs in a language such as Prolog.
Improving performance during the actual use of the model is also vital, and researchers are investigating techniques like self-consistency, where multiple responses are generated and the most consistent one is selected. Retrieval-augmented generation, which fetches relevant information from a knowledge base, and reranking potential answers by their likelihood of being correct are also proving effective. Current LLMs, however, often struggle with hallucinations and complex reasoning tasks, so ensuring that the model's reasoning process is transparent and justifiable is paramount; simply getting the right answer isn't enough.
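As a concrete illustration of self-consistency, the sketch below samples several candidate answers and keeps the majority vote. The sampler here is a stand-in for repeated LLM calls at nonzero temperature, and the dollar figures are invented for illustration:

```python
from collections import Counter

def self_consistent_answer(sample_fn, question, n=5):
    """Sample n candidate answers and return the most frequent one.

    sample_fn is any callable returning a single answer string;
    in practice it would query an LLM with a nonzero temperature.
    Also returns the agreement rate as a rough confidence signal.
    """
    answers = [sample_fn(question) for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n

# Toy sampler standing in for an LLM: four of five samples agree.
_samples = iter(["$1,250", "$1,250", "$1,300", "$1,250", "$1,250"])
answer, agreement = self_consistent_answer(
    lambda q: next(_samples), "How much tax is owed?")
print(answer, agreement)  # → $1,250 0.8
```

The agreement rate is what makes this useful beyond accuracy: a low rate can be treated as a signal to abstain rather than answer.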
Understanding why the model arrived at a particular conclusion is essential. The way a question is phrased, known as prompt engineering, significantly impacts the LLM's response, making careful prompt design crucial. LLMs can often perform well on new tasks with only a few examples, a technique called few-shot learning, which reduces the need for large amounts of labeled training data. Representing legal rules in Prolog offers several advantages: formal semantics that give each rule a clear, unambiguous meaning, and built-in reasoning capabilities. Prolog's inference engine can automatically derive conclusions from the rules and can produce a trace of the reasoning process, making it easier to understand how a conclusion was reached.
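Prolog itself is beyond the scope of this summary, but the core idea — rules with formal semantics plus a trace of how each conclusion was derived — can be sketched in Python. The threshold and incomes below are invented placeholders, not actual tax law:

```python
# Toy "statute": a person owes tax if gross income exceeds a threshold.
# In Prolog this would be a clause roughly like:
#   owes_tax(P) :- gross_income(P, I), I > 12000.
THRESHOLD = 12000  # illustrative figure only
incomes = {"alice": 50000, "bob": 9000}

def infer(incomes):
    """Apply the rule to every person, recording a derivation trace."""
    conclusions, trace = [], []
    for person, income in sorted(incomes.items()):
        if income > THRESHOLD:
            conclusions.append(f"owes_tax({person})")
            trace.append(
                f"owes_tax({person}): gross_income={income} > {THRESHOLD}")
    return conclusions, trace

conclusions, trace = infer(incomes)
print(conclusions)  # → ['owes_tax(alice)']
print(trace[0])     # → owes_tax(alice): gross_income=50000 > 12000
```

The trace is the point: each conclusion carries the rule and facts that produced it, which is exactly the auditability that free-form LLM text lacks.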
Knowledge graphs, which represent legal information as a network of entities and relationships, can also improve the LLM’s ability to understand and reason about complex legal concepts. Catala is a programming language specifically designed for the law, aiming to provide a more structured and reliable way to represent legal rules. Generating multiple responses and selecting the most consistent one can significantly improve accuracy. Fetching relevant information from a knowledge base and providing it to the LLM can help it generate more accurate and informed responses. Using a separate model to reorder potential answers based on their likelihood of being correct can improve the quality of the results.
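Retrieval-augmented generation can likewise be sketched with a toy retriever. A real system would score passages by embedding similarity rather than the word-overlap heuristic below, and the section texts are paraphrased headings used purely for illustration:

```python
def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query (a crude stand-in
    for the embedding similarity a real RAG system would use)."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

docs = [
    "Section 1: tax imposed on taxable income",
    "Section 63: taxable income defined",
    "Section 121: exclusion of gain from sale of principal residence",
]
top = retrieve("how is taxable income taxed", docs, k=2)
print(top)
```

The retrieved passages would then be prepended to the LLM's prompt, grounding its answer in the relevant statutory text instead of its parametric memory.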
Prompt retrieval, finding the most relevant prompts from a database, and process supervision, guiding the LLM’s reasoning process with intermediate steps, are also proving beneficial. The research suggests that LLMs have the potential to revolutionize legal reasoning, but significant challenges remain. Key areas for future research include improving faithfulness and explainability, developing more robust knowledge representation techniques, optimizing LLM performance and efficiency, addressing bias and fairness, and developing tools for legal professionals. Ultimately, the goal is to create AI systems that can assist legal professionals and improve access to justice.
Neuro-Symbolic Reasoning Improves Tax Assistance Accuracy
Researchers have developed a novel neuro-symbolic system that significantly improves the accuracy and cost-effectiveness of automated tax assistance. The team addresses the limitations of large language models in complex reasoning tasks, such as tax filing, by integrating them with a symbolic solver capable of formal logic calculations. Experiments demonstrate that this approach offers a promising pathway to increasing equitable access to reliable tax expertise. The system’s performance was evaluated using the challenging Statutory Reasoning Assessment (SARA) dataset, revealing that models incorporating reasoning capabilities outperform those without, both in solving tax problems and in parsing text for the solver.
Crucially, the team discovered that adding refusal criteria through the symbolic solver and self-checking mechanisms dramatically reduces the expected costs of deployment. The system’s projected costs could be less than 20% of the average amount an American spends filing their taxes, representing a substantial economic benefit. This cost reduction stems from the system’s ability to identify situations where it lacks sufficient certainty and abstain from providing potentially inaccurate guidance. The findings indicate that neuro-symbolic architectures offer a viable solution for expanding access to trustworthy and reliable tax expertise, particularly for communities disproportionately affected by tax inaccuracies.
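The cost argument above can be made concrete with a back-of-the-envelope model: if the system answers only when confident and abstains otherwise, the expected cost per return is the error rate times the penalty plus the abstention rate times the cost of a human fallback. All figures here are invented placeholders, not the paper's numbers:

```python
def expected_cost(p_answer, p_error_given_answer, penalty, fallback_cost):
    """Expected per-return cost when the system may abstain.

    p_answer: fraction of cases the system is confident enough to answer
    p_error_given_answer: error rate among answered cases
    penalty: cost of an incorrect filing (e.g. a tax penalty)
    fallback_cost: cost of a human preparer when the system abstains
    """
    return (p_answer * p_error_given_answer * penalty
            + (1 - p_answer) * fallback_cost)

# Without abstention the system answers everything:
always = expected_cost(1.0, 0.10, penalty=500.0, fallback_cost=150.0)
# With abstention it answers 80% of cases but errs far less often:
abstain = expected_cost(0.8, 0.02, penalty=500.0, fallback_cost=150.0)
print(always, abstain)  # the abstaining system is cheaper in expectation
```

Under these illustrative numbers, refusing the hardest 20% of cases cuts the expected cost from 50.0 to 38.0 per return, which is the mechanism behind the deployment-cost reductions the team reports.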
Neuro-Symbolic AI Improves Tax Calculation Accuracy
The research demonstrates a novel approach to automated tax assistance by integrating large language models with symbolic solvers. This combination enhances the capability of AI systems to accurately calculate tax obligations, addressing a significant challenge due to the complexity of tax rules and the potential for costly errors. The team’s system significantly improves performance on challenging tax assessment tasks and, importantly, reduces overall expenses while also improving the auditability of calculations. The findings highlight the potential for neuro-symbolic architectures to broaden equitable access to reliable tax guidance.
While acknowledging trade-offs between the initial cost of translating tax rules into formal logic and ongoing computational expenses, the research suggests that leveraging symbolic reasoning offers a viable path toward cost-effective and trustworthy AI tax assistance. Future work will focus on improving the efficiency of translating statutory rules and exploring the benefits of scaling models or using specialized smaller models. Ultimately, the authors envision real-world implementation facilitated by interactive user studies to ensure widespread accessibility and trust in the system.
👉 More information
🗞 Enabling Equitable Access to Trustworthy Financial Reasoning
🧠 ArXiv: https://arxiv.org/abs/2508.21051
