Scientists are tackling the persistent problem of ‘hallucination’, the generation of factually incorrect information, within financial Retrieval-Augmented Generation (RAG) systems. Taoye Yin, Haoyuan Hu, and Yaxin Fan from Ant Group, together with Xinhao Chen, Xinya Wu, Kai Deng, and colleagues, present a novel Reinforcement Learning framework enhanced with Fine-grained Knowledge Verification (RLFKV) to address this critical issue. Their research is significant because it moves beyond simple whole-response evaluation, instead decomposing financial answers into individual knowledge units and verifying each one against the source documents. This fine-grained approach delivers more precise feedback to the model, improving factual consistency and preventing the generation of misleading financial information, as demonstrated through experiments on both public and newly created datasets.
This innovative approach dissects financial responses into individual knowledge units and meticulously assesses the correctness of each unit against the retrieved information, providing a more precise signal for model optimisation.
The core of RLFKV lies in its ability to decompose complex financial answers into ‘atomic knowledge units’, representing minimal, self-contained financial facts. Each of these units undergoes rigorous verification against the retrieved documents, generating a fine-grained reward system that directly improves alignment with the source material.
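One way to picture such a unit, anticipating the entity/metric/value/timestamp quadruple described later in the article, is a small data structure; the sketch below is illustrative, and the example sentence and field values are invented for demonstration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KnowledgeUnit:
    """One atomic financial fact, following the paper's
    entity/metric/value/timestamp quadruple."""
    entity: str     # company, instrument, or index the fact is about
    metric: str     # e.g. "revenue" or "closing price"
    value: str      # reported figure, kept as text to preserve units and rounding
    timestamp: str  # period or date the figure refers to

# A response such as "Acme Corp's revenue reached $1.2B in Q3 2023, up 8%
# year over year" decomposes into two self-contained units (example data):
units = [
    KnowledgeUnit("Acme Corp", "revenue", "$1.2B", "Q3 2023"),
    KnowledgeUnit("Acme Corp", "revenue growth (YoY)", "8%", "Q3 2023"),
]
```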
This granular approach moves beyond traditional reinforcement learning methods that rely on costly human annotation and imprecise binary reward signals. Furthermore, to prevent the model from generating overly concise responses as a shortcut to higher rewards, the framework incorporates an ‘informativeness reward’ ensuring the retention of at least as many knowledge units as a baseline model.
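A minimal sketch of that informativeness constraint, assuming (as stated later in the article) that it is a binary pairwise check on unit counts between the policy and baseline responses:

```python
def informativeness_reward(policy_units, baseline_units):
    """Binary informativeness reward: 1.0 when the policy response retains
    at least as many knowledge units as the baseline response, else 0.0.
    The binary form follows the pairwise constraint described in the text."""
    return 1.0 if len(policy_units) >= len(baseline_units) else 0.0
```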
Experiments conducted using the public Financial Data Description (FDD) task and a newly created FDD-ANT dataset demonstrate consistent improvements in accuracy and faithfulness. The study confirms the effectiveness of RLFKV in mitigating hallucinations and enhancing the reliability of financial RAG systems.
This advancement is particularly important given the time-sensitive nature of financial queries, where even minor inaccuracies can have significant consequences. The framework’s ability to operate without human-annotated reference answers also offers a substantial reduction in operational costs and scalability challenges.
The research details a system that leverages a financial quadruple structure (entity, metric, value, and timestamp) to precisely capture minimal knowledge units within financial texts. This design specifically addresses the strict temporal sensitivity and quantitative nature of financial data, ensuring a more robust and accurate evaluation process. By employing a specialised prompt to guide an evaluation model, the system effectively decomposes responses and verifies factual consistency, ultimately leading to more trustworthy and informative outputs.
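The verification step itself is performed by the prompted evaluation model; as a rough, purely illustrative stand-in, one can imagine a check that all four quadruple fields co-occur in a single retrieved document (real verification must also handle paraphrase, unit conversion, and date normalisation, which the model-based judge handles):

```python
def is_supported(unit, documents):
    """Toy verifier: a quadruple (entity, metric, value, timestamp) counts
    as supported only if all four fields appear in one retrieved document.
    The actual framework delegates this judgement to an evaluation LLM."""
    return any(
        all(field.lower() in doc.lower() for field in unit)
        for doc in documents
    )

# Invented example document and units:
docs = ["Acme Corp reported revenue of $1.2B for Q3 2023."]
supported = is_supported(("Acme Corp", "revenue", "$1.2B", "Q3 2023"), docs)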
Granular Faithfulness and Informativeness Rewards for Financial Language Model Alignment
A decomposition of financial responses into atomic knowledge units underpins this research into mitigating hallucinations in retrieval-augmented generation systems. The study addresses inaccuracies arising when large language models generate responses contradicting retrieved financial documents, a critical issue given the time-sensitive nature of the domain.
Initially, generated responses are segmented into these minimal, self-contained expressions of financial facts, enabling a granular assessment of factual correctness. Each knowledge unit then undergoes rigorous evaluation against the retrieved documents to determine its consistency and generate a fine-grained faithfulness reward.
This reward system provides precise optimisation signals, improving alignment between the generated text and the source material. To counteract potential reward hacking, where the model might produce overly concise replies to maximise the reward, an informativeness reward is incorporated. This secondary reward encourages the policy model to retain at least as many knowledge units as a baseline model, ensuring comprehensive responses.
The policy model is then optimised by jointly maximising both the faithfulness and informativeness rewards, guiding it towards generating accurate and informative financial summaries. Experiments were conducted utilising the public Financial Data Description task from BizFinBench, alongside a newly proposed dataset, FDD-ANT.
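Under the assumption that faithfulness is the fraction of verified units and informativeness is the binary pairwise check, the joint objective might be combined as a weighted sum; the weight `alpha` is hypothetical, since the article states only that the two rewards are jointly maximised.

```python
def total_reward(unit_verdicts, n_policy_units, n_baseline_units, alpha=0.5):
    """Sketch of the joint objective (assumed aggregation, hypothetical alpha):
    faithfulness  = fraction of knowledge units verified against the documents,
    informativeness = binary pairwise constraint on unit counts."""
    faithfulness = (sum(unit_verdicts) / len(unit_verdicts)) if unit_verdicts else 0.0
    informativeness = 1.0 if n_policy_units >= n_baseline_units else 0.0
    return alpha * faithfulness + (1 - alpha) * informativeness
```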
Qwen3-32B served as the evaluation model, decomposing responses and verifying knowledge units. This methodology facilitates a more stable training process by moving beyond the coarse binary rewards typically derived from human annotation, reducing labelling costs and improving the quality of generated financial data descriptions. The work demonstrates consistent improvements in factual consistency across both datasets, validating the effectiveness of the proposed reinforcement learning framework.
Fine-grained factual consistency assessment via atomic knowledge unit decomposition and reinforcement learning
Decomposition of financial responses into atomic knowledge units and subsequent verification against retrieved documents enables rigorous evaluation of factual consistency. The framework achieves granular optimisation signals, improving alignment with retrieved information without requiring annotated reference answers. Each response is broken down into atomic knowledge units (minimal, self-contained expressions of financial facts), and these units are then evaluated for support within the retrieved documents.
The evaluation results directly provide fine-grained rewards to guide model optimisation, focusing on factual accuracy. To prevent the model from generating overly concise responses as a shortcut to higher rewards, a binary pairwise constraint is incorporated, ensuring the policy model retains at least the same number of knowledge units as the base model.
The study utilises a financial quadruple structure (entity, metric, value, and timestamp) to precisely capture minimal knowledge units within financial texts, addressing the temporal sensitivity and quantitative nature of the domain. This structure enforces a completeness constraint, invalidating any assertion that is missing a key element.
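That completeness constraint is simple to state in code; this sketch assumes a unit is passed as a plain four-tuple of strings:

```python
def is_complete(quadruple):
    """Completeness constraint: an assertion missing any of the four
    elements (entity, metric, value, timestamp) is invalid."""
    return len(quadruple) == 4 and all(
        isinstance(field, str) and field.strip() for field in quadruple
    )
```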
The evaluation model decomposes responses using a specialised prompt that explicitly defines the four critical dimensions of financial data. Experiments conducted on the Financial Data Description task from BizFinBench and a newly proposed FDD-ANT dataset demonstrate the effectiveness of the approach. The framework delivers fine-grained rewards for stable optimisation, leading to higher-quality generation and eliminating the need for costly human annotation. This method addresses the problem of hallucinations, where models generate responses contradicting the retrieved data, a significant concern in the time-sensitive financial domain.
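The article does not reproduce the prompt, so the wording below is entirely hypothetical; it only illustrates how the four dimensions could be made explicit to the evaluation model:

```python
# Hypothetical decomposition prompt; the actual wording used with
# Qwen3-32B is not given in the article.
DECOMPOSE_PROMPT = """\
Decompose the financial response below into atomic knowledge units.
Each unit must be a quadruple with all four dimensions filled in:
  entity | metric | value | timestamp
Discard or flag any assertion missing one of these elements.

Response:
{response}
"""

prompt = DECOMPOSE_PROMPT.format(
    response="Acme Corp's revenue was $1.2B in Q3 2023."  # example input
)
```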
By decomposing responses into individual knowledge units, the framework assesses the correctness of each unit, providing precise optimisation signals and better alignment with retrieved documents. Furthermore, the research introduces the FDD-ANT dataset, a new resource for evaluating financial data description tasks with diverse data types, and incorporates an informativeness reward to prevent overly concise replies during the reinforcement learning process.
Experiments on both publicly available and newly created datasets demonstrate consistent performance gains, validating the effectiveness of the proposed approach. Error analysis reveals that remaining inaccuracies primarily relate to the handling of relative time expressions, fiscal-to-calendar year conversions, and numerical rounding.
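The fiscal-to-calendar conversion failure mode is easy to see with a sketch: for a company whose fiscal year ends in June (the `fy_end_month` parameter here is hypothetical, not from the paper), most of ‘FY2023’ actually falls in calendar 2022, so naively copying the year into a timestamp mislabels the period.

```python
import calendar
import datetime

def fiscal_to_calendar(fiscal_year, fy_end_month=6):
    """Map a fiscal year label to the calendar date range it covers,
    given the month in which the fiscal year ends (illustrative only)."""
    start_month = fy_end_month % 12 + 1
    # A December year-end means the fiscal and calendar years coincide.
    start_year = fiscal_year if fy_end_month == 12 else fiscal_year - 1
    last_day = calendar.monthrange(fiscal_year, fy_end_month)[1]
    return (datetime.date(start_year, start_month, 1),
            datetime.date(fiscal_year, fy_end_month, last_day))
```

For a June year-end, FY2023 spans 1 July 2022 to 30 June 2023, which is why simply emitting “2023” as the timestamp of an FY2023 figure can contradict the retrieved document.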
The findings establish a clear path towards more reliable and trustworthy financial language models. Limitations acknowledged by the researchers include ongoing challenges with temporal and numerical accuracy, suggesting areas for refinement. Future work will concentrate on improving the reward mechanisms to specifically address these issues and further enhance the precision of generated responses.
👉 More information
🗞 Mitigating Hallucination in Financial Retrieval-Augmented Generation via Fine-Grained Knowledge Verification
🧠 ArXiv: https://arxiv.org/abs/2602.05723
