Scientists are tackling the persistent problem of ‘hallucinations’ in large language models, where generated text contradicts source information. Yifan Zhu, Huiqiang Rong, and Haoran Luo, of Beijing University of Posts and Telecommunications and Nanyang Technological University, introduce Token-Guard, a novel decoding method designed to control these inaccuracies at the level of individual tokens. The work is significant because it offers a scalable, modular route to improving LLM reliability, moving beyond resource-intensive techniques such as Retrieval-Augmented Generation and Reinforcement Learning from Human Feedback by building internal self-checking and iterative error correction into the decoding process. Token-Guard demonstrably reduces hallucinations and enhances accuracy on challenging HALU datasets, representing a step towards more trustworthy artificial intelligence.
Token-Guard proactively controls LLM hallucinations at the token level
The research addresses a critical limitation of current LLMs, which frequently ‘hallucinate’, producing outputs inconsistent with the provided input. Token-Guard tackles this by building verification into the decoding process itself: iterative pruning and regeneration dynamically correct detected errors, yielding more consistent and factually sound output. This approach differs from existing decoding techniques, such as Auto-Regressive Chain-of-Thought, Tree-of-Thought, Guided Decoding, and Predictive Decoding, which often lack explicit token-level hallucination checking and robust dynamic correction.
Experiments conducted on the HALU datasets demonstrate Token-Guard’s substantial improvements in both hallucination reduction and generation accuracy. Specifically, the research reports relative improvements of up to 16.3% over the strongest baseline models, alongside gains in logical and factual consistency. Across benchmarks including FinanceBench and HaluEval, F1 scores ranged from 8.8 to 78.5 depending on the dataset and model. This work introduces three key innovations: token-level hallucination control through scoring and pruning, segment-level explicit hallucination scoring for reliable fragment selection, and a local-enhancement-with-global-iteration strategy for dynamic correction. By combining these elements, Token-Guard not only identifies potential errors but also actively corrects them, maintaining logical consistency while limiting computational cost. The researchers have made their code publicly available, paving the way for wider adoption and further development of this promising technique in natural language processing.
Token-Guard’s self-checking decoding improves LLM output reliability
Experiments employed the HALU datasets to rigorously evaluate Token-Guard’s performance. The study pioneered a multi-path generation strategy, expanding multiple reasoning paths in parallel to increase the probability of achieving a correct solution. Iterative pruning and regeneration were then implemented to dynamically correct detected errors, refining the generated text and improving its factual consistency. The team engineered a system that scores candidate tokens in the latent space, pruning those with low confidence to suppress hallucinated content and improve precision. Specifically, relevant tokens were grouped into segments and subjected to explicit hallucination scoring, allowing for a nuanced assessment of factual accuracy.
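To make the token-level step concrete, here is a minimal, self-contained sketch of scoring and pruning candidate tokens. The `Candidate` structure, the toy `latent_score` values, and the `min_latent_score` threshold are illustrative assumptions; the paper's scoring operates on the model's actual latent representations rather than hand-supplied numbers.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Candidate:
    token: str
    logit: float         # raw model score for this token (toy values below)
    latent_score: float  # hypothetical latent-space confidence in [0, 1]

def prune_candidates(candidates: List[Candidate],
                     top_k: int = 3,
                     min_latent_score: float = 0.5) -> List[Tuple[str, float, float]]:
    """Drop candidates whose latent-space confidence falls below the threshold,
    keep the top-k survivors by logit, and renormalize their probabilities."""
    kept = [c for c in candidates if c.latent_score >= min_latent_score]
    kept = sorted(kept, key=lambda c: c.logit, reverse=True)[:top_k]
    total = sum(math.exp(c.logit) for c in kept)
    return [(c.token, math.exp(c.logit) / total, c.latent_score) for c in kept]

if __name__ == "__main__":
    step_candidates = [
        Candidate("Paris", 5.1, 0.93),
        Candidate("Atlantis", 4.6, 0.18),  # fluent but unsupported: pruned
        Candidate("Lyon", 3.2, 0.61),
        Candidate("France", 2.0, 0.74),
    ]
    for token, prob, conf in prune_candidates(step_candidates):
        print(f"{token:8s} p={prob:.2f} latent_confidence={conf:.2f}")
```

In a full multi-path setup, several pruned continuations like this would be expanded in parallel and the surviving paths compared downstream.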
This segment-level evaluation contrasts with traditional probability scoring methods, which do not explicitly quantify hallucination risk. The technique also provides a dynamic correction capability, supporting iterative refinement and targeted resource allocation to enhance output consistency while limiting computational cost. Performance comparisons across multiple benchmarks, including HaluEval, demonstrate Token-Guard’s effectiveness: the method’s F1 scores spanned 8.8 to 78.5 across datasets and model configurations, consistently outperforming BaseModel, Guided Decoding, Predictive Decoding, and Auto-Regressive Chain-of-Thought (CoT) baselines. The publicly available code facilitates further research and implementation of this approach.
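To illustrate the contrast between the two scoring styles, the snippet below compares average log-probability, which rewards fluency, with an explicit segment-level risk score. The overlap heuristic in `hallucination_risk` is purely a stand-in for the paper's latent-space scoring, not its actual formula.

```python
import math
from typing import List

def logprob_score(token_probs: List[float]) -> float:
    """Conventional scoring: average log-probability of a segment's tokens.
    A fluent but unsupported segment can still score well here."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)

def hallucination_risk(segment: List[str], context: List[str]) -> float:
    """Toy explicit risk score: fraction of the segment's content words that
    never appear in the provided context (illustrative heuristic only)."""
    known = {t.lower() for t in context}
    content = [t for t in segment if t.isalpha() and len(t) > 3]
    if not content:
        return 0.0
    unsupported = sum(1 for t in content if t.lower() not in known)
    return unsupported / len(content)

if __name__ == "__main__":
    context = "The report covers Q3 revenue of 4.2 million dollars".split()
    segment = "Revenue reached nine billion dollars this quarter".split()
    probs = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92, 0.95]
    print("average log-prob:", round(logprob_score(probs), 3))                 # looks fine
    print("explicit risk:   ", round(hallucination_risk(segment, context), 3)) # flags it
```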
Token-Guard reduces LLM hallucinations via iterative refinement
The team measured a relative improvement of up to 16.3% in generation accuracy when compared to the strongest baseline decoding methods combined with large models. This breakthrough delivers significant gains in factual and logical consistency, as evidenced by detailed analysis of generated outputs. Specifically, the study quantifies the reduction in hallucinated tokens at each reasoning step, demonstrating the effectiveness of the self-checking mechanism. Token-Guard introduces three key innovations to address limitations in existing decoding methods. Firstly, token-level hallucination control is achieved by scoring candidate tokens in the latent space and pruning those with low confidence, enhancing factual precision.
Secondly, segment-level explicit hallucination scoring groups relevant tokens into fragments and assigns risk scores, guiding the selection of more reliable and accurate content. Finally, local enhancement and global iteration strategies dynamically correct prior fragments during multi-fragment reasoning, maintaining logical consistency without incurring excessive computational costs. Measurements confirm that the system effectively filters unreliable tokens during generation, improving the quality of LLM outputs. The research details how the hallucination risk score is calculated and used to guide token selection, resulting in more consistent and accurate responses. Tests prove that the iterative correction mechanism effectively addresses errors without significantly increasing processing time. The work establishes a new benchmark for hallucination control in LLMs, paving the way for more trustworthy and dependable AI systems.
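As a rough sketch of how the local enhancement and global iteration steps might fit together, consider the loop below. The callable signatures (`generate_fragment`, `risk_score`, `revise_fragment`), the risk threshold, and the pass counts are hypothetical placeholders; the authors' released code defines the actual interfaces and scoring.

```python
from typing import Callable, List

def guarded_generation(
    generate_fragment: Callable[[str], str],     # placeholder: proposes the next fragment
    risk_score: Callable[[str, str], float],     # placeholder: fragment + context -> risk
    revise_fragment: Callable[[str, str], str],  # placeholder: regenerates a flagged fragment
    prompt: str,
    n_fragments: int = 4,
    risk_threshold: float = 0.4,
    max_global_passes: int = 2,
) -> List[str]:
    """Local step: each new fragment is scored and regenerated immediately if its
    risk is too high. Global step: once a full draft exists, earlier fragments are
    re-scored against the completed context and corrected, so later evidence can
    repair earlier mistakes without restarting generation from scratch."""
    fragments: List[str] = []
    context = prompt
    for _ in range(n_fragments):
        frag = generate_fragment(context)
        if risk_score(frag, context) > risk_threshold:   # local enhancement
            frag = revise_fragment(frag, context)
        fragments.append(frag)
        context += " " + frag

    for _ in range(max_global_passes):                   # global iteration
        changed = False
        full_draft = prompt + " " + " ".join(fragments)
        for i, frag in enumerate(fragments):
            if risk_score(frag, full_draft) > risk_threshold:
                fragments[i] = revise_fragment(frag, full_draft)
                changed = True
        if not changed:
            break
    return fragments
```

In this sketch, keeping regeneration local to flagged fragments and capping the number of global passes is what bounds the extra computation.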
Token-Guard offers a scalable, modular route to hallucination control
Token-Guard employs a three-stage filtering pipeline, evaluating candidate text fragments in a latent space to assess hallucination risk, and dynamically correcting errors through iterative pruning and regeneration. The framework is designed to be scalable and modular, offering a portable solution that functions effectively across different model sizes without requiring additional fine-tuning. However, the authors acknowledge that performance can be reduced in extremely large-context settings, and on the RAGTruth dataset with a smaller 3B parameter model. The research establishes a reliable approach to mitigating hallucinations, a common problem in LLMs, by embedding self-checking mechanisms into the decoding process. Token-Guard’s ability to maintain efficient resource usage and portability across model scales suggests its potential for widespread application in improving the trustworthiness of generated text. Future work could explore extending the method to handle even larger contexts and integrating it with other hallucination reduction techniques, potentially leading to even more robust and accurate LLM outputs.
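At the orchestration level, a three-stage pipeline of this kind could be wired together roughly as shown below. The stage signatures and the risk threshold are assumptions made for illustration; portability across model sizes follows from treating the underlying model as an opaque callable instead of fine-tuning it.

```python
from typing import Callable, List, Tuple

# Hypothetical stage interfaces for a Token-Guard-style pipeline; the real
# implementation lives in the authors' publicly released code.
ProposeFn = Callable[[str, int], List[str]]  # prompt, n_paths -> candidate fragments
ScoreFn = Callable[[str, str], float]        # fragment, context -> hallucination risk
ReviseFn = Callable[[str, str], str]         # fragment, context -> corrected fragment

def three_stage_decode(propose: ProposeFn, score: ScoreFn, revise: ReviseFn,
                       prompt: str, n_paths: int = 4, threshold: float = 0.4) -> str:
    """Stage 1: expand several candidate fragments in parallel.
    Stage 2: score each candidate's hallucination risk and keep the safest.
    Stage 3: revise the chosen fragment if its risk still exceeds the threshold."""
    candidates = propose(prompt, n_paths)                 # stage 1
    scored: List[Tuple[float, str]] = sorted(
        (score(c, prompt), c) for c in candidates)        # stage 2
    risk, best = scored[0]
    if risk > threshold:                                  # stage 3
        best = revise(best, prompt)
    return best
```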
👉 More information
🗞 Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding
🧠 ArXiv: https://arxiv.org/abs/2601.21969
