Retrieval-augmented generation (RAG) systems are a powerful approach to improving large language models, reducing inaccuracies and incorporating current information, but they are vulnerable to a subtle threat: corpus poisoning, where malicious actors introduce misleading documents into the knowledge source. Pankayaraj Pathmanathan, Michael-Andrei Panaitescu-Liess, and Cho-Yu Jason Chiang, all from the University of Maryland, College Park, along with Furong Huang, address this challenge with two novel defense mechanisms, RAGPart and RAGMask. These techniques strengthen the retrieval stage of RAG pipelines, identifying and mitigating the influence of poisoned documents without altering the underlying language model, and offer a computationally efficient path toward more robust and trustworthy RAG applications. The team demonstrates consistent reductions in attack success rates across multiple benchmarks and poisoning strategies while maintaining performance on legitimate data, and introduces a new method for rigorously testing these defenses.
Retrieval-augmented generation grounds large language models in external knowledge, reducing hallucinations and compensating for outdated information. However, recent studies have exposed a critical vulnerability in Retrieval-Augmented Generation (RAG) pipelines: corpus poisoning, where adversaries inject malicious documents into the retrieval corpus to manipulate model outputs. This work proposes two complementary retrieval-stage defenses, RAGPart and RAGMask, which are computationally lightweight and require no modification to the generation model.
Retrieval Poisoning Defenses: RAGPart and RAGMask
The paper addresses the vulnerability of Retrieval-Augmented Generation (RAG) systems to retrieval poisoning attacks, where malicious documents are injected into the retrieval database to cause incorrect or harmful outputs. The authors propose two defense mechanisms, RAGPart and RAGMask. RAGPart improves robustness by fragmenting documents, employing multiple retrieval methods, and aggregating their results via majority voting or set intersection. RAGMask reduces the impact of potentially poisoned content by masking parts of retrieved fragments, controlled by a masking ratio and masking size. Experiments on the FiQA dataset against several attack methods, including HotFlip and AdvRAGgen, demonstrate that RAGPart significantly reduces the success rate of retrieval poisoning attacks, especially the sophisticated AdvRAGgen attack.
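The aggregation step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`partition`, `majority_vote`), the fixed fragment length, and the top-k quorum rule are all illustrative assumptions standing in for whatever partitioning and voting scheme the paper actually uses.

```python
from collections import Counter

def partition(doc: str, frag_len: int = 4) -> list[str]:
    """Split a document into fixed-length token fragments (illustrative;
    the paper's partitioning scheme may differ)."""
    toks = doc.split()
    return [" ".join(toks[i:i + frag_len]) for i in range(0, len(toks), frag_len)]

def majority_vote(rankings: list[list[str]], k: int = 2) -> list[str]:
    """Keep only documents that appear in the top-k results of a majority
    of retrievers, so a poisoned document fooling one retriever is dropped."""
    votes = Counter()
    for ranking in rankings:
        for doc_id in ranking[:k]:
            votes[doc_id] += 1
    quorum = len(rankings) // 2 + 1
    return [d for d, v in votes.most_common() if v >= quorum]

# Three hypothetical retrievers; "c" is only favored by one of them.
rankings = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(majority_vote(rankings, k=2))  # → ['a', 'b']
```

Under this voting rule, an adversarial document that manipulates a single retriever's ranking fails to reach the quorum and never reaches the generator.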
RAGMask further enhances robustness when used with RAGPart. Majority voting proves superior to intersection-based aggregation, balancing security and performance. Simply combining fragments from multiple retrievers without partitioning and masking is ineffective. Detailed analysis reveals that increasing the number of document fragments in RAGPart generally improves defense, though it can reduce utility. Increasing the number of retrievers also enhances defense.
For RAGMask, higher masking ratios strengthen the defense, while larger masking sizes are more effective at reducing the impact of poisoned content. Retrieval poisoning poses a serious threat to RAG systems, but RAGPart and RAGMask are promising defenses that significantly improve robustness; careful hyperparameter tuning is crucial to achieve the best balance between security and performance.
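The two RAGMask hyperparameters discussed above can be sketched as follows. This is a toy illustration under assumed semantics: `mask_ratio` as the approximate fraction of tokens masked, `mask_size` as the length of each contiguous masked span, and `[MASK]` as a placeholder token; the paper's actual masking procedure may differ.

```python
import random

def mask_fragment(tokens: list[str], mask_ratio: float = 0.2,
                  mask_size: int = 2, seed: int = 0) -> list[str]:
    """Replace contiguous spans of tokens with [MASK] so that roughly
    mask_ratio of the fragment is hidden; each span covers mask_size
    tokens (spans may overlap, so the actual fraction can be lower)."""
    rng = random.Random(seed)
    n = len(tokens)
    n_spans = max(1, int(n * mask_ratio / mask_size))
    out = list(tokens)
    for _ in range(n_spans):
        start = rng.randrange(0, max(1, n - mask_size + 1))
        for i in range(start, min(n, start + mask_size)):
            out[i] = "[MASK]"
    return out

frag = [f"t{i}" for i in range(10)]
print(mask_fragment(frag, mask_ratio=0.4, mask_size=2, seed=1))
```

Intuitively, a larger `mask_size` is more likely to knock out an entire adversarial token sequence at once, which matches the paper's finding that larger masking sizes better suppress poisoned content.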
RAGPart and RAGMask Defend Against Corpus Poisoning
Recent advances in retrieval-augmented generation (RAG) systems have improved the factual accuracy of large language models, particularly in fields requiring up-to-date knowledge. However, these systems are vulnerable to corpus poisoning attacks, where malicious documents manipulate model outputs. Researchers have developed two complementary defenses, RAGPart and RAGMask, to mitigate these attacks directly at the retrieval stage, without modifying the language model. The team's work exploits the characteristics of dense retrieval models, which encode semantic meaning into embedding vectors, to identify and neutralize poisoned documents.
RAGPart dilutes the impact of malicious content through document partitioning and mean pooling of fragment embeddings. Experiments show RAGPart can prevent poisoned documents from appearing among the top retrieved results. Complementing RAGPart, RAGMask identifies suspicious tokens within retrieved documents based on similarity shifts when targeted token masking is applied, flagging those causing the most substantial change as potentially malicious. Across benchmarks and poisoning strategies, these defenses consistently reduce attack success rates while maintaining performance under normal conditions. The researchers also introduced an interpretable attack to rigorously stress-test the robustness of their defenses, confirming their effectiveness.
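The similarity-shift idea behind RAGMask can be sketched with a toy stand-in for the dense encoder. Here a bag-of-words vector replaces the real embedding model, and masking is simulated by deleting one token at a time; the function names and the single-token granularity are illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter

def embed(tokens: list[str]) -> Counter:
    """Toy bag-of-words embedding (stand-in for a dense retriever encoder)."""
    return Counter(tokens)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity_shifts(query: list[str], doc: list[str]) -> list[tuple[str, float]]:
    """Score each document token by how much masking it shifts the
    query-document similarity; the largest shifts flag tokens that
    disproportionately drive retrieval, a hallmark of poisoning."""
    base = cosine(embed(query), embed(doc))
    shifts = []
    for i, tok in enumerate(doc):
        masked = doc[:i] + doc[i + 1:]
        shifts.append((tok, abs(base - cosine(embed(query), embed(masked)))))
    return sorted(shifts, key=lambda x: -x[1])

# An adversarially matching token ("q1") dominates the similarity,
# so masking it produces the largest shift and it is flagged first.
print(similarity_shifts(["q1"], ["q1", "filler", "other"])[0][0])  # → q1
```

A real implementation would use the retriever's own embedding model and mask token spans rather than deleting single tokens, but the diagnostic signal is the same: tokens whose removal sharply changes the similarity are suspicious.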
RAG Defenses Against Corpus Poisoning Attacks
Researchers have developed two novel defense mechanisms, RAGPart and RAGMask, to improve the security of Retrieval-Augmented Generation (RAG) systems against corpus poisoning attacks. These attacks involve injecting false information into the knowledge source used by the RAG system to manipulate model outputs. Both RAGPart and RAGMask function directly on the retrieval component of the RAG pipeline, offering a computationally efficient way to mitigate the effects of poisoned documents without altering the underlying language model. The team demonstrates that these defenses consistently reduce the success rate of poisoning attacks while maintaining the system’s ability to provide accurate information.
RAGMask proves particularly effective at preserving the usefulness of the retrieval system, while RAGPart offers a more practical solution when computational resources are limited. The researchers acknowledge that subtly altered facts remain challenging to detect at the retrieval stage, requiring generation-based identification, and highlight this as an area for future research. The study also identifies limitations in existing retrieval-based defenses, demonstrating that the newly proposed methods offer a significant improvement in robustness. Future work will focus on addressing remaining vulnerabilities and exploring methods to detect more sophisticated forms of corpus poisoning, contributing to the development of more trustworthy and reliable RAG systems.
👉 More information
🗞 RAGPart & RAGMask: Retrieval-Stage Defenses Against Corpus Poisoning in Retrieval-Augmented Generation
🧠 ArXiv: https://arxiv.org/abs/2512.24268
