RAG Security and Privacy: Formalizing Threat Models and Attack Surfaces in Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG), a technique that blends large language models with external knowledge sources, is rapidly gaining traction in natural language processing. However, the approach introduces novel privacy and security vulnerabilities that demand careful consideration. Atousa Arzanipour, Rouzbeh Behnia, Reza Ebrahimi, and Kaushik Dutta of the University of South Florida investigate these risks through a new formal framework. Their work addresses a critical gap in the field by establishing the first formal threat model designed specifically for retrieval-augmented generation systems, moving beyond existing analyses of standalone language models. The team defines a structured taxonomy of potential adversaries and formally characterizes key threat vectors, including the risk of revealing sensitive information about retrieved documents and the manipulation of system behavior through malicious content. By providing these formal definitions, the research lays a foundation for more robust and principled approaches to privacy and security in this increasingly important technology.

LLMs, RAG, and Emerging Security Concerns

This document summarizes key findings from research on Large Language Model (LLM)-based AI chatbots, specifically those employing Retrieval-Augmented Generation (RAG), and the associated security and privacy challenges. LLMs form the foundation of modern chatbots, excelling at generating human-quality text, but they can sometimes produce inaccurate information or lack sufficient knowledge. RAG addresses these limitations by combining the LLM’s generative power with information retrieved from external knowledge sources, such as databases or documents. This allows chatbots to provide more accurate, grounded, and up-to-date responses.

RAG systems function by first retrieving relevant documents based on a user’s query, then combining these documents with the query before the LLM generates a response. Research highlights a growing number of vulnerabilities within RAG systems, including the potential for LLMs to reveal sensitive information from training data or external sources, even when augmented with RAG, through techniques like indirect prompt injection. Attackers can also manipulate the external knowledge source by injecting malicious information, poisoning the system and causing it to generate incorrect or harmful responses. Furthermore, RAG systems can inadvertently leak Personally Identifiable Information (PII) present in retrieved documents or the LLM’s training data.
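As a rough illustration of this retrieve-then-generate flow, here is a minimal sketch in Python. The embed and llm callables, the function names, and the prompt format are placeholders chosen for this example, not components described in the paper.

import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rag_answer(query, documents, embed, llm, k=3):
    # Retrieval: embed the query and every document, then rank by similarity.
    q_vec = embed(query)
    scores = [cosine(q_vec, embed(d)) for d in documents]
    top_docs = [documents[i] for i in np.argsort(scores)[::-1][:k]]

    # Augmentation: prepend the retrieved context to the user's query.
    augmented = "Context:\n" + "\n---\n".join(top_docs) + "\n\nQuestion: " + query

    # Generation: the LLM answers using both its parameters and the supplied context.
    return llm(augmented)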

Other threats include crafting prompts that manipulate the LLM’s behavior, determining whether specific data was used in training, and altering the LLM’s responses to reflect a specific bias. Several research efforts are focused on mitigating these risks, including adding noise to the retrieval process or LLM output to protect individual data points using differential privacy, modifying the decoding process to reduce the likelihood of revealing sensitive information, and developing retrieval mechanisms resistant to malicious data injection. Researchers are also exploring input validation and sanitization to prevent prompt injection attacks, regularly auditing knowledge sources for accuracy, training LLMs to be more robust against attacks, designing secure RAG architectures from the outset, and removing or obscuring sensitive information from retrieved documents before they are processed. Ongoing research is crucial to ensure these systems are deployed responsibly.
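The last of these ideas, removing or obscuring sensitive information from retrieved documents before they reach the model, can be sketched as a simple pre-processing pass. The patterns and helper names below are illustrative only, not a defense proposed in the paper.

import re

# Masks a few common PII patterns in retrieved documents before they are
# added to the augmented query (a minimal sketch of "sanitize before use").
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text):
    # Replace each match with a placeholder naming the PII type it masked.
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

def sanitize_documents(documents):
    return [redact(d) for d in documents]

In a full pipeline, a pass like sanitize_documents would run between retrieval and augmentation, so the generator never sees the raw identifiers.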

Retrieval-Augmented Generation Threat Model and Analysis

This work presents a detailed analysis of Retrieval-Augmented Generation (RAG) systems, focusing on understanding their unique privacy and security challenges. The research establishes a formal threat model by precisely describing the system’s architecture. A RAG system operates in two phases: retrieval and generation. During retrieval, the system embeds the user’s query into a vector representation and compares it to the embeddings of documents within a knowledge base using similarity metrics to identify the most relevant documents. The system then selects a subset of these documents, which are combined with the original query to create an augmented query.
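In generic notation (not necessarily the paper's), the retrieval phase can be summarized as:

\[
D_k = \operatorname{top\text{-}k}_{d \in \mathcal{D}} \, \mathrm{sim}\big(E(q),\, E(d)\big), \qquad \tilde{q} = (q, D_k)
\]

where q is the user query, \mathcal{D} the knowledge base, E the embedding model, sim a similarity metric such as cosine similarity, and \tilde{q} the augmented query passed to the generator.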

This augmented query serves as input to the LLM generator, enriching the original request with retrieved contextual information. The LLM then synthesizes a final response, leveraging both its pre-existing knowledge and the newly provided context. The methodology extends to a thorough examination of the LLM generator itself. LLMs are trained on extensive text corpora to learn language patterns and relationships. The process begins with tokenization, converting raw text into sequences of tokens, and embedding, mapping each token to a dense vector representation.

The LLM then predicts subsequent tokens based on the input and previously generated tokens, optimizing its parameters to maximize the likelihood of observed data. Fine-tuning techniques, including reinforcement learning from human feedback, further refine the model’s performance and align its outputs with human preferences. To address privacy concerns, the research incorporates Differential Privacy (DP), a framework designed to limit the influence of individual data points on model outputs. This framework is applied to RAG systems, acknowledging the additional privacy risks introduced by the external document store.
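For concreteness, the pre-training objective described above is the standard autoregressive likelihood, written here in generic notation rather than the paper's:

\[
\max_{\theta} \; \sum_{t=1}^{T} \log p_{\theta}\!\left(x_t \mid x_{<t}\right)
\]

where x_1, ..., x_T is a tokenized training sequence and \theta are the model parameters.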

RAG Systems Face Novel Privacy and Integrity Threats

This work presents a formal threat model for Retrieval-Augmented Generation (RAG) systems, identifying vulnerabilities that arise from combining external knowledge retrieval with large language models. To analyze the attack surface, the researchers established a structured classification of potential attackers based on their access to system components and data. The study formally defines key threat vectors, including document-level membership inference and data poisoning, which pose significant privacy and integrity risks in real-world deployments. The RAG system operates by embedding both the user query and the documents of a knowledge base into vector representations, then using a retriever to identify the most relevant documents.

The system selects a top-k subset of documents based on similarity scores computed from these embeddings. The selected documents are then combined with the original query to form an augmented query, which is fed into the language model generator to produce the final response. The researchers also formally define differential privacy, a framework for mitigating privacy risks, as a property of a randomized mechanism: a mechanism satisfies (ε, δ)-differential privacy if, for any pair of adjacent datasets, the probability of producing any given output changes by at most a multiplicative factor of e^ε plus an additive term δ, which limits the influence of any individual data point on model outputs.
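Written as a formula, the standard definition states that a randomized mechanism \mathcal{M} satisfies (ε, δ)-differential privacy if, for every pair of adjacent datasets D and D′ and every set of outputs S,

\[
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta .
\]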

The study extends this framework to RAG systems, acknowledging the additional privacy risks introduced by the external document store. The threat model identifies potential attacks at various stages of the RAG process, from querying the system to exploiting leaked corpora or embedding distributions. Adversary types are categorized based on their access level, ranging from uninformed outsiders attempting to extract information solely through queries, to aware insiders combining privileged access with external knowledge to maximize attack power. This detailed analysis lays the foundation for a more rigorous understanding of privacy and security challenges in RAG systems and informs the development of robust mitigation strategies.
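To make the weakest adversary concrete, the sketch below illustrates how a query-only attacker might probe for document-level membership by feeding the system a distinctive excerpt of a candidate document and measuring how much of it the answer echoes back. This is a generic illustration of the threat, not the attack formalized in the paper, and all names and thresholds are hypothetical.

# Illustrative query-only membership-inference probe (hypothetical helpers;
# not the formal attack defined in the paper).

def likely_in_corpus(rag_system, target_document, probe_len=200, threshold=0.6):
    # Probe with a distinctive excerpt of the candidate document.
    excerpt = target_document[:probe_len]
    answer = rag_system("Please quote any source you have that contains: " + excerpt)

    # Crude signal: how much of the excerpt's vocabulary the answer reproduces.
    excerpt_tokens = set(excerpt.lower().split())
    answer_tokens = set(answer.lower().split())
    overlap = len(excerpt_tokens & answer_tokens) / max(len(excerpt_tokens), 1)

    # High lexical overlap suggests the document was retrieved and echoed back.
    return overlap >= threshold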

RAG Systems: Privacy Risks and Attack Vectors

This research establishes a formal understanding of the privacy and security risks inherent in Retrieval-Augmented Generation (RAG) systems, an increasingly popular approach in natural language processing. The team observes that while RAG improves the accuracy and grounding of large language model outputs, its reliance on external knowledge bases introduces vulnerabilities not present in traditional systems. They developed a structured classification of potential attackers based on their access to system components and data, and formally defined key threats such as determining whether specific documents were used in retrieval and the injection of malicious data to influence system behavior. The work clarifies how RAG systems might fail to protect confidential information and maintain data integrity, highlighting the need for defenses across all parts of the system, including both the retrieval component and the knowledge base itself.

The researchers note that techniques such as differential privacy, adversarial filtering, and resistance to prompt manipulation show promise, but require further investigation to confirm their effectiveness. Future research directions include combining these approaches with adversarial training, weighting documents by trustworthiness, and validating information drawn from external sources. The authors acknowledge that current defenses require careful implementation and ongoing evaluation, and that the formalization presented here serves as a foundation for designing more secure and privacy-preserving RAG systems as they are deployed in critical applications. This work is a step toward trustworthy AI systems that can reliably access and use external knowledge without compromising sensitive information.

👉 More information
🗞 RAG Security and Privacy: Formalizing the Threat Model and Attack Surface
🧠 ArXiv: https://arxiv.org/abs/2509.20324

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
