Privacybench Evaluates RAG Assistants, Revealing 26.56% Secret Exposure in Conversations

The increasing reliance on personalised artificial intelligence raises critical questions about user privacy and data security. Srija Mukhopadhyay, Sathwik Reddy, and Shruthi Muthukumar, from the International Institute of Information Technology Hyderabad, alongside Jisun An and Ponnurangam Kumaraguru, have addressed this challenge with the development of PrivacyBench. This novel benchmark utilises socially-grounded datasets containing embedded secrets, assessed through multi-turn conversations, to rigorously evaluate how well AI systems preserve sensitive information. Their research reveals a significant vulnerability in current Retrieval-Augmented Generation (RAG) assistants, demonstrating secret leakage in up to 26.56% of interactions, and highlights the limitations of relying solely on privacy-aware prompts for mitigation. Ultimately, this work underscores the necessity for fundamental architectural changes that prioritise privacy by design, ensuring responsible and ethical AI deployment.

User digital footprints contain private information, and current AI Assistants, accessing this data, are unable to adequately safeguard it. This poses a risk of sensitive information leakage, potentially causing harm to users. Research focuses on the vulnerabilities of Retrieval-Augmented Generation (RAG)-based assistants, which utilise a knowledge base alongside user data, investigating whether these systems can inadvertently reveal private details despite intended security measures.

PrivacyBench, Method

The research team pioneered PrivacyBench, a novel benchmark designed to rigorously audit contextual privacy in personalized assistants. This framework generates evaluation datasets containing embedded secrets within realistic social contexts, addressing a critical gap in existing safety evaluations which disproportionately focus on static, single-turn queries. To construct these datasets, scientists harnessed a method for creating socially grounded scenarios, embedding confidential information that represents genuine user secrets, and then employed a multi-turn conversational evaluation to assess privacy preservation over extended interactions. Experiments involved five state-of-the-art Retrieval-Augmented Generation (RAG) models, subjected to a series of dynamic conversations designed to elicit potential privacy breaches.

The core of the experimental setup relied on measuring secret leakage, the unintended disclosure of confidential information, revealing that unmodified RAG assistants leaked secrets in up to 26.56% of conversations. Implementing a privacy-aware prompt reduced the average leakage rate to 5.12%, demonstrating a substantial improvement, but the retrieval mechanism remained a critical point of failure. This innovative analysis revealed that current systems lack structural safeguards, necessitating a shift towards privacy-by-design principles. The work establishes a foundational baseline for future privacy safeguards by providing the first quantitative analysis of this specific failure mode, meticulously measuring both leakage and over-secrecy to evaluate adherence to Contextual Integrity and ensure information flows align with nuanced social norms.

RAG Assistants Leak Secrets in PrivacyBench Tests

Scientists have developed PrivacyBench, a new benchmark designed to rigorously measure secret preservation in personalized systems. This work introduces socially grounded datasets containing embedded secrets, coupled with a multi-turn conversational evaluation process, to assess how effectively systems protect sensitive user information. Experiments revealed that Retrieval-Augmented Generation (RAG) assistants leak secrets in up to 26.56% of interactions, highlighting a significant vulnerability in current architectures. The team measured the performance of several leading models , GPT-5-Nano, Gemini-2.5-Flash, Kimi-K2, Llama-4-Maverick, and Qwen3-30B , using Gemini-2.5-Flash as an adversarial extractor and a judging panel comprised of GLM-4-32B, Phi-4, and Mistral-Nemo.

Baseline tests demonstrated an average leakage rate of 15.80%, meaning secrets were disclosed in approximately one in six conversations, with Gemini-2.5-Flash exhibiting the highest vulnerability at 26.56% and GPT-5-Nano proving most robust at 6.32%. Researchers found that the retrieval mechanism consistently accessed sensitive data, surfacing documents containing secrets 62.80% of the time on average, placing the entire burden of privacy on the generator. Implementing a privacy-aware prompt lowered the average leakage rate to 5.12%, with Llama-4-Maverick experiencing a substantial reduction to 0.46%, but the Kimi-K2 model increased leakage, indicating the unreliability of prompt-based safeguards as a sole solution. Further analysis showed that the privacy-aware prompt improved system utility, decreasing the average over-secrecy rate from 35.74% to 27.80%, suggesting enhanced contextual understanding of access rules. Critically, comparative probing strategies revealed a consistent leakage rate of 15.28% for both direct and indirect probes, demonstrating the vulnerability is inherent to the model’s response to relevant contexts. These findings underscore the urgent need for privacy-by-design safeguards and structurally sound architectures to ensure ethical and inclusive web experiences.

RAG Assistants Leak User Secrets Regularly

This research demonstrates a fundamental privacy vulnerability within Retrieval-Augmented Generation (RAG) based personal assistants. Evaluations using a novel benchmark with socially grounded datasets reveal that these systems leak user secrets in over 16% of conversations without specific safeguards. The study establishes that current architectures place an unsustainable burden on the generator component to preserve privacy, as the retrieval mechanism indiscriminately surfaces sensitive information. While employing a privacy-aware prompt can reduce leakage to approximately 5%, this improvement is considered a fragile solution, as the underlying issue of unrestricted data retrieval persists.

The research highlights that the inappropriate retrieval rate remains high, consistently exposing the generator to secrets and creating a single point of failure. Authors acknowledge the current metrics focus on verbatim leakage and do not capture more subtle forms of privacy breaches. Future work should concentrate on developing structural safeguards, like access control modules, to filter sensitive data before it reaches the generation stage. This shift in focus, from prompting as a remedy to privacy-by-design principles, is crucial for building trustworthy personalised AI systems, suggesting that lasting privacy solutions require architectural changes, not simply adjustments to the generative process.

👉 More information
🗞 PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI
🧠 ArXiv: https://arxiv.org/abs/2512.24848

Rohail T.

Rohail T.

As a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.

Latest Posts by Rohail T.:

Plasma Applications Enabled by Model Correcting 40% Heating Error in Electron Temperature

Quantum Technology Enables Precise Current Measurement with a Saturable, Lower Bound

January 9, 2026
Enhanced Quasiparticle Density Advances Tunable Emission in PVA-Doped Monolayer WS with 41% Improvement

Relativistic Fluid Dynamics Enables Precise Momentum Spectrum Analysis with Zero Order Terms and Ab Initio Calculation

January 9, 2026
Efficient LLM Inference Achieves Speedup with 4-bit Quantization and FPGA Co-Design

Space Data Centers Achieve Communication Efficiency with OptiVote and Federated Learning

January 9, 2026