Trap Benchmark Reveals Vulnerabilities in Web Agents Powered by Large Language Models

The growing prevalence of web-based agents, designed to automate tasks like email handling and professional networking, presents new security challenges, as these agents prove surprisingly vulnerable to subtle manipulation. Karolina Korgul, Yushi Yang, and Arkadiusz Drohomirecki, along with Piotr Błaszczyk, Will Howard, and Lukas Aichberger, investigate how cleverly disguised instructions within website interfaces can redirect these agents from their intended goals. Their work introduces the Task-Redirecting Agent Persuasion Benchmark, or TRAP, a rigorous evaluation that demonstrates a significant susceptibility to such “injection attacks” across six leading language models, with success rates averaging 25 percent. This research reveals fundamental, psychologically-rooted weaknesses in current web agent designs, highlighting the urgent need for more robust security measures as these technologies become increasingly integrated into everyday life.

The research investigates vulnerabilities in autonomous web agents to prompt injection attacks, where adversarial instructions embedded within interface elements cause agents to deviate from their intended tasks. Evaluations across six advanced models reveal that agents are susceptible to prompt injection in 25% of tasks on average, with susceptibility ranging from 13% for one model to 43% for another.

Realistic Prompt Injection Attacks on LLMs

This study details a security assessment of Large Language Models (LLMs) and their susceptibility to prompt injection attacks, focusing on realistic scenarios mimicking user manipulation. The researchers developed the TRAP framework, Targeted, Realistic, Adversarial Prompts, to systematically test LLM security, creating attacks that are targeted, realistic, and adversarial. TRAP explores variations in interface design, persuasion techniques based on established psychological principles, manipulation methods, prompt location, and tailoring to the specific task. The research demonstrates that even minor alterations to interface elements or contextual information can frequently double the success rate of these attacks, highlighting systemic and psychologically-rooted weaknesses in web-based agents. The team provides a modular social-engineering injection framework to facilitate further investigation in this area. The LinkedIn clone, NetworkIn, was used to study location-based vulnerabilities, revealing that placing malicious prompts within a user’s profile was particularly effective.

Agent Vulnerability To Manipulative Instructions Revealed

Scientists have demonstrated that web-based agents, powered by large language models, are vulnerable to “task-redirecting agent persuasion” attacks. Experiments across six leading models showed an average susceptibility to injection attacks of 25% across numerous runs. The team measured both task completion and attack success, defining success as the agent clicking an injected button or hyperlink and being redirected to a malicious website.

Results show that 948 successful attacks were recorded, with many runs ending prematurely as agents entered loops after encountering injected text. Analysis revealed significant transferability of successful attacks between models, with injections crafted for one model often working on others. The most effective persuasion principles were Social Proof and Consistency, while adversarial suffixes, chain-of-thought injection, and many-shot conditioning were common manipulation methods. Experiments across six current models demonstrate that these agents are susceptible to prompt injection, successfully diverted from their intended tasks in approximately 25% of cases. The study reveals underlying weaknesses in how agents process information and respond to persuasive cues within dynamic online environments. A key achievement of this work is the development of a modular framework for constructing attacks, which breaks down prompt injections into distinct social-engineering components and principles of persuasion.

This approach allows researchers to analyse precisely how manipulation strategies interact with agent reasoning, context length, and instruction following. The researchers constructed reproducible web clones and combined objective task outcomes with behavioural evaluation, establishing a flexible foundation for ongoing study of both current vulnerabilities and potential defensive strategies. The study acknowledges limitations including the focus on a limited number of cloned websites and basic lexical edits for tailoring attacks. Future research will expand the range of attack surfaces, environments, and models tested, as well as develop systematic methods for mitigating these vulnerabilities within a reproducible framework, crucial for building more secure and reliable autonomous agents.

👉 More information
🗞 It’s a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents
🧠 ArXiv: https://arxiv.org/abs/2512.23128

Rohail T.

Rohail T.

As a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.

Latest Posts by Rohail T.:

Scalable Photonic Neural Network Training Achieves Reduced Memory Usage on Large Datasets

Scalable Photonic Neural Network Training Achieves Reduced Memory Usage on Large Datasets

January 7, 2026
Quantum Noise Spectroscopy with PL5 Centers Enables Room-Temperature Imaging of Silicon Carbide Defects

Quantum Noise Spectroscopy with PL5 Centers Enables Room-Temperature Imaging of Silicon Carbide Defects

January 7, 2026
Mo-heom Achieves Exact Molecular Excitation Dynamics, Capturing 3D Rotational Invariance

Mo-heom Achieves Exact Molecular Excitation Dynamics, Capturing 3D Rotational Invariance

January 7, 2026