AI Assistants Now Face Rigorous Security Testing Mirroring Real-World Threats

Researchers are increasingly concerned about the security vulnerabilities of large language model (LLM)-based agents as they evolve from simple task completion into complex, personalized AI assistants such as OpenClaw. Yuhang Wang, Feiming Xu, and Zheng Lin from Xidian University, together with Guangyu He, Yuzhe Huang, and Haichang Gao, address the critical lack of realistic security evaluation for these agents with a new framework, the Personalized Agent Security Bench (PASB). The work is significant because it moves beyond synthetic testing to incorporate personalized usage, realistic toolchains, and long-term interactions, enabling comprehensive black-box security assessments of live systems. Their systematic evaluation of OpenClaw using PASB reveals critical vulnerabilities in prompt processing, tool usage, and memory retrieval, demonstrating substantial security risks inherent in the deployment of personalized AI agents.

Large language model (LLM)-based agents have demonstrated capabilities in autonomous reasoning, task planning, and interaction with external tools and environments. Researchers are increasingly developing such agents to undertake complex real-world tasks, but their practical deployment also introduces severe security risks.

Agents are increasingly being explored in safety-critical domains such as financial services, healthcare, and autonomous driving to improve automation and decision-making efficiency. These agents are evolving into personalized AI assistants that integrate long-term interaction histories, private user context, and high-privilege toolchains.

Representative systems, such as OpenClaw, mark a transition from "demo-ready task agents" to "always-on personal assistants," expanding the scope of potential security failures. This shift toward personalization fundamentally changes the security landscape of agentic systems. Existing research has largely emphasized effectiveness, generalization, and task completion performance, whereas security evaluations for real-world deployments remain limited.

Personalized agents exhibit three key properties: persistent operation and long-horizon interactions, accumulation of private context, and high-privilege tools and actionable capabilities. The framework builds upon existing agent attack paradigms, incorporating personalized usage scenarios, realistic toolchains, and long-horizon interactions to enable black-box security evaluation on live systems.

This approach allows for a comprehensive assessment of vulnerabilities in deployed agents, moving beyond synthetic or task-centric evaluations. Using OpenClaw as a case study, the study systematically evaluated security across multiple personalized scenarios, tool capabilities, and attack types. Results demonstrate that OpenClaw exhibits critical vulnerabilities at various execution stages, including user prompt processing, tool usage, and memory retrieval.

These vulnerabilities highlight substantial security risks inherent in the deployment of personalized agent systems. The work identifies that risks in personalized settings extend beyond undesired text generation to encompass unsafe actions and potential exfiltration of private assets through end-to-end interactions.

The evaluation design models personalized scenarios and private assets, constructing representative usage scenarios spanning personal communication, information management, and long-horizon task coordination. Auditable private assets, such as honey tokens and confidential files, were planted under controlled settings to enable measurable leakage criteria.
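The honey-token idea can be sketched in a few lines: plant unique, auditable strings among the agent's private assets, then scan every observable output for them. This is only an illustrative sketch; the token values, trace format, and `detect_leakage` helper below are invented for this example and are not PASB's actual implementation.

```python
import re

# Hypothetical honey tokens planted among the agent's private assets.
# Any appearance of these exact strings in the agent's observable
# outputs counts as a measurable leak.
HONEY_TOKENS = {
    "api_key": "HNY-4f9c-demo-token",          # planted in a fake config file
    "contact": "alice.honeypot@example.com",   # planted in the address book
}

def detect_leakage(events):
    """Scan agent outputs and outbound tool calls for planted tokens.

    `events` is a list of dicts with a 'stage' key (e.g. 'response',
    'tool_call') and a 'content' key holding the observable text at
    that stage. Returns (token_name, stage) pairs for each leak found.
    """
    leaks = []
    for event in events:
        for name, value in HONEY_TOKENS.items():
            if re.search(re.escape(value), event["content"]):
                leaks.append((name, event["stage"]))
    return leaks
```

Because the tokens are unique and never legitimately needed by any task, any match is unambiguous evidence of exfiltration, which is what makes the leakage criterion auditable.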

PASB extends existing agent attack paradigms with realistic toolchains and long-horizon interactions, making it possible to characterize how attacks persist and propagate across stages such as prompt processing, external content access, tool invocation, and memory-related behaviors. By simulating real-world deployments rather than isolated test prompts, the framework moves beyond traditional, limited testing methods.
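Stage-wise propagation can be measured by embedding a unique marker in the attack input and recording which execution stages it reaches. The stage names below follow the article; the marker string, trace format, and `propagation_profile` helper are assumptions made for this sketch, not part of PASB itself.

```python
# Execution stages named in the article, in pipeline order.
STAGES = ["prompt_processing", "external_content_access",
          "tool_invocation", "memory_write"]

# Unique string embedded in the injected attack content so its
# movement through the pipeline can be tracked unambiguously.
MARKER = "INJ-PAYLOAD-7731"

def propagation_profile(trace):
    """Return the ordered list of stages the marker reached.

    `trace` maps each stage name to the text observed at that stage.
    A payload that reaches 'memory_write' can persist across sessions,
    which is how attacks accumulate in long-horizon interactions.
    """
    return [stage for stage in STAGES if MARKER in trace.get(stage, "")]
```

A profile ending in `memory_write` is the most concerning outcome, since the injected instruction would then be replayed in future interactions.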

The OpenClaw case study further demonstrated that these weaknesses are not isolated: attacks can propagate across stages and accumulate over time, creating significant risks for users.

Reliance on simple prompt protections or security assessments based on artificial benchmarks may not adequately address the complex threats faced by personalized agent systems. The PASB framework and associated evaluation environment offer a reproducible baseline for future research into the security of these systems, enabling more thorough and realistic testing.

Acknowledging the limitations of current evaluation methods, the researchers emphasize the need for end-to-end security assessments that account for the persistent nature, private context, and high-privilege tool access characteristic of personalized agents. Future work should focus on expanding the range of scenarios, tools, and attack types considered within the PASB framework to provide an even more comprehensive evaluation of agent security.

👉 More information
🗞 From Assistant to Double Agent: Formalizing and Benchmarking Attacks on OpenClaw for Personalized Local AI Agent
🧠 ArXiv: https://arxiv.org/abs/2602.08412

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
