The increasing sophistication of artificial intelligence systems, capable of complex reasoning and interaction through tools like LangChain and GraphChain, introduces new security vulnerabilities, particularly from multimodal prompt injection attacks. Toqeer Ali Syed, Mishal Ateeq Almutairi, and Mahmoud Abdel Moaty, from the Islamic University of Madinah and the Arab Open University-Bahrain, address this critical challenge with a novel defense framework. Their research presents a system that meticulously sanitizes all prompts, whether originating from users or internal processes, and independently validates outputs before they propagate through the network. This approach, which leverages provenance tracking across different data types, significantly improves the detection of malicious instructions and establishes more stable, trustworthy agentic AI systems, paving the way for secure and reliable autonomous operation.
Multimodal LLMs And Prompt Injection Attacks
This research addresses the growing security threat of prompt injection attacks within agentic AI systems, where large language models (LLMs) act as intelligent agents that make decisions and take actions, and the added complexity introduced by multimodal LLMs that process both text and images. Prompt injection occurs when malicious input hijacks the LLM, causing it to disregard its intended instructions and perform unintended actions, a risk amplified when LLMs control real-world applications. The authors propose a multi-layered defense framework, the Cross-Agent Multimodal Provenance-Aware Framework, to mitigate these attacks. The system adopts a multi-agent architecture in which cooperating modules each handle a distinct stage of protection, from input sanitization to output validation.
Input undergoes layered sanitization both before and after processing by agents, and potentially malicious content is masked to limit its influence on the LLM’s output. A key feature is meticulous provenance tracking, which records the origin and processing history of data as it flows through the system, enabling identification and isolation of compromised information. Finally, the LLM’s output is validated to ensure alignment with expected behavior. Comparative evaluation against baseline defenses, including keyword filtering, safety fine-tuning, post-hoc output filtering, and single vision-language models, demonstrates the framework’s superior performance.
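To make the layered pattern concrete, the sketch below shows how such a guarded model call might be wired up in Python. It is a minimal illustration under assumed details, not the authors' implementation: the keyword heuristics, trust scores, and helper names (sanitize_text, mask_low_trust, validate_output, guarded_call) are placeholders for the components the paper describes.

```python
"""Minimal sketch of the layered defense pattern: sanitize inputs,
mask low-trust content, run the model, validate the output.
All heuristics and names here are illustrative assumptions."""

import re

# Crude pattern for obviously injected directives (placeholder heuristic).
SUSPECT = re.compile(r"ignore (all|previous) instructions|disregard .* rules", re.IGNORECASE)

def sanitize_text(text: str) -> tuple[str, float]:
    """Return cleaned text and a crude trust score in [0, 1]."""
    if SUSPECT.search(text):
        return SUSPECT.sub("[REDACTED]", text), 0.1
    return text, 0.9

def mask_low_trust(segments: list[tuple[str, float]], threshold: float = 0.5) -> str:
    """Redact segments whose trust score falls below the threshold."""
    return "\n".join(
        text if trust >= threshold else "[MASKED LOW-TRUST CONTENT]"
        for text, trust in segments
    )

def validate_output(output: str) -> bool:
    """Reject outputs that echo injected directives or leak obvious secrets."""
    return not SUSPECT.search(output) and "API_KEY" not in output

def guarded_call(llm, user_text: str, tool_outputs: list[str]) -> str:
    # 1) Sanitize every input channel before it reaches the model.
    segments = [sanitize_text(user_text)] + [sanitize_text(t) for t in tool_outputs]
    # 2) Mask low-trust spans so they cannot steer the model's behavior.
    prompt = mask_low_trust(segments)
    # 3) Run the model, then validate the output before it propagates further.
    draft = llm(prompt)
    if not validate_output(draft):
        raise RuntimeError("output rejected by validator")
    return draft
```

In the framework itself, each of these stages is a dedicated agent with its own provenance records rather than a simple function, but the control flow follows the same before-and-after pattern.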
Results show a 94% detection accuracy for prompt injection attacks, a 70% reduction in trust leakage, and 96% task accuracy retention, comparable to the best baseline. This work highlights the importance of defense in depth, trust boundaries, provenance tracking, and validation at each stage of the process for securing agentic, multimodal LLM systems. This research is significant because it addresses a critical security gap hindering the widespread adoption of LLMs, specifically focusing on the unique challenges of securing LLMs used as intelligent agents and accounting for the complexity of multimodal inputs. The proposed architecture is designed for incremental integration into existing applications, offering a practical framework for building secure and trustworthy agentic AI systems.
Cross-Agent Defense Against Multimodal Prompt Injection
Scientists have engineered a Cross-Agent Multimodal Provenance-Aware Defense Framework to address the increasing threat of multimodal prompt injection attacks in complex AI systems. This comprehensive system sanitizes inputs and validates outputs across interconnected agents, extending beyond single-model filtering to encompass entire agentic pipelines like LangChain and GraphChain. The methodology centers on establishing a provenance ledger, a detailed record tracking the modality, source, and trust level of every prompt and generated output. The framework employs specialized sanitizers, Text Sanitizer and Visual Sanitizer, working in concert to identify and neutralize malicious content in both text and image data before it reaches the core language models.
The Text Sanitizer removes harmful instructions, while the Visual Sanitizer analyzes images for embedded adversarial content and manipulated metadata. Sanitization at every hop keeps communication between agents within predefined trust boundaries, preventing injected instructions from propagating, and the provenance ledger records every modification for detailed auditing. An Output Validator independently verifies all outputs generated by language and vision-language models, minimizing cross-agent contamination and ensuring safe tool behavior. A masking plan restricts the model's access to potentially harmful content during LLM inference. Experiments demonstrate a significant improvement in multimodal injection detection accuracy and a marked reduction in cross-agent trust leakage, resulting in stable and predictable execution pathways for complex AI workflows. This methodology establishes a new standard for secure, understandable, and reliable multi-agent systems.
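The provenance ledger itself can be pictured as an append-only record keyed by content fingerprints. The sketch below assumes a simple schema of modality, source, trust score, and an audit trail of sanitization steps; the class and field names are illustrative, not the paper's.

```python
"""Sketch of a provenance ledger, assuming entries keyed by content hash.
Field names and the audit-trail format are illustrative assumptions."""

from dataclasses import dataclass, field
from datetime import datetime, timezone
from hashlib import sha256

@dataclass
class ProvenanceEntry:
    content_hash: str          # fingerprint of the prompt or output fragment
    modality: str              # "text", "image", "tool", ...
    source: str                # originating agent or external channel
    trust: float               # 0.0 (untrusted) .. 1.0 (fully trusted)
    history: list[str] = field(default_factory=list)  # audit trail of edits

class ProvenanceLedger:
    def __init__(self) -> None:
        self._entries: dict[str, ProvenanceEntry] = {}

    def record(self, content: bytes, modality: str, source: str, trust: float) -> str:
        """Register a new fragment and return its fingerprint."""
        h = sha256(content).hexdigest()
        self._entries[h] = ProvenanceEntry(h, modality, source, trust)
        return h

    def annotate(self, content_hash: str, note: str) -> None:
        """Log a sanitization step (e.g. 'visual sanitizer stripped EXIF metadata')."""
        stamp = datetime.now(timezone.utc).isoformat()
        self._entries[content_hash].history.append(f"{stamp}: {note}")

    def trust_of(self, content_hash: str) -> float:
        return self._entries[content_hash].trust
```

A masking plan can then be derived by looking up the trust score of every fragment that will enter the prompt and withholding anything below a chosen threshold.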
Cross-Modal Defense Against Prompt Injection Attacks
Scientists have developed a new defense framework to secure complex autonomous systems powered by large language models and vision-language models, addressing the growing threat of multimodal prompt injection attacks. This work introduces the Cross-Agent Multimodal Provenance-Aware Defense Framework, which meticulously sanitizes all incoming prompts and independently verifies outputs before they are processed by any agent within the system, creating a zero-trust communication fabric. The framework employs four cooperating agents: a Text Sanitizer, a Visual Sanitizer, a Main Task Model, and an Output Validator, coordinated by a shared provenance ledger that tracks data origin and trust levels. The Text Sanitizer analyzes text inputs at the token level, assigning a trust score based on intent and recording the source in the provenance map.
The Visual Sanitizer examines images for anomalies, scans metadata, and uses CLIP together with optical character recognition to assess visual content, calculating a visual trust score and redacting low-trust regions. The agents combine text and visual provenance information into a single ledger that records modality, trust scores, and influence relationships, which is then used to build a trust-aware attention mask before LLM inference. Experiments demonstrate significant gains in multimodal injection detection accuracy and a marked reduction in cross-agent trust leakage, resulting in stable execution pathways. The system sanitizes every input, whether text, images, tool responses, or inter-agent messages. The output validation layer scans for policy violations and secret leakage, requests regeneration if low-trust content significantly influences the output, and approves release only when the content is deemed safe. Modifications to LangChain and GraphChain components enable end-to-end sanitization without altering the underlying LLM back-end, delivering provable safety, cross-agent isolation, trust-aware reasoning, and auditable decisions.
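The validate-then-regenerate behavior described above can be sketched as a small control loop. The version below uses character n-gram overlap as a crude stand-in for the paper's influence analysis; the thresholds, retry count, and function names are hypothetical.

```python
"""Sketch of a validate-then-regenerate loop, assuming n-gram overlap as
a proxy for how strongly low-trust content influenced the output."""

def _ngrams(s: str, n: int) -> set[str]:
    """Character n-grams of a string (a crude similarity signal)."""
    return {s[i:i + n].lower() for i in range(max(len(s) - n + 1, 1))}

def influence_score(output: str, span: str, n: int = 4) -> float:
    """Rough proxy for how much a low-trust span shaped the output."""
    span_grams = _ngrams(span, n)
    return len(_ngrams(output, n) & span_grams) / max(len(span_grams), 1)

def validate_and_release(llm, prompt: str, low_trust_spans: list[str],
                         limit: float = 0.2, max_retries: int = 2) -> str:
    """Approve release only when no low-trust span dominates the output."""
    for _ in range(max_retries + 1):
        output = llm(prompt)
        worst = max((influence_score(output, s) for s in low_trust_spans), default=0.0)
        if worst <= limit:
            return output  # deemed safe: release
        # Otherwise regenerate with the offending spans masked out of the prompt.
        for span in low_trust_spans:
            prompt = prompt.replace(span, "[MASKED]")
    raise RuntimeError("validator rejected all candidate outputs")
```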
Multimodal System Security Via Data Provenance
This research presents a new defense framework designed to secure complex autonomous systems built with large language models and multi-modal inputs. The team developed a system that thoroughly sanitizes all incoming text and image data, and independently validates all outputs before they are used by subsequent components within a workflow. This multilayered approach, incorporating a provenance ledger to track data origin and trust levels, effectively addresses the growing threat of multimodal prompt injection attacks, where malicious instructions are hidden within various data types. The findings demonstrate a significant improvement in detecting injected instructions and preventing their spread through interconnected systems, such as those built with LangChain or GraphChain.
By verifying each interaction between agents and language models, the researchers establish more reliable and secure pipelines for agentic AI. The authors acknowledge that existing filters are inadequate against sophisticated attacks, yet show that the framework maintains performance on legitimate tasks while enhancing security. Future work could explore broader applications of this provenance-aware approach to further strengthen the trustworthiness of increasingly complex AI systems.
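As a final illustration of how such checks might be retrofitted onto an existing pipeline without touching the LLM back-end, the generic wrapper below guards a single agent hop. It is not the paper's actual LangChain or GraphChain modification, just a sketch of the pattern: every message is sanitized on the way in and validated on the way out.

```python
"""Sketch of wrapping an agent hop with inbound sanitization and
outbound validation; all names and types are illustrative."""

from typing import Callable

def guard_agent(agent: Callable[[str], str],
                sanitize: Callable[[str], str],
                validate: Callable[[str], bool]) -> Callable[[str], str]:
    """Wrap an agent callable so every message in and out is checked."""
    def wrapped(message: str) -> str:
        clean = sanitize(message)      # inbound trust boundary
        result = agent(clean)          # unmodified back-end call
        if not validate(result):       # outbound trust boundary
            raise ValueError("agent output failed validation")
        return result
    return wrapped

# Usage: compose guarded agents exactly as the unguarded pipeline would be composed.
# guarded_retriever = guard_agent(retriever, sanitize, validate)
# guarded_planner = guard_agent(planner, sanitize, validate)
# answer = guarded_planner(guarded_retriever(user_query))
```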
👉 More information
🗞 Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks
🧠 ArXiv: https://arxiv.org/abs/2512.23557
