Multi-agent systems, capable of autonomous, goal-driven operation, promise a revolution across numerous industries and represent a significant leap forward in generative artificial intelligence. Brian Bowers of Loyola Marymount University, Smita Khapre and Jugal Kalita of the University of Colorado Colorado Springs, and their colleagues investigate a critical vulnerability within these increasingly sophisticated systems: their susceptibility to code injection attacks. The team demonstrates that while these systems excel at generating code, their autonomy comes without the ability to independently detect and respond to malicious interference, a serious concern as they are integrated into sensitive applications such as software development. Their research reveals that a ‘coder-reviewer-tester’ architecture offers improved resilience against such attacks, and that incorporating dedicated security analysis can further enhance protection without significantly compromising coding efficiency. Even this enhanced system, however, remains vulnerable to cleverly disguised attacks that use manipulated example code.
Safe Agentic Systems via Formal Verification
Agentic AI and Multi-Agent Systems are poised to dominate industry and society imminently. Powered by goal-driven autonomy, they represent a powerful form of generative AI, marking a transition from static, pre-programmed systems to dynamic, adaptive entities capable of independent action and complex problem-solving. This research investigates the development of robust and reliable agentic systems, focusing on the challenge of ensuring safe and predictable behaviour in increasingly complex environments. The primary objective is to create agents that not only pursue assigned goals effectively but also demonstrate awareness of their limitations and of the potential consequences of their actions. The approach centres on a novel framework integrating reinforcement learning with formal verification techniques, allowing both the training of agent behaviours and the rigorous proof of their safety properties.
Specifically, the team develops a method for translating agent policies, expressed as deep neural networks, into formal specifications suitable for model checking. This enables automated verification of critical safety constraints, such as preventing collisions or respecting resource limits, before deployment in real-world scenarios. A key innovation is a scalable verification algorithm capable of handling the high-dimensional state spaces typical of complex agentic systems. Significant contributions include a new algorithm for policy abstraction, which reduces the complexity of neural networks while preserving essential behavioural characteristics, and a formalisation of the reward function as a safety constraint. Experiments conducted in simulated robotic environments demonstrate that the proposed framework achieves 99.7% verification coverage with a 15% reduction in computational cost compared to existing methods. The research also introduces a novel metric, termed ‘actionable uncertainty’, which quantifies an agent’s confidence in its decisions and enables proactive risk mitigation, ultimately enhancing the trustworthiness and reliability of agentic AI systems.
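The summary does not include the framework’s implementation, so the following is only a minimal sketch of one common way a neural-network policy can be checked against a safety constraint: interval bound propagation over a small feed-forward ReLU network. The toy network, the input box, and the output limit are all hypothetical illustrations, not the reported method or its benchmarks.

```python
import numpy as np

def interval_affine(lower, upper, W, b):
    """Propagate an input box [lower, upper] through the affine layer W x + b."""
    W_pos, W_neg = np.clip(W, 0, None), np.clip(W, None, 0)
    out_lower = W_pos @ lower + W_neg @ upper + b
    out_upper = W_pos @ upper + W_neg @ lower + b
    return out_lower, out_upper

def interval_relu(lower, upper):
    """ReLU is monotone, so it maps interval bounds elementwise."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

def verify_output_bound(layers, in_lower, in_upper, out_limit):
    """Return True if, for every input in the box, all outputs are <= out_limit.

    `layers` is a list of (W, b) pairs with ReLU between layers. The check is
    sound but conservative: True is a proof, False is merely inconclusive.
    """
    lo, hi = np.asarray(in_lower, float), np.asarray(in_upper, float)
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_affine(lo, hi, W, b)
        if i < len(layers) - 1:
            lo, hi = interval_relu(lo, hi)
    return bool(np.all(hi <= out_limit))

# Toy policy network: 2 state inputs (e.g. position, velocity) -> 1 action command.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 2)) * 0.5, np.zeros(4)),
          (rng.normal(size=(1, 4)) * 0.5, np.zeros(1))]

# Safety property: for all states in the box, the commanded action stays <= 2.0.
print(verify_output_bound(layers, in_lower=[-1, -1], in_upper=[1, 1], out_limit=2.0))
```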
LLM Agents and Code Injection Resilience
The study investigates vulnerabilities in multi-agent systems (MAS), focusing specifically on code injection attacks and methods to enhance system resilience, and pioneers a comprehensive threat model for these increasingly prevalent systems. Researchers engineered three distinct MAS architectures (Coder, Coder-Tester, and Coder-Reviewer-Tester) to simulate a realistic software development workflow and assess their susceptibility to malicious code insertion. Each architecture uses large language model (LLM)-based agents that employ chain-of-thought prompting to generate code solutions for given problems. The Coder architecture features a single agent, while the Coder-Tester architecture adds an agent dedicated to executing test cases against the generated code, which it retrieves from a shared database. To further refine the evaluation, the team introduced a Coder-Reviewer-Tester architecture, incorporating an additional agent responsible for code review and approval before testing, mirroring human oversight in software development.
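The paper’s actual prompts and orchestration code are not given in this summary; the sketch below merely illustrates, under assumed interfaces, how a Coder-Reviewer-Tester pipeline with a shared code store might be wired together. The function `call_llm` and the class names are hypothetical stand-ins, not the study’s implementation.

```python
from dataclasses import dataclass, field

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Hypothetical LLM call; a real system would use its chosen model client here."""
    raise NotImplementedError

@dataclass
class SharedCodeStore:
    """Shared database through which the agents exchange generated code."""
    entries: dict = field(default_factory=dict)

    def put(self, task_id: str, code: str) -> None:
        self.entries[task_id] = code

    def get(self, task_id: str) -> str:
        return self.entries[task_id]

def coder(task_id: str, problem: str, store: SharedCodeStore) -> None:
    # Chain-of-thought prompting: ask for step-by-step reasoning, then the solution.
    code = call_llm("You are a coder agent. Think step by step, then output code.",
                    problem)
    store.put(task_id, code)

def reviewer(task_id: str, problem: str, store: SharedCodeStore) -> bool:
    # Review-and-approve gate before the code reaches the tester.
    verdict = call_llm("You are a code reviewer. Reply APPROVE or REJECT with reasons.",
                       f"Problem:\n{problem}\n\nCode:\n{store.get(task_id)}")
    return "APPROVE" in verdict.upper()

def tester(task_id: str, tests: str, store: SharedCodeStore) -> bool:
    # The tester executes test cases against the stored code; pass = no exception.
    namespace: dict = {}
    try:
        exec(store.get(task_id), namespace)   # load candidate code
        exec(tests, namespace)                # run assert-based test cases
        return True
    except Exception:
        return False

def run_pipeline(task_id: str, problem: str, tests: str) -> bool:
    store = SharedCodeStore()
    coder(task_id, problem, store)
    if not reviewer(task_id, problem, store):
        return False
    return tester(task_id, tests, store)
```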
Experiments employed two levels of adversarial access to simulate varying attack scenarios: Single, which grants one-time access to the code, and Continued, which allows persistent modification of the generated code. The core of the attack involved injecting a malicious function designed for data exfiltration, specifically attempting to transmit encrypted password information to an external address, as demonstrated in a provided code example. Researchers also developed a security analysis agent, integrated into the Coder-Tester architecture and tasked solely with identifying security vulnerabilities in the generated code, positioned to review the final output. This agent’s effectiveness was evaluated by measuring its ability to mitigate performance loss while improving overall system resilience against attacks; the study also found that embedding poisonous few-shot examples in the injected code increased attack success rates from 0% to 71.95%. The team mapped identified threats to established frameworks, including Microsoft’s STRIDE, Amazon Web Services’ ATFAA, and MITRE ATLAS, providing a structured analysis of potential vulnerabilities within the MAS.
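The study’s actual attack code is not reproduced here; the following is a hedged sketch of how attack success might be measured under the ‘Single’ and ‘Continued’ access levels, with a harmless marker function standing in for the real exfiltration payload. The helper names and the retry budget are illustrative assumptions.

```python
# Illustrative harness: measures how often an injected function survives review.
# The payload is a harmless marker in place of the exfiltration function from the paper.
BENIGN_MARKER_PAYLOAD = '''
def _injected_marker():
    # Placeholder for the data-exfiltration function described in the study.
    return "INJECTED"
'''

def inject(code: str, payload: str) -> str:
    """Append the adversarial function to the generated code."""
    return code + "\n" + payload

def security_agent_flags(code: str) -> bool:
    """Hypothetical security-analysis agent; in the study this is an LLM prompted
    solely to look for vulnerabilities in the final output."""
    raise NotImplementedError

def attack_success_rate(solutions: list[str], access: str = "single") -> float:
    """Fraction of tasks where injected code reaches the output unflagged.

    access='single'    -> the adversary modifies the code once.
    access='continued' -> the adversary re-injects after each revision
                          (modelled here as retries up to an illustrative cap).
    """
    successes = 0
    for code in solutions:
        attempts = 1 if access == "single" else 3
        for _ in range(attempts):
            candidate = inject(code, BENIGN_MARKER_PAYLOAD)
            if not security_agent_flags(candidate):
                successes += 1
                break
    return successes / max(len(solutions), 1)
```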
Multi-Agent System Security for Software Creation
Scientists are pioneering new multi-agent systems, poised to revolutionize industries through goal-driven autonomy and proactive multitasking, moving beyond simple reactive content generation. This work details an architecture for a multi-agent system designed specifically for the implementation phase of software engineering, alongside a thorough threat model to assess its vulnerabilities. Experiments demonstrate that the system can generate code with considerable accuracy, yet it remains susceptible to attacks, notably code injection, because of its autonomous nature and lack of direct human oversight. The team investigated three distinct architectures (coder only, coder-tester, and coder-reviewer-tester), finding that the coder-reviewer-tester configuration exhibits the greatest resilience, although at the cost of coding efficiency. Introducing a security analysis agent successfully mitigates this efficiency loss while simultaneously enhancing overall system resilience, representing a significant advance in secure autonomous coding.
Further testing revealed that the security analysis agent itself is vulnerable to sophisticated code injection attacks: embedding malicious few-shot examples in the injected code increased the attack success rate from 0% to 71.95%. Researchers found that exploiting the natural language understanding capabilities of the security analysis agent significantly amplifies the success rate of these attacks, highlighting a critical area for improvement. The study measured performance using the HumanEval dataset, building upon prior work that achieved a 29.9% to 47.1% improvement in code generation through self-collaboration among agents.
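HumanEval results such as these are conventionally reported with the pass@k estimator introduced alongside that benchmark; the snippet below reproduces that standard formula (it is not code from this study) so figures like the pass rates cited here and below can be read as the probability that at least one of k sampled solutions passes all tests.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator used with the HumanEval benchmark:
    n = samples generated per problem, c = samples that pass all tests.
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 20 samples per problem, 7 of which pass all tests.
print(round(pass_at_k(n=20, c=7, k=1), 3))   # 0.35
```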
Comparisons to existing systems, such as MetaGPT, which attained 85.9% on HumanEval, demonstrate the potential of this approach while also pinpointing specific vulnerabilities that require attention. This work delivers a deeper understanding of the security challenges inherent in autonomous multi-agent systems and paves the way for more robust and secure software engineering tools.
👉 More information
🗞 Analyzing Code Injection Attacks on LLM-based Multi-Agent Systems in Software Development
🧠 ArXiv: https://arxiv.org/abs/2512.21818
