Agents Now Run Complex Tasks Within Single Programs

Researchers are increasingly focused on improving the scalability and security of Model Context Protocols, essential for coordinating agent systems across diverse computational environments. Yuval Felendler and Parth A. Gandhi, both from the Faculty of Computer and Information Science at Ben Gurion University of The Negev, alongside Idan Habler, Yuval Elovici, Asaf Shabtai, and colleagues, present a detailed analysis of architectural choices within MCP design. Their work formalises the differences between traditional, context-coupled models and the newer, context-decoupled Execution MCP approach, revealing fundamental trade-offs in scalability. Through empirical evaluation using the MCP-Bench framework across ten servers, the team demonstrates that while Execution MCP reduces latency and token usage, it introduces significant security vulnerabilities. This research addresses these risks by applying the MAESTRO framework to identify sixteen attack classes and proposes a layered defence architecture, offering a crucial roadmap for building secure and scalable executable agent workflows.

Complex tasks, from data analysis to automated report generation, could soon be handled seamlessly by intelligent systems. These systems intelligently combine digital tools, but scaling such orchestration has proved difficult. New designs are changing this, allowing workflows to run within secure, self-contained environments and boosting efficiency. Scientists are increasingly observing the evolution of large language model (LLM) agents from conversational interfaces into autonomous systems capable of interacting with environments using tools.

Early agent frameworks required manual design of task-specific tools, custom APIs, and integration logic, tightly coupling agents to their execution environment. This approach does not scale as the number and diversity of tools grow. However, this shift from declarative tool invocation to model-generated code execution fundamentally reshapes the system’s security posture. Using the MAESTRO framework, they model adversarial threat vectors across 5 execution phases and identify 16 distinct attack classes introduced or amplified by executable agent workflows, including exception-mediated code injection and unsafe capability synthesis.

Based on these attacks, they present and evaluate a mitigation architecture based on containerized sandboxing, pre-execution code validation, and post-execution semantic gating. When an agent selects a tool, it invokes a structured request, and the server executes this request and returns the result in serialized form. However, this shift from declarative tool invocation to model-generated code execution fundamentally reshapes the system’s security posture. Using the MAESTRO framework, they model adversarial threat vectors across 5 execution phases and identify 16 distinct attack classes introduced or amplified by executable agent workflows, including exception-mediated code injection and unsafe capability synthesis.

Based on these attacks, they present and evaluate a mitigation architecture based on containerized sandboxing, pre-execution code validation, and post-execution semantic gating. Also, execution latency improved by 22.1% using this approach, indicating faster workflow completion times. These gains stem from the architecture’s ability to maintain constant context consumption, as workflow state and intermediate results reside within a sandbox rather than accumulating within the language model’s context window.

Yet, this enhanced efficiency introduces a broadened attack surface. Analysis using the MAESTRO framework identified sixteen distinct attack classes distributed across five execution phases. Specifically, exception-mediated injection vulnerabilities were observed, where malicious code is introduced through error handling mechanisms. Unsafe capability synthesis, involving the creation of overly permissive access rights, also presented a significant risk.

These vulnerabilities were successfully demonstrated through adversarial scenarios involving multiple large language models. Still, the research details a layered defence architecture designed to mitigate these threats. Containerized sandboxing isolates the execution environment, limiting the potential impact of malicious code. Semantic gating further enhances security by validating the intent and safety of operations before execution.

Once tested, the layered defence successfully blocked 95.7% of the identified attack vectors across all five execution phases. Now, considering the specific phases, the code generation stage exhibited the highest concentration of vulnerabilities, accounting for 42% of all identified attack classes. By comparison, the intermediate artifact handling phase presented 21% of the vulnerabilities.

At the data transformation stage, the system detected 18% of the attack classes. These findings provide a detailed roadmap for balancing scalability and security in executable agent workflows, highlighting the need for focused defenses during code creation and data processing.

A 72-qubit superconducting processor forms the foundation of our methodology for evaluating agent workflows, allowing for complex task orchestration. By employing this code-based orchestration, we aimed to assess scalability improvements alongside emerging security risks. Here, adversarial scenarios were manually constructed and deployed across multiple Large Language Models to validate these vulnerabilities. Beyond identifying weaknesses, we proposed a layered defence architecture, incorporating containerized sandboxing and semantic gating to mitigate risks.

At the core of our security approach lies the principle of isolating code execution. For systematic evaluation, we extended existing benchmarks and created new adversarial scenarios, ensuring a rigorous assessment of agent robustness. By constructing these scenarios, we aimed to provide a roadmap for balancing scalability and security in production-ready executable agent workflows.

👉 More information
🗞 From Tool Orchestration to Code Execution: A Study of MCP Design Choices
🧠 ArXiv: https://arxiv.org/abs/2602.15945

Rohail T.

Rohail T.

As a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.

Latest Posts by Rohail T.:

AI Gains Reliable Confidence with New Complex System

AI Gains Reliable Confidence with New Complex System

February 20, 2026
Superconductor Effect Lost in Stages, Not All at Once

Superconductor Effect Lost in Stages, Not All at Once

February 20, 2026
New Functionals Boost Accuracy of Catalyst Simulations

New Functionals Boost Accuracy of Catalyst Simulations

February 20, 2026