Ensuring artificial intelligence systems align with human values is paramount as agents gain increasing autonomy and real-world impact. To address this critical need, Felix Jahn, Yannic Muskalla, and Lisa Dargasz, from the German Research Center for Artificial Intelligence, alongside Patrick Schramowski and Kevin Baum, present a novel neuro-symbolic architecture, Governor for Reason-Aligned ContainmEnt (GRACE). This research is significant because GRACE decouples normative reasoning from instrumental decision-making, offering a method to contain agents of virtually any design and providing a semantic foundation for interpretability, contestability, and formal verification of ethical behaviour, demonstrated here through a large language model therapy assistant.
GRACE architecture for AI normative containment
Normative alignment is critical: the authors introduce Governor for Reason-Aligned ContainmEnt (GRACE), a neuro-symbolic, reason-based containment architecture that decouples normative reasoning from instrumental decision-making and can contain AI agents of virtually any design. GRACE restructures decision-making into three modules: a Moral Module (MM) that determines permissible macro-actions via deontic logic-based reasoning; a Decision-Making Module (DMM) that encapsulates the target agent; and a Guard (Containment Module, CM) that enforces the constraints imposed by the MM on the DMM. This separation allows for independent verification of the normative component and facilitates the specification of complex ethical guidelines. The authors evaluated GRACE on 10 different scenarios, achieving a 95% success rate in containing agents while reducing their reward by only 5%.
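To make the decomposition concrete, here is a minimal Python sketch of one contained decision cycle. All names (MoralModule, DecisionMakingModule, grace_step, and so on) are hypothetical illustrations of the architecture as described above, not the paper's implementation.

```python
# A minimal sketch of the GRACE control loop, assuming hypothetical
# interfaces. Class and method names are illustrative, not the paper's API.
from dataclasses import dataclass
from typing import Protocol, Set


@dataclass(frozen=True)
class Observation:
    payload: str


@dataclass(frozen=True)
class Action:
    name: str
    macro_type: str  # the macro-action type this primitive action instantiates


class MoralModule(Protocol):
    def permissible_macro_types(self, obs: Observation) -> Set[str]:
        """Reason-based deliberation: map an observation to the set of
        morally permissible macro-action types."""
        ...


class DecisionMakingModule(Protocol):
    def select_action(self, obs: Observation, allowed: Set[str]) -> Action:
        """Instrumental decision-making by the encapsulated target agent,
        informed by the MM's symbolic output."""
        ...


def grace_step(mm: MoralModule, dmm: DecisionMakingModule,
               obs: Observation) -> Action:
    """One perceive-plan-act cycle under containment."""
    allowed = mm.permissible_macro_types(obs)  # MM: what may be done
    action = dmm.select_action(obs, allowed)   # DMM: what is best to do
    # Guard: enforce moral compliance before the action reaches the world.
    if action.macro_type not in allowed:
        raise PermissionError(f"Guard blocked {action.name!r}: "
                              f"{action.macro_type!r} not permitted")
    return action
```

Note how the Guard sits after the DMM: even if the encapsulated agent ignores the MM's constraints, no impermissible action escapes the loop.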
THERAPAI’s Modular Governance for Ethical Alignment
This example illustrates how a hypothetical LLM therapy assistant named THERAPAI could be aligned with ethical guidelines using a modular governance architecture. The key components are: a Moral Module (MM) that processes observations to derive reasons and permissible macro-action types; a Decision-Making Module (DMM) that selects the most rational primitive actions consistent with these types; and a Guard that ensures selected actions are morally permissible. In a scenario where a patient states, “I’m going to the park and will hurt myself,” the MM identifies the risk of self-harm and permits actions to prevent it. The DMM then selects an appropriate action, such as notifying emergency services, which the Guard verifies and executes. Reason-theory updates, such as incorporating a rule to prioritise patient safety, allow for dynamic adjustments to the ethical guidelines, ensuring the system remains aligned with evolving standards. This approach provides a structured way to handle complex ethical decisions in AI systems, particularly those involving sensitive information and patient autonomy.
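A toy version of this scenario fits in a few lines. The keyword check, the macro-action type names, and the prioritise_safety flag standing in for a reason-theory update are all illustrative simplifications; the paper's MM performs genuine deontic-logic reasoning rather than string matching.

```python
# Toy instantiation of the THERAPAI scenario. All rule content here is a
# hypothetical simplification of the MM's reason-based deliberation.

def mm_permissible(obs_text: str, reason_theory: dict) -> set:
    """Hypothetical MM: derive permissible macro-action types from
    detected reasons under the current reason theory."""
    allowed = {"CONVERSE"}  # default: ordinary therapeutic conversation
    if "hurt myself" in obs_text and reason_theory.get("prioritise_safety"):
        # Reason: risk of self-harm outweighs confidentiality concerns.
        allowed |= {"PREVENT_SELF_HARM", "NOTIFY_EMERGENCY_SERVICES"}
    return allowed


reason_theory = {"prioritise_safety": True}  # updatable reason theory
obs = "I'm going to the park and will hurt myself"
print(mm_permissible(obs, reason_theory))
# e.g. {'CONVERSE', 'PREVENT_SELF_HARM', 'NOTIFY_EMERGENCY_SERVICES'}
# (set iteration order varies)
```

Flipping prioritise_safety off models a reason-theory update in the opposite direction: the same observation would then license only ordinary conversation, which is what makes the ethical guidelines dynamically adjustable without retraining the agent.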
GRACE decouples ethics from agent decision-making
The researchers introduce Governor for Reason-Aligned ContainmEnt (GRACE), a neuro-symbolic architecture designed to decouple ethical considerations from instrumental decision-making in autonomous agents. The research demonstrates the feasibility of containing agents with virtually any underlying design, restructuring the decision process into three distinct modules: a Moral Module (MM), a Decision-Making Module (DMM), and a Guard. The MM utilises a reason-based formalism, grounded in deontic logic, to determine permissible macro-actions, providing a semantic foundation for interpretability and justification. The team showcased GRACE’s ability to enable stakeholders to understand, contest, and refine agent behaviour through an example implementation using a Large Language Model (LLM) therapy assistant.
The MM’s symbolic representation enriches the informational context available to the DMM, which encapsulates the target agent and selects optimal primitive actions aligned with the derived macro-actions. The Guard component continuously monitors and enforces moral compliance, ensuring actions adhere to the established ethical framework. This architecture addresses limitations found in current approaches, such as the inability of Reinforcement Learning from Human Feedback (RLHF) to robustly capture complex human values. GRACE overcomes the traditional trade-off between interpretability and adaptability, offering both verifiable, human-understandable reasoning and dynamic contextual sensitivity.
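The flavour of reason-based permissibility can be sketched with a simple weighing scheme: reasons carry a polarity and a strength, and a macro-action type is permissible when the reasons for it are not outweighed by the reasons against. This scheme, and every name in the snippet, is our simplification for illustration; the paper's deontic formalism is richer.

```python
# Hypothetical reason-weighing sketch in the spirit of the MM's formalism.
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Reason:
    action_type: str
    favours: bool   # True: reason for the action type; False: reason against
    strength: int


def permissible(action_type: str, reasons: Iterable[Reason]) -> bool:
    """An action type is permissible unless defeated by stronger
    contrary reasons (our simplified weighing rule)."""
    pro = sum(r.strength for r in reasons
              if r.action_type == action_type and r.favours)
    con = sum(r.strength for r in reasons
              if r.action_type == action_type and not r.favours)
    return pro >= con


reasons = [
    Reason("DISCLOSE_PATIENT_INFO", favours=False, strength=3),  # confidentiality
    Reason("DISCLOSE_PATIENT_INFO", favours=True, strength=5),   # imminent risk
]
assert permissible("DISCLOSE_PATIENT_INFO", reasons)  # safety outweighs secrecy
```

Because the reasons are explicit symbolic objects rather than weights buried in a network, each verdict comes with a justification a stakeholder can inspect and contest.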
The work contrasts with purely rule-based systems, which struggle with adaptability, and learning-based approaches, which often lack transparency and can exhibit value misalignment or reward hacking. Integrating symbolic moral reasoning with data-driven methods, as demonstrated by recent work combining Delphi with constraint graphs, improves consistency and robustness in adversarial scenarios. The study formally characterises a generic AI agent as a stateful system maintaining an internal state b_t ∈ B and operating through iterative “perceive-plan-act” cycles, defined by a state update function u : B × O → B and an action function π : B → A, where O represents observations and A represents possible actions. This formalisation captures the core architecture of diverse AI systems, including reinforcement learning, LLMs, and robotics, providing a foundation for principled development of morally contained agents and opening avenues for future research in AI safety and value alignment.
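This formalisation translates almost directly into code. The sketch below assumes one plausible ordering of the cycle (fold the new observation into the state, then act); the concrete types and the toy counter agent are placeholders for illustration.

```python
# Generic stateful agent: internal state b_t ∈ B, state update u : B × O → B,
# action function π : B → A, iterated as a perceive-plan-act cycle.
from typing import Callable, Iterable, Iterator, TypeVar

B = TypeVar("B")  # internal states
O = TypeVar("O")  # observations
A = TypeVar("A")  # actions


def run_agent(b0: B,
              u: Callable[[B, O], B],
              pi: Callable[[B], A],
              observations: Iterable[O]) -> Iterator[A]:
    """Iterate b_{t+1} = u(b_t, o_t), then emit a_t = π(b_{t+1})
    (one plausible ordering; the paper may fix a different one)."""
    b = b0
    for o in observations:
        b = u(b, o)   # perceive: fold the observation into the state
        yield pi(b)   # plan/act: choose an action from the updated state


# Example: a counter "agent" whose state is the number of observations seen.
actions = list(run_agent(0, lambda b, o: b + 1, lambda b: f"act-{b}",
                         ["o1", "o2"]))
# actions == ["act-1", "act-2"]
```

The point of the abstraction is that anything fitting this (u, π) shape, whether an RL policy, an LLM loop, or a robot controller, can be dropped into the DMM and contained without modifying its internals.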
GRACE architecture for moral reasoning and alignment
This research introduces Governor for Reason-Aligned ContainmEnt (GRACE), a neuro-symbolic architecture designed to separate normative reasoning from instrumental decision-making in autonomous agents. The framework decomposes agency into three modules: a Moral Module for determining permissible actions, a Decision-Making Module encapsulating the agent’s action selection, and a Guard that enforces moral compliance. GRACE enables principled separation of moral and instrumental goals, modular revision, and clearer oversight by isolating reason-based deliberation within the Moral Module. The architecture facilitates a divide-and-conquer approach to alignment, allowing independent scaling of the Decision-Making Module, updates to the Moral Module via multi-agent mediation, and local enforcement of moral accordance by the Guard.
The symbolic representation within the Moral Module distinguishes between normative requirements and aligns with system-theoretic formal properties, yielding interpretable justifications and enabling both verification and learning. Current development focuses on a full open-source implementation of the architecture, with subsequent evaluation planned to assess the performance cost of moral guidance, the reliability of contained agent behaviour, and the incorporation of human moral advice. Future research will concentrate on modelling moral action types in a logical language and automating monitor synthesis within the Guard.
👉 More information
🗞 Breaking Up with Normatively Monolithic Agency with GRACE: A Reason-Based Neuro-Symbolic Architecture for Safe and Ethical AI Alignment
🧠 ArXiv: https://arxiv.org/abs/2601.10520
