Researchers are tackling the challenge of ‘black box’ decision-making within deep reinforcement learning (DRL) as mobile networks evolve towards 6G. Abhishek Duttagupta, MohammadErfan Jabbari, and Claudio Fiandrino, all from IMDEA Networks Institute, alongside Marco Fiore and Joerg Widmer, present SymbXRL, a novel technique for explainable reinforcement learning (XRL) that generates human-interpretable explanations for DRL policies. This work is significant because it moves beyond simply observing DRL behaviour, instead synthesising symbolic representations of key concepts and rules to expose the reasoning behind network decisions. Validating SymbXRL in practical network management scenarios, the team demonstrates not only improved explanation clarity but also a 12% increase in median cumulative reward compared to standard DRL approaches, opening the door to intent-based control of future networks.
SymbXRL unlocks transparent deep reinforcement learning decisions
The research team tackled the ‘black box’ nature of DRL, which, while effective in resource allocation problems like user scheduling and antenna control, lacks the transparency needed for real-world deployment. This innovative method couples symbolic representation with logical reasoning, enabling a deeper understanding of how DRL agents operate within network management systems. Researchers validated SymbXRL using practical network management use cases, demonstrating its ability not only to enhance the semantics of explanations but also to facilitate intent-based programmatic action steering. By formalising the agent’s behaviour with first-order logic (FOL), SymbXRL provides simple, logically structured explanations for understanding and comparing DRL agents, and identifies flaws in agent design through analysis of logical inconsistencies.
Experiments were conducted with DRL agents controlling Radio Access Network (RAN) slicing and scheduling on a next-generation Node B (gNB), alongside a DRL agent for resource scheduling in Massive MIMO, offering diverse decision-making contexts. The first agent featured a multi-modal action space impacting all Key Performance Indicators (KPIs), while the second had a discrete action space affecting a subset of KPIs, demonstrating the flexibility of SymbXRL across different scenarios. This research contributes a novel explainer for DRL agents using symbolic representations with FOL, validated through two diverse use cases in network slicing and Massive MIMO scheduling, and is supported by publicly released code and RL agent artifacts. The findings show that SymbXRL provides human-readable and comprehensible symbolic explanations, improving upon state-of-the-art methods, and enables flexible intent-based action steering, resulting in a 12% median improvement in cumulative reward and outperforming existing XRL methods like METIS. The team anticipates that this work will stimulate further research in the field of explainable AI for 6G mobile networks and beyond.
Symbolic reinforcement learning via first-order logic enables compositional explanations
This work addresses the critical barrier of interpretability hindering the adoption of DRL in production network settings. Researchers leveraged symbolic AI, specifically first-order logic (FOL), to formalise agent behaviour and decision-making processes, creating more intuitive and interpretable explanations than existing approaches. The team engineered a system employing FOL to represent the states and actions of DRL agents, enabling the creation of logically structured explanations for comparison and analysis. Experiments utilised two distinct DRL use cases addressing 5G and 6G challenges: Radio Access Network (RAN) slicing and scheduling on a next-generation Node B (gNB), and resource scheduling in Massive MIMO.
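As a rough illustration of this idea, the sketch below discretises raw KPI observations into quartile symbols and renders a state as a conjunction of grounded FOL-style predicates. All names here (`kpi_quartile`, `state_predicates`, the KPI labels) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed names, not the SymbXRL codebase): map raw KPI
# values to quartile symbols and express a state as FOL-style predicates.
import numpy as np

def kpi_quartile(value, history):
    """Map a raw KPI value to a quartile symbol Q1..Q4 w.r.t. its history."""
    q1, q2, q3 = np.percentile(history, [25, 50, 75])
    if value <= q1:
        return "Q1"
    elif value <= q2:
        return "Q2"
    elif value <= q3:
        return "Q3"
    return "Q4"

def state_predicates(kpis, histories):
    """Render a state as a conjunction of grounded predicates."""
    return [f"{name}({kpi_quartile(v, histories[name])})"
            for name, v in kpis.items()]

# Toy example: two KPIs with short observation histories.
histories = {"throughput": [10, 20, 30, 40, 50], "power": [1, 2, 3, 4, 5]}
kpis = {"throughput": 42, "power": 4.5}
print(" AND ".join(state_predicates(kpis, histories)))
# -> throughput(Q4) AND power(Q4)
```

A symbolic state of this form is what makes two raw observations comparable: any two states that discretise to the same predicates can be treated as the same situation.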
The RAN slicing agent operated within a multi-modal action space, encompassing both continuous and discrete factors impacting all Key Performance Indicators (KPIs), while the Massive MIMO agent featured a discrete action space affecting a subset of KPIs. This diversity demonstrated the flexibility of SymbXRL across varied scenarios and decision-making contexts. Scientists validated SymbXRL’s capacity for Intent-based Action Steering (IAS) through logical rules represented in FOL, allowing for explicit control over agent actions. The methodology involved training DRL agents to optimise network performance, then applying SymbXRL to extract symbolic representations of their decision-making processes.
These representations were then used to generate human-readable explanations and implement IAS policies, demonstrating both improved understanding and enhanced performance. Researchers released the code for both the DRL agents and SymbXRL on GitHub to promote reproducibility and further research in the field. This innovative technique offers a unique perspective on DRL agent behaviour, facilitating debugging, troubleshooting, and ultimately, wider adoption of DRL in future mobile networks.
SymbXRL unlocks intent-based control of DRL agents
The research addresses a critical barrier to DRL adoption in production settings: the ‘black box’ nature of trained agents. Validation in network management use cases demonstrates not only improved explanation semantics but also enables explicit control via intent-based programmatic action steering. The Explanation Engine (EE) utilises symbolic representations to generate insights into agent behaviour through probabilistic and Knowledge Graph (KG) analyses. Probabilistic analysis collects symbolic representations of states and actions, counting occurrences and calculating frequencies to visualise results via probability distributions and correlation density maps.
These analyses provide insights into input state distributions and the correlation between agent actions and their effects on the environment. KG analysis constructs a graph where nodes represent symbolic actions and edges represent transitions between them, with weights corresponding to action and transition frequencies. Applying these analyses to a toy example, the correlation density map revealed the agent maintains average throughput in Q3 by keeping transmission power in Q4. The KG showed frequent switching between mid and high frequency bands, represented as transitions between nodes. This approach offers direct insights into the agent’s decision-making process and reveals patterns not apparent from reward analysis alone.
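The two analyses above can be sketched in a few lines: counting symbolic actions yields the probability distributions of the probabilistic analysis, while counting consecutive action pairs yields the weighted transition edges of the KG. The toy trace and names below are illustrative assumptions in the spirit of the mid/high band-switching example.

```python
# Sketch (assumed names): frequency counts over symbolic actions plus a
# weighted transition graph, mirroring the EE's probabilistic and KG analyses.
from collections import Counter

def build_kg(symbolic_actions):
    """Nodes: symbolic actions weighted by occurrence count.
    Edges: consecutive-action transitions weighted by frequency."""
    nodes = Counter(symbolic_actions)
    edges = Counter(zip(symbolic_actions, symbolic_actions[1:]))
    return nodes, edges

# Toy trace showing frequent switching between mid and high frequency bands.
trace = ["band(mid)", "band(high)", "band(mid)", "band(high)", "band(mid)"]
nodes, edges = build_kg(trace)
print(nodes)  # action frequencies -> probability distribution after normalising
print(edges)  # transition weights: the mid<->high edges dominate
```

Normalising the node counts by the trace length recovers the action probability distribution; the heavily weighted edges make recurrent behaviour such as band switching visible at a glance.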
Furthermore, the team introduced Intent-based Action Steering (IAS), integrating symbolic representation to guide agent behaviour towards specific network operator intents. Unlike methods that may reduce rewards, IAS selects actions from the agent’s prior experiences, ensuring both constraint satisfaction and performance optimisation. IAS operates on discretised state and action spaces, enabling efficient matching between current and past states and allowing operators to define intents using the same format as agent explanations. Tests show that IAS can enhance cumulative reward by maximising each step’s reward, achieving a*_t = argmax_{a_1,…,a_T} Σ_{t=1}^{T} r_t(s_t, a_t) subject to a_t ∈ A(s_t). Additionally, IAS allows for decision conditioning, such as scheduling specific users or avoiding scheduling users in a group, and accelerates learning by enabling agents trained for fewer episodes to achieve competitive cumulative rewards.
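A minimal sketch of this steering loop, under assumed names (the `steer` function, the experience layout, and the user-scheduling intent are illustrative, not the authors' API): given the current symbolic state, look up actions the agent previously took in that same discretised state, keep only those satisfying the operator intent, and pick the one with the highest observed reward.

```python
# Hedged sketch of intent-based action steering: choose from the agent's own
# past actions in the matching symbolic state, restricted by an intent.
def steer(state_sym, experience, intent, fallback):
    """experience: dict mapping symbolic state -> list of (action, reward).
    intent: predicate over actions (the operator's constraint).
    Returns the highest-reward past action satisfying the intent,
    or the fallback action when no compliant experience exists."""
    candidates = [(a, r) for a, r in experience.get(state_sym, []) if intent(a)]
    if not candidates:
        return fallback  # no compliant past action: keep the agent's choice
    return max(candidates, key=lambda ar: ar[1])[0]

# Toy example: decision conditioning that avoids scheduling user u3.
experience = {"tp(Q3)": [("sched(u1)", 0.9), ("sched(u2)", 0.7),
                         ("sched(u3)", 0.95)]}
intent = lambda a: a != "sched(u3)"
print(steer("tp(Q3)", experience, intent, fallback="sched(u3)"))
# -> sched(u1)
```

Because the steered action is always drawn from states the agent has actually visited, the intent is enforced without pushing the agent into unexplored, low-reward behaviour.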
SymbXRL clarifies DRL decisions via symbolic AI
This approach clarifies the decision-making process of DRL agents in a way that is more readily understandable to human observers. Validation of SymbXRL involved practical network management scenarios, demonstrating improvements in both the clarity of explanations and agent performance. The authors acknowledge that existing XRL techniques, such as PIRL and EXPLORA, offer alternative approaches, but highlight that SymbXRL’s symbolic representation offers advantages in terms of efficiency and clarity. Future research could explore the application of SymbXRL to a wider range of complex systems and investigate methods for automatically generating the symbolic representations used in the explanations.
👉 More information
🗞 SymbXRL: Symbolic Explainable Deep Reinforcement Learning for Mobile Networks
🧠 ArXiv: https://arxiv.org/abs/2601.22024
