Researchers are tackling the challenge of creating truly adaptive control systems for future mobile networks, but current deep reinforcement learning (DRL) approaches often react to events rather than anticipating them. MohammadErfan Jabbari, Abhishek Duttagupta, and Claudio Fiandrino of the IMDEA Networks Institute, together with Leonardo Bonati, Salvatore D’Oro, and Michele Polese of Northeastern University, present a significant step forward with their Symbolic Interpretability for Anticipatory DRL (SIA) framework. The work is notable as the first method for understanding how predictions influence DRL decisions in network control, addressing a critical barrier to adoption by providing transparency and enabling targeted improvements to policy design. In their evaluations, SIA’s insights yield a 9% bitrate increase in video streaming and a 25% reward improvement in RAN slicing.
Interpretable Forecast Use in Reinforcement Learning
Conventional DRL agents react to current and past measurements, lacking the capacity to leverage short-term forecasts of key performance indicators (KPIs) such as bandwidth. While augmenting agents with predictions can overcome this temporal myopia, forecast-aware agents often operate as “black boxes”, leaving network operators unable to determine whether predictions genuinely guide decisions or simply add unnecessary complexity. SIA fuses Symbolic AI abstractions with per-KPI Knowledge Graphs to generate explanations, and introduces a novel Influence Score metric that quantifies the impact of both current observations and predicted future trends on agent behaviour.
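To make the idea of a forecast-aware agent concrete, here is a minimal sketch of a forecast-augmented observation vector. The 4-step bandwidth forecast horizon matches the paper’s A1-P agent, but the exact feature layout, window sizes, and function names are illustrative assumptions, not the authors’ implementation.

```python
import numpy as np

def build_observation(history, forecast, k_hist=8, k_fore=4):
    """Concatenate the last k_hist bandwidth samples (reactive part)
    with the next k_fore predicted samples (anticipatory part)."""
    past = np.asarray(history[-k_hist:], dtype=np.float32)
    future = np.asarray(forecast[:k_fore], dtype=np.float32)
    return np.concatenate([past, future])

# A reactive agent (A1-R style) would see only `past`; a proactive agent
# (A1-P style) also sees `future`, letting it lower bitrate before a dip.
obs = build_observation(history=[5.2, 4.8, 4.1, 3.9, 3.6, 3.2, 2.9, 2.7],
                        forecast=[2.4, 2.1, 2.0, 2.3])
```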
Crucially, SIA operates at sub-millisecond speed, over 200x faster than existing Explainable AI (XAI) methods. The approach addresses the challenge of understanding how agents utilise forecasts: to what extent predictions influence decisions, which prediction horizons agents actually employ, and how forecasts alter overall strategies. This level of insight is essential for aligning agent behaviour with the specific goals of network operators. Experiments demonstrate that agents using estimates of future network bandwidth achieve higher Quality of Experience (QoE) by acting proactively, reducing rebuffering time by 53% in adaptive bitrate streaming.
The research establishes a new method for disentangling the influence of current network states from predicted future trends, using a symbolic framework and scalable knowledge graphs. This allows for a detailed understanding of the agent’s decision-making process, identifying potential biases and areas for optimisation. By providing a computationally efficient and interpretable solution, SIA paves the way for more robust and adaptable mobile networks capable of anticipating and responding to dynamic conditions.
SIA framework for real-time DRL explanation provides actionable insights
Scientists developed SIA, a novel framework designed to reveal how DRL agents exploit forecasts in mobile networks. The study pioneered a method fusing Symbolic AI abstractions with per-KPI Knowledge Graphs to produce real-time explanations of agent behaviour. Researchers engineered SIA to operate at sub-millisecond speed, over 200x faster than existing Explainable Artificial Intelligence methods, a speed critical for online analysis and intervention in dynamic network environments. The team implemented a system that disentangles the influence of current observations from the impact of future predictions, addressing a key diagnostic gap in existing XAI techniques.
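The paper does not reproduce its Knowledge Graph schema here, but a minimal sketch of what a per-KPI graph could look like, assuming networkx and illustrative KPI names and edge labels, is:

```python
import networkx as nx

# Nodes are KPIs tagged as controllable or exogenous; edges record the
# dependencies along which explanations are traced. All specific KPIs
# and relation names below are assumptions for illustration.
kg = nx.DiGraph()
kg.add_node("bandwidth", kind="exogenous")
kg.add_node("bandwidth_forecast", kind="exogenous", horizon=4)
kg.add_node("bitrate", kind="controllable")
kg.add_node("rebuffering", kind="derived")

kg.add_edge("bandwidth", "bitrate", relation="constrains")
kg.add_edge("bandwidth_forecast", "bitrate", relation="anticipates")
kg.add_edge("bitrate", "rebuffering", relation="risks")

# Tracing a decision: which KPIs can influence the chosen bitrate?
print(list(kg.predecessors("bitrate")))  # ['bandwidth', 'bandwidth_forecast']
```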
Specifically, the team analysed how agents integrated forecasted bandwidth data with current network conditions to optimise performance. This involved constructing per-KPI Knowledge Graphs that represent relationships between controllable and exogenous KPIs, enabling the system to trace the causal pathways of agent decisions. Within SIA, a new Influence Score metric quantifies the contribution of each forecast to the agent’s actions: the methodology measures the change in agent policy when a specific forecast is removed, providing a direct measure of its impact.
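The exact formulation of the Influence Score is not given here, but under the reading above (policy change when a forecast is removed), a minimal ablation-style sketch could look like the following. The toy linear policy, the use of KL divergence as the change measure, and the baseline-masking scheme are all assumptions.

```python
import numpy as np

def action_distribution(policy, obs):
    """Softmax over the policy's action logits for one observation."""
    logits = policy(obs)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def influence_scores(policy, obs, baseline=0.0):
    """KL divergence between the original action distribution and the
    distribution after replacing one feature with a neutral baseline."""
    p = action_distribution(policy, obs)
    scores = np.zeros(len(obs))
    for i in range(len(obs)):
        masked = obs.copy()
        masked[i] = baseline                       # "remove" feature i
        q = action_distribution(policy, masked)
        scores[i] = float(np.sum(p * np.log(p / q)))
    return scores

# Toy linear policy over [current KPIs | 4-step forecast]; weights are
# arbitrary, chosen so forecast steps t+1 and t+2 matter most.
W = np.array([[0.2, 0.1, 1.5, 1.0, 0.1, 0.0],
              [0.1, 0.2, -1.5, -1.0, 0.0, 0.1]])
policy = lambda obs: W @ obs
obs = np.array([3.0, 2.8, 2.4, 2.1, 2.0, 2.3])
print(influence_scores(policy, obs))  # largest scores at forecast steps
```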
The work details how SIA’s symbolic abstractions translate complex neural network outputs into human-interpretable rules, facilitating debugging and optimisation. Researchers validated the system’s performance by comparing the Quality of Experience, specifically rebuffering time, between reactive and proactive agents, demonstrating a 53% reduction in rebuffering with forecast integration. This improvement highlights the effectiveness of SIA in enabling anticipatory DRL and lowering the barrier to proactive control strategies.
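A key ingredient of such symbolic abstraction is binning continuous KPI values into a small number of categories (the authors later note that 3 to 7 works best). A minimal sketch, with quantile binning and category labels as illustrative assumptions:

```python
import numpy as np

def symbolise(values, n_categories=5, labels=None):
    """Map continuous KPI samples to symbolic categories via quantiles,
    so agent behaviour can be summarised as human-readable rules."""
    labels = labels or [f"level_{i}" for i in range(n_categories)]
    edges = np.quantile(values, np.linspace(0, 1, n_categories + 1)[1:-1])
    return [labels[np.searchsorted(edges, v)] for v in values]

bandwidth = [0.8, 1.4, 2.2, 3.1, 4.8, 5.5, 2.0, 0.9]
print(symbolise(bandwidth, n_categories=3, labels=["low", "mid", "high"]))
```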
SIA reveals forecast-augmented DRL agent behaviour through interpretable policy graphs
Conventional DRL systems react to past and current data; augmenting them with short-term forecasts addresses a limitation known as temporal myopia. Tests confirm SIA achieves sub-millisecond speed, over 200x faster than current explainable AI (XAI) methods. Using SIA, the team uncovered temporal misalignment, where forecasts and reactive data were combined with inconsistent time horizons: the proactive agent (A1-P) receives bandwidth forecasts structured around a 4-step horizon, while throughput data lacks a corresponding forecast component. This inconsistency led the agent to incorrectly apply predictive heuristics, impacting performance.
Policy graphs revealed that the proactive agent (A1-P) maintains a higher steady-state quality, spending over 58% of its time at 1200 kbps, compared with the reactive agent (A1-R), which operates at 750 kbps for 43% of the time. Measurements confirm A1-P achieves a 1.7% higher reward and a 1.8% higher bitrate than the reactive baseline (p < 0.01). Analysis of the MIMO scheduling agent (A2-R) revealed a counterintuitive policy in which allocating 100% of resources to the best users yielded a lower reward (0.73) than a partial allocation (0.80 at an 85% allocation), a flaw exposed by SIA’s policy graphs.

Mutual Information (MI) analysis quantified the impact of forecasts, revealing that the agent learned to prioritise values near the start of the forecast horizon (t=1 and t=2) for bandwidth, and incorrectly applied the same heuristic to throughput data. The team recorded that multiple user groups were present in only 17% of timesteps, highlighting the critical moments where the agent diverted resources. Unlike the graphs produced by methods like Metis (3,200 and 5,000 nodes), SIA’s bounded policy graphs offer a real-time, interpretable visualisation of the agent’s temporal action policies.
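The per-lag MI analysis can be illustrated with a short sketch, assuming scikit-learn’s estimator: it measures how strongly the agent’s chosen action depends on each forecast step. The synthetic data below is constructed so the dependence concentrates on the first two steps, mirroring the t=1 and t=2 finding; none of this reproduces the authors’ actual traces.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def forecast_mi(forecasts, actions):
    """forecasts: (n_samples, horizon) predicted bandwidth values;
    actions: (n_samples,) discrete actions taken by the agent.
    Returns one MI estimate per forecast step."""
    return mutual_info_classif(forecasts, actions, discrete_features=False)

rng = np.random.default_rng(0)
fc = rng.normal(size=(500, 4))                      # 4-step forecast horizon
act = (fc[:, 0] + 0.5 * fc[:, 1] > 0).astype(int)   # depends on t+1, t+2 only
print(forecast_mi(fc, act))                          # MI peaks at first two steps
```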
SIA reveals DRL agent behaviours and biases
The authors acknowledge that the number of categories used within SIA is a critical hyperparameter, with fewer than three yielding overly generic insights and more than seven prolonging the initial learning phase. This research demonstrates a paradigm shift in interpreting anticipatory DRL, offering transparency and trust in mobile network control. By providing real-time, actionable insights, SIA overcomes limitations of existing methods and enables both agent redesign and automated performance enhancements.
👉 More information
🗞 SIA: Symbolic Interpretability for Anticipatory Deep Reinforcement Learning in Network Control
🧠 ArXiv: https://arxiv.org/abs/2601.22044
