Sixth-generation (6G) radio access networks require robust mechanisms to guarantee service-level agreements for diverse network slices, but resolving unpredictable latency spikes remains a significant challenge for current reinforcement learning and explainable reinforcement learning techniques. Kavan Fatehi, Mostafa Rahmani Ghourtani, Hamed Ahmadi, Poonam Yadav, and Radu Calinescu from the University of York, working with Amir Sonee and Alessandra M Russo from Imperial College London, address this issue by introducing Attention-Enhanced Multi-Agent Proximal Policy Optimization (AE-MAPPO). This framework integrates six specialised attention mechanisms into multi-agent slice control, providing clear and reliable explanations for its actions. The research demonstrates that AE-MAPPO not only resolves latency spikes within milliseconds and maintains reliable performance, but also substantially reduces troubleshooting time while preserving service for existing enhanced mobile broadband and massive machine-type communication applications, a crucial step towards trustworthy and automated 6G RAN slicing.
Unlike conventional deep reinforcement learning (DRL) or explainable RL (XRL) approaches, AE-MAPPO integrates six specialised attention mechanisms directly into its decision-making process, offering inherent interpretability without incurring additional computational costs.
The framework operates across the timescales defined by O-RAN specifications, employing a three-phase strategy of predictive analysis, reactive response, and inter-slice optimisation. A key achievement of this work is that it not only restores latency to 0.98 ms with 99.9999% reliability but also cuts troubleshooting time by 93%, all while keeping other services such as enhanced mobile broadband (eMBB) and massive machine-type communications (mMTC) running.
This combination of performance and transparency represents a substantial step towards trustworthy and real-time automation in 6G RAN slicing. AE-MAPPO’s attention mechanisms highlight the causes of latency spikes, including buffer overflows, recurring patterns in network traffic, and interference between slices.
By surfacing these insights, the system empowers network operators to understand and proactively address potential issues, moving beyond simply reacting to problems as they arise. The research demonstrates that AE-MAPPO can resolve a latency spike in 18 milliseconds, a critical timeframe for applications demanding ultra-reliable low-latency communication (URLLC). Following spike resolution, latency is restored to 0.98 milliseconds with a reliability of 99.9999 percent. This performance represents a substantial improvement in maintaining URLLC service levels.
Furthermore, the research indicates a 93 percent reduction in troubleshooting time, streamlining network management and minimising service disruption. The AE-MAPPO framework achieves these results through the integration of six specialised attention mechanisms within a multi-agent slice control system.
These attention mechanisms function as zero-cost explanations, providing insight into the decision-making process without compromising performance. Semantic attention identifies critical states such as buffer overflows, while temporal attention detects recurring patterns over time. Cross-slice attention captures interference between network slices, and counterfactual attention evaluates alternative actions to optimise performance.
The framework operates across O-RAN timescales, employing a three-phase strategy of predictive, reactive, and inter-slice optimisation. This allows the system to anticipate potential issues, respond to immediate disruptions, and adjust resource allocation between slices before conflicts arise. Its ability to combine stringent SLA compliance with inherent interpretability is central to its success, enabling trustworthy and real-time automation for future 6G RAN slicing deployments. The six specialised attention mechanisms are integrated directly into the decision-making process, rather than being applied as explanation methods after a decision has been made.
Semantic attention identifies critical network states, such as buffer overflows, while temporal attention detects recurring patterns in network behaviour over time. Cross-slice attention specifically quantifies interference between different network slices, allowing the system to anticipate and mitigate conflicts.
Complementing these, confidence attention assesses the certainty of the agent’s predictions, counterfactual attention evaluates the potential outcomes of alternative actions, and a meta-controller orchestrates the other attention modules. These attention heads operate concurrently during inference, providing zero-cost, faithful explanations alongside control actions.
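To make the idea of an explanation that comes "for free" with the decision more concrete, here is a minimal sketch of a single attention head in plain Python. It is not the authors' implementation: the feature names, the embedding values, and the query vector are all made-up assumptions for illustration. The point it shows is that the weights used to build the context vector fed to a policy are the same numbers reported to the operator, so no separate explanation pass is needed.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query.

    Returns the context vector and the attention weights that produced it.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights

# Hypothetical state features for one slice agent (names are assumptions).
FEATURES = ["buffer_occupancy", "sinr", "prb_utilisation", "arrival_rate"]
# Made-up embeddings; in a trained model these would be learned.
keys = [[2.0, 0.1], [0.3, 0.2], [0.5, 0.4], [0.2, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
query = [1.5, 0.2]  # stands in for a learned "critical state" query

# The context would feed the policy; the weights double as the explanation.
context, weights = attend(query, keys, values)
explanation = sorted(zip(FEATURES, weights), key=lambda kv: kv[1], reverse=True)
for name, weight in explanation:
    print(f"{name}: {weight:.2f}")
```

With these illustrative numbers the buffer feature receives the largest weight, which is the kind of signal an operator could read as "the agent acted because the buffer was close to overflowing".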
The choice of PPO, a policy gradient method, allows for stable and efficient learning in the complex, continuous action spaces typical of RAN environments. To address service-level agreement (SLA) violations, the research implements a three-phase strategy: predictive, reactive, and inter-slice optimisation.
First, the predictive phase anticipates potential latency spikes using historical data and current network conditions. Next, the reactive phase responds swiftly to detected spikes, adjusting resource allocation to restore performance. Finally, the inter-slice optimisation phase balances resources across slices to prevent future issues and maintain overall network stability. This phased approach, combined with the attention mechanisms, enables AE-MAPPO not only to resolve latency spikes within milliseconds but also to give operators actionable insights into their causes and potential solutions.
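The three phases described above can be pictured as a simple dispatcher. The sketch below is purely illustrative and rests on several assumptions that do not come from the paper: a 1.0 ms latency budget, a 0.1 ms early-warning margin, fixed step sizes, and a per-slice floor of resource blocks that protects the donor slices. A learned policy would replace the hard-coded rules.

```python
from dataclasses import dataclass

@dataclass
class SliceState:
    latency_ms: float            # latest measured latency
    predicted_latency_ms: float  # forecast from some predictor (not modelled here)
    prbs: int                    # physical resource blocks currently allocated

URLLC_BUDGET_MS = 1.0  # assumed latency budget
WARN_MARGIN_MS = 0.1   # assumed early-warning margin

def choose_phase(urllc):
    """Pick the control phase from measured and forecast URLLC latency."""
    if urllc.latency_ms > URLLC_BUDGET_MS:
        return "reactive"      # a spike is happening now
    if urllc.predicted_latency_ms > URLLC_BUDGET_MS - WARN_MARGIN_MS:
        return "predictive"    # a spike is forecast
    return "inter_slice"       # steady state

def step(slices, urllc_name="URLLC", floor=10):
    """One control step: returns the phase and a new PRB allocation."""
    phase = choose_phase(slices[urllc_name])
    # Assumed step sizes: react hard to a live spike, nudge on a forecast.
    # In steady state this sketch simply holds the allocation; a real
    # inter-slice optimiser would rebalance instead.
    wanted = {"reactive": 8, "predictive": 2, "inter_slice": 0}[phase]
    alloc = {name: s.prbs for name, s in slices.items()}
    for name in alloc:
        if name == urllc_name or wanted == 0:
            continue
        # Never take a donor slice below its floor, so it keeps running.
        give = min(wanted, max(0, alloc[name] - floor))
        alloc[name] -= give
        alloc[urllc_name] += give
        wanted -= give
    return phase, alloc

slices = {
    "URLLC": SliceState(latency_ms=1.6, predicted_latency_ms=1.4, prbs=20),
    "eMBB": SliceState(latency_ms=8.0, predicted_latency_ms=8.0, prbs=60),
    "mMTC": SliceState(latency_ms=40.0, predicted_latency_ms=40.0, prbs=12),
}
phase, alloc = step(slices)
print(phase, alloc)
```

The floor is the design choice worth noting: it is one simple way to express the requirement that eMBB and mMTC services continue while URLLC is being rescued.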
The Bigger Picture
The relentless pursuit of lower latency in wireless networks has often felt like chasing a receding horizon. While incremental gains are common, truly disruptive improvements, particularly those guaranteeing performance alongside transparency, have proved elusive. This new work offers a compelling step forward, not simply by achieving faster response times in simulated 6G networks, but by demonstrably explaining how those improvements are made.
The ability to diagnose and rectify latency spikes within milliseconds, while simultaneously maintaining service for other demanding applications, is significant. For years, the promise of network slicing, dedicating specific portions of the network to particular services, has been hampered by the complexity of real-time management.
Existing approaches relying on reinforcement learning, while powerful, often function as ‘black boxes’, making it difficult for human operators to trust or intervene effectively. This research tackles that head-on, integrating attention mechanisms to surface the reasoning behind automated decisions. This isn’t just about faster networks; it’s about building confidence in those networks, particularly for critical applications like industrial automation or remote surgery.
However, scaling these results beyond the controlled environment of a case study remains a substantial challenge. The framework’s performance in a large, densely populated cellular network, with its inherent interference and unpredictable traffic patterns, is yet to be proven. Furthermore, the energy consumption of these attention mechanisms, which the work does not explicitly address, could become a limiting factor. Future work rightly points towards federated and semantic learning, suggesting a move towards more distributed and intelligent network management, and integrating these concepts will be crucial to unlocking the full potential of this approach.
👉 More information
🗞 Interpretable Attention-Based Multi-Agent PPO for Latency Spike Resolution in 6G RAN Slicing
🧠 ArXiv: https://arxiv.org/abs/2602.11076
