Future wireless networks, poised for the 6G era, demand increasingly flexible and efficient resource allocation, and researchers are exploring how artificial intelligence can optimise radio access network (RAN) slicing to meet diverse service requirements. Md Arafat Habib, Medhat Elsayed, and Majid Bavand, together with colleagues from the University of Ottawa and Ericsson Inc., present a framework that integrates natural language understanding with intelligent decision-making for RAN slicing. Their work addresses the limitations of current approaches, which often rely on static configurations, through a system driven by Hierarchical Decision Mamba controllers and a Large Language Model. This combination allows the network to interpret operator instructions and dynamically adjust resource allocation, yielding significant improvements in throughput, cell-edge performance, and latency across multiple network slices, and paving the way for more responsive and adaptable future wireless systems.
Adaptive RAN Slicing with Deep Reinforcement Learning
Radio Access Network (RAN) slicing allows multiple logical networks to operate on shared physical infrastructure by allocating resources to distinct service groups, with radio resource scheduling playing a crucial role in meeting slice-specific Service-Level Agreements (SLAs). Existing configuration-based and intent-driven Reinforcement Learning approaches often struggle in dynamic wireless environments, leading to suboptimal resource allocation and potential SLA violations. This research introduces an adaptive radio resource scheduling framework that leverages deep reinforcement learning and a predictive model of channel state information, aiming to maximise overall network throughput while guaranteeing performance for each network slice, even under fluctuating traffic and channel conditions. The team developed a distributed learning architecture that enables base stations to independently learn optimal scheduling policies from local observations and to refine them collaboratively via federated learning, reducing computational burden and enhancing scalability for large-scale RAN deployments. The method incorporates a novel state representation capturing slice-specific information and a tailored reward function balancing throughput against SLA satisfaction; a comprehensive performance evaluation demonstrates improvements in network efficiency and SLA compliance over conventional scheduling algorithms.
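To make the reward design concrete, here is a minimal sketch of a slice-aware scheduling reward of the kind described above, trading total throughput against penalties for SLA violations. The dataclass fields, the weighted-penalty form, and the `penalty_weight` parameter are illustrative assumptions, not the authors' actual formulation.

```python
# Illustrative slice-aware scheduling reward: reward aggregate throughput,
# penalise per-slice SLA violations. All names and the weighted-penalty
# form are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class SliceState:
    throughput_mbps: float      # achieved throughput in the last epoch
    sla_min_mbps: float         # SLA throughput floor for this slice
    latency_ms: float           # observed latency
    sla_max_latency_ms: float   # SLA latency ceiling

def scheduling_reward(slices: list[SliceState], penalty_weight: float = 2.0) -> float:
    """Total throughput minus a weighted penalty for each SLA violation."""
    total_throughput = sum(s.throughput_mbps for s in slices)
    violation = sum(
        max(0.0, s.sla_min_mbps - s.throughput_mbps)        # throughput shortfall
        + max(0.0, s.latency_ms - s.sla_max_latency_ms)     # latency overshoot
        for s in slices
    )
    return total_throughput - penalty_weight * violation
```

In a deep reinforcement learning scheduler, a scalar of this kind would be returned by the environment at each scheduling epoch to guide policy updates.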
Agentic AI for Dynamic RAN Slicing
The researchers have developed an Agentic AI system for 6G Radio Access Network (RAN) slicing that dynamically allocates resources to meet slice-specific Service-Level Agreements (SLAs). The system features a hierarchical control framework centred on a “super agent” powered by a fine-tuned Llama 3 model. This super agent coordinates inter-slice, intra-slice, and self-healing functions, moving beyond the static mappings of existing Reinforcement Learning approaches. A Hybrid RAG (H-RAG) framework grounds its decision-making by integrating static knowledge, drawn from 3GPP standards and policy rules, with dynamic, real-time Key Performance Indicators (KPIs) and slice performance data.
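The sketch below illustrates the hybrid-retrieval idea in miniature: an operator intent is embedded once and used to query both a static corpus (standards and policy text) and a dynamic store of recent KPI records, and the results are assembled into a grounded prompt. The placeholder `embed` function, the in-memory stores, and the prompt layout are assumptions for illustration, not the authors' implementation.

```python
# Minimal hybrid-RAG sketch: one intent embedding queries a static corpus
# (3GPP text, policy rules) and a dynamic store of recent KPI records.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding; a real system would call a sentence encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=256)
    return v / np.linalg.norm(v)

def top_k(query: np.ndarray, corpus: dict[str, np.ndarray], k: int = 3) -> list[str]:
    """Cosine-style retrieval over a small in-memory corpus of unit vectors."""
    ranked = sorted(corpus, key=lambda doc: float(query @ corpus[doc]), reverse=True)
    return ranked[:k]

def build_grounded_prompt(intent: str,
                          static_corpus: dict[str, np.ndarray],
                          dynamic_kpis: dict[str, np.ndarray]) -> str:
    q = embed(intent)
    static_ctx = top_k(q, static_corpus)    # standards and policy snippets
    dynamic_ctx = top_k(q, dynamic_kpis)    # recent slice KPI records
    return "\n".join(["INTENT: " + intent,
                      "STATIC CONTEXT: " + "; ".join(static_ctx),
                      "LIVE KPIs: " + "; ".join(dynamic_ctx)])
```

A production system would replace the hash-seeded embedding with a learned encoder and a vector database, but the two-store retrieval pattern is the same.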
The super agent receives an intent and generates an embedding to retrieve relevant context from both the static and dynamic knowledge sources, supporting goal formulation that defines the target KPI, the desired improvement margin, and the slice of interest. The super agent then selects and configures a subordinate agent to execute an action at global scheduling epochs, which are triggered by significant network events. To train the system, the team generated a dataset of state-goal-action tuples, using a hierarchical deep Q-network to optimise QoS fulfilment and penalise violations. The Hierarchical Decision Mamba (HDM) algorithm, built upon the Mamba neural network architecture, was then trained offline to imitate this event-triggered orchestration policy, exploiting Mamba's ability to handle long sequences efficiently. Unlike Decision Transformer methods, HDM learns directly from operator-defined goals, accelerating convergence and reducing memory overhead through a hierarchical structure that identifies significant past actions and predicts subsequent orchestration steps. This design addresses the limitations of existing approaches by interpreting operator intentions and translating them into effective resource allocation strategies, coordinating resources across and within network slices while also incorporating self-healing capabilities.
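As a rough illustration of this goal-driven, event-triggered loop, the sketch below defines a goal with the three fields named above (target KPI, improvement margin, slice of interest) and dispatches a controller whenever a KPI snapshot changes sharply. The event threshold, the agent registry, and all names are hypothetical stand-ins for the learned selection policy described in the paper.

```python
# Hypothetical goal-driven orchestration loop; not the authors' code.
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

@dataclass
class Goal:
    target_kpi: str            # e.g. "embb_throughput" or "urllc_latency"
    improvement_margin: float  # desired relative improvement, e.g. 0.10
    slice_id: str              # slice of interest

def significant_event(kpi_now: Dict[str, float],
                      kpi_prev: Dict[str, float],
                      threshold: float = 0.15) -> bool:
    """Flag a global scheduling epoch when any KPI moves sharply."""
    return any(
        abs(kpi_now[k] - kpi_prev.get(k, kpi_now[k]))
        / max(abs(kpi_prev.get(k, 1.0)), 1e-9) > threshold
        for k in kpi_now
    )

def orchestrate(goal: Goal,
                kpi_stream: Iterable[Dict[str, float]],
                agents: Dict[str, Callable[[Goal], None]]) -> None:
    """Dispatch an inter-slice, intra-slice, or self-healing agent on events."""
    prev = None
    for kpis in kpi_stream:
        if prev is not None and significant_event(kpis, prev):
            # Stand-in for the super agent's learned selection: key the
            # registry on the goal's KPI, falling back to inter-slice control.
            agents.get(goal.target_kpi, agents["inter_slice"])(goal)
        prev = kpis
```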
Experiments reveal that the HDM-enabled Agentic AI framework consistently outperforms three baseline methods across key performance indicators, with measurements showing approximately 3%, 11%, and 16% increases in eMBB aggregate throughput over Offline Decision Transformer, Hierarchical Reinforcement Learning, and a traditional rule-based scheduler, respectively. Cell-edge performance, measured by the 5th-percentile throughput, improves by 8%, 20%, and 34% over the same baselines, indicating enhanced fairness in resource distribution. Further tests confirm a substantial reduction in latency at peak traffic loads: a 19% decrease compared to the Decision Transformer, a 28% reduction relative to Hierarchical RL, and a 43% improvement over the rule-based approach. The system maintains URLLC latency below 2 milliseconds even as the number of user equipments grows. In the resulting “Agentic AI” system, the language model interprets operator instructions and translates them into specific resource allocation goals, which the HDM controllers then coordinate across and within network slices. This approach moves beyond static configurations, enabling a dynamic, responsive system that adapts to changing network demands while maintaining service quality. The results show consistent gains across key performance indicators, including increased throughput for enhanced Mobile Broadband, improved cell-edge throughput, and reduced latency for Ultra-Reliable Low Latency Communications.
Importantly, the system exhibits faster response times and reduced computational demands compared to existing methods based on transformer architectures or traditional reward-driven reinforcement learning. The framework also incorporates a self-healing capability, autonomously restoring performance when quality of service degrades, a feature absent in conventional rule-based scheduling systems. The authors plan to validate these findings in real-world deployments and explore the system’s adaptability to even more complex network scenarios.
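To give a flavour of how such a self-healing hook might look, the fragment below watches a slice KPI against its SLA floor and emits a corrective goal when quality of service degrades, rather than waiting for an operator intent. The threshold logic, the safety margin, and the reuse of the `Goal` shape from the earlier sketch are illustrative assumptions only.

```python
# Hypothetical self-healing trigger; not the authors' implementation.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Goal:                 # same shape as in the earlier orchestration sketch
    target_kpi: str
    improvement_margin: float
    slice_id: str

def self_heal_check(slice_id: str,
                    kpi_name: str,
                    observed: float,
                    sla_floor: float,
                    safety_margin: float = 0.10) -> Optional[Goal]:
    """Emit a corrective Goal when an observed KPI drops below its SLA floor.

    For latency-style KPIs (lower is better) the comparison would be inverted.
    """
    if observed >= sla_floor:
        return None  # QoS within SLA; no self-healing action needed
    shortfall = (sla_floor - observed) / sla_floor
    # Request enough improvement to clear the SLA plus a safety margin.
    return Goal(target_kpi=kpi_name,
                improvement_margin=shortfall + safety_margin,
                slice_id=slice_id)

# Example: throughput SLA of 50 Mbps, observed 40 Mbps -> corrective goal
goal = self_heal_check("embb-1", "embb_throughput", observed=40.0, sla_floor=50.0)
```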
👉 More information
🗞 Hierarchical Decision Mamba Meets Agentic AI: A Novel Approach for RAN Slicing in 6G
🧠 ArXiv: https://arxiv.org/abs/2512.23502
