AI Sessions Enable QoS and Control Latency

Researchers are addressing the growing need for reliable and efficient cloud-based artificial intelligence inference, a challenge exacerbated by tightening latency requirements and context sensitivity. Merve Saimler from Ericsson Research, Türkiye, and Mohaned Chraiti from Sabancı University, Türkiye, present a novel approach called Network-Exposed AI-as-a-Service (NE-AIaaS), built around a new ‘AI Session’ primitive. This work is significant because it moves beyond current ‘best-effort’ network transport for AI services, enabling enforceable latency guarantees, intelligent admission control, and seamless operation during user mobility. The authors detail protocol-grade procedures for discovering models, selecting execution anchors, co-reserving resources, and ensuring session continuity, all designed to integrate with existing standards such as the Common API Framework (CAPIF), ETSI Multi-access Edge Computing (MEC), 5G QoS, and Network Data Analytics Function (NWDAF) architectures.

Artificial intelligence is rapidly moving beyond fixed locations, demanding a network that can keep pace with its needs. Current cloud services treat AI as just another application, failing to guarantee consistent performance or seamless transitions as users move. A fresh architectural approach promises to deliver reliable, adaptable AI experiences by directly integrating network resources with AI processing.

Researchers are redefining how artificial intelligence (AI) services operate within networks, moving beyond a simple ‘best-effort’ delivery model towards guaranteed performance and seamless mobility. Current cloud-based AI inference often suffers from unpredictable delays and lacks the ability to adapt to changing network conditions, hindering real-time applications.

This work introduces Network-Exposed AI-as-a-Service (NE-AIaaS), a system designed to bind AI model execution directly to network resources, ensuring predictable latency and uninterrupted service even as users move between locations. At its core lies the concept of an ‘AI Session’, a contractual agreement that combines model identity, compute location, network quality-of-service, and usage parameters into a single, manageable unit.
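
To make the session construct concrete, here is a minimal sketch of how such a binding might be represented; the field names below are illustrative assumptions, not the paper's actual data model.

```python
from dataclasses import dataclass


@dataclass
class QoSTarget:
    """Transport quality-of-service bound to the session (illustrative fields)."""
    latency_budget_ms: float    # end-to-end deadline the network must honour
    min_bandwidth_mbps: float   # throughput reserved for model inputs and outputs


@dataclass
class AISession:
    """One manageable unit binding model, placement, QoS, and usage parameters."""
    session_id: str
    model_id: str               # identity of the model to execute
    execution_anchor: str       # compute location, e.g. an edge-site identifier
    qos: QoSTarget              # network quality-of-service reserved for the session
    max_requests_per_s: float   # usage parameter agreed at admission
```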

This approach fundamentally alters how AI is delivered, shifting from an application requesting a service to the network actively managing and guaranteeing that service. The team developed the AI Service Profile, a concise description of the AI task’s requirements, including acceptable response times, latency targets, and even privacy constraints. These profiles enable a series of automated procedures (discovery, anchoring, preparation, and migration) that coordinate compute and network resources.
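
As a rough illustration of the anchoring step, the sketch below picks an execution site whose estimated network-plus-inference delay fits the profile's latency target; the site list and numbers are invented for the example.

```python
# Hypothetical anchor selection: keep only sites whose estimated end-to-end delay
# (network round trip plus inference time) fits the latency target, then pick the
# fastest. Returning None models an explicit admission rejection.
def select_anchor(candidate_sites, latency_budget_ms):
    feasible = [s for s in candidate_sites
                if s["rtt_ms"] + s["inference_ms"] <= latency_budget_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda s: s["rtt_ms"] + s["inference_ms"])


sites = [
    {"name": "edge-a", "rtt_ms": 4.0, "inference_ms": 12.0},
    {"name": "edge-b", "rtt_ms": 9.0, "inference_ms": 8.0},
    {"name": "regional-cloud", "rtt_ms": 25.0, "inference_ms": 6.0},
]
print(select_anchor(sites, latency_budget_ms=20.0))  # -> the "edge-a" entry
```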

Rather than relying on applications to negotiate these details, the network itself takes responsibility for fulfilling the agreed-upon service level. The design is deliberately aligned with existing standards like Common API Framework and 5G QoS flows, paving the way for practical implementation within current and future networks. Beyond improving performance, NE-AIaaS addresses a critical gap in current AI service delivery: the ability to maintain a consistent experience during user movement.

Traditional systems often require re-establishing connections and reloading models, causing noticeable interruptions. The proposed ‘make-before-break’ migration technique ensures a smooth handover, seamlessly transferring the AI session to a new compute location without service disruption. This is achieved through a two-phase process that reserves resources at the destination before releasing them at the source, guaranteeing continuity.
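
A minimal sketch of the idea, with hypothetical class and method names: the destination anchor is prepared before the source is released, so the session always has a serving instance.

```python
class Anchor:
    """Stub execution anchor: prepare() loads the model and reserves resources."""
    def __init__(self, name):
        self.name = name
        self.serving = False

    def prepare(self, session_id):
        self.serving = True       # in a real system: model load + QoS reservation
        return True

    def release(self, session_id):
        self.serving = False      # free resources only after traffic has moved


def migrate(session_id, source, destination):
    """Make-before-break: reserve at the destination before releasing the source."""
    if not destination.prepare(session_id):    # phase 1: prepare the new anchor
        return source                          # on failure, keep serving from source
    source.release(session_id)                 # phase 2: release the old anchor
    return destination


serving = migrate("sess-42", Anchor("edge-a"), Anchor("edge-b"))
print(serving.name)  # -> edge-b, with no interval in which neither anchor serves
```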

The research demonstrates a reliable, predictable, and mobile AI experience, essential for applications ranging from real-time language translation to autonomous vehicle control. Simply achieving low latency is insufficient; the system must also guarantee service continuity under challenging conditions. The team’s design prioritizes explicit failure semantics, meaning that any deviation from the agreed-upon service level is clearly defined and detectable.

This allows for proactive intervention and ensures accountability. By framing AI delivery as a network-native capability, the researchers envision a future where service providers can offer programmable AI services with enforceable guarantees, unlocking new revenue streams and enabling a wider range of innovative applications. The work establishes a foundation for a more intelligent and responsive network, capable of adapting to the demands of increasingly sophisticated AI workloads.

Reduced Tail Latency and Improved Service-Level Agreements via Network-Exposed AI Inference

Network-Exposed AI-as-a-Service (NE-AIaaS) demonstrably reduces tail latency and improves service reliability, as evidenced by simulation results. At an offered load of 0.9, the 99th-percentile end-to-end latency for the endpoint baseline reached 0.6 milliseconds, exhibiting substantial tail blow-up. Conversely, NE-AIaaS maintained a significantly lower tail latency of 0.2 milliseconds at the same load, indicating the network’s active role in selecting admissible execution options and provisioning Quality of Service.

This difference highlights a key achievement in managing latency-sensitive AI inference. Examining AI Service Profile (ASP) violation probability, the endpoint AIaaS showed a sharp increase near saturation, with violations rising to 0.5 at a load of 0.9. This stems from server queueing coming to dominate the delay, pushing latency beyond both the 99th-percentile target and the maximum tolerable time.

Yet, NE-AIaaS achieved a markedly lower violation probability of 0.1 at the same load, benefiting from reserved QoS and compute-aware admission control. These results confirm that binding model identity, execution placement, and QoS into a single lifecycle improves performance under pressure. Further analysis focused on mobility, evaluating interruption probability versus user speed.

A teardown/re-establish handover mechanism induced a rapidly increasing interruption probability, reaching 0.8 at a user speed of 20 metres per second. However, the make-before-break continuity offered by NE-AIaaS kept interruption probability consistently close to zero across the entire speed range tested. This preservation of session continuity is a direct result of proactive session migration, avoiding service disruption during handover.

The simulation computed violation probability over admitted sessions only, in line with the defined session semantics. Once the system was configured, the research team measured interruption probability within a fixed session window, sweeping user speed to assess handover performance. Beyond latency and reliability, the work demonstrates a path toward interoperable, provider-grade AI services that are measurable, diagnosable, and enforceable through standardized AI Service Profiles and AI Sessions.
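
For illustration, the snippet below computes the two reported quantities, 99th-percentile latency and ASP violation probability, from a synthetic latency trace; the numbers are made up and are not the paper's data.

```python
import random

random.seed(0)
# Synthetic per-request latencies for admitted sessions (exponential, mean 0.1 ms).
latencies_ms = [random.expovariate(1 / 0.1) for _ in range(10_000)]
max_tolerable_ms = 0.4   # assumed maximum tolerable time taken from the profile

sorted_lat = sorted(latencies_ms)
p99_ms = sorted_lat[int(0.99 * len(sorted_lat)) - 1]
violation_prob = sum(l > max_tolerable_ms for l in latencies_ms) / len(latencies_ms)

print(f"p99 latency: {p99_ms:.3f} ms, ASP violation probability: {violation_prob:.3f}")
```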

A compact AI Service Profile (ASP) forms the basis of this work, expressing task modality and measurable service objectives alongside privacy and mobility constraints. This profile isn’t merely a description; it’s a contractual object binding model identity, execution placement, transport Quality-of-Service (QoS), and consent scope into a unified lifecycle with defined failure behaviours.
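
A minimal sketch of how such a profile might look as a data object, assuming illustrative field names rather than the paper's authoritative attribute set:

```python
from dataclasses import dataclass


@dataclass
class AIServiceProfile:
    """Requirements side of the contract; an AI Session binds these to resources."""
    task_modality: str          # e.g. "speech-to-text" or "object-detection"
    latency_budget_ms: float    # measurable objective: end-to-end deadline
    max_tolerable_ms: float     # hard bound beyond which the session is in violation
    accuracy_floor: float       # measurable objective: minimum acceptable quality
    privacy_constraint: str     # e.g. "process-at-edge-only"
    mobility_profile: str       # e.g. "vehicular", drives continuity handling
    consent_scope: str          # what user data the service is allowed to touch
```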

Consequently, the research centres on establishing procedures for discovering suitable models, selecting execution anchors based on context, and co-reserving both compute and network resources through a two-phase prepare/commit process. These procedures are designed to be standard-mappable, aligning with the Common API Framework (CAPIF) for northbound exposure.
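
The two-phase idea can be sketched as follows: both the compute and the network controller place tentative holds in the prepare phase, and the session is committed only if both succeed; otherwise both holds are aborted. The controller behaviour here is a stub, not a real CAPIF or 5G API.

```python
class ResourceController:
    """Stub for a compute or network domain that supports prepare/commit/abort."""
    def __init__(self, name, capacity):
        self.name, self.capacity = name, capacity

    def prepare(self, demand):
        if demand <= self.capacity:           # tentative hold if capacity allows
            self.capacity -= demand
            return {"controller": self.name, "demand": demand}
        return None

    def commit(self, hold):
        pass                                  # make the hold permanent (no-op stub)

    def abort(self, hold):
        self.capacity += hold["demand"]       # release the tentative hold


def co_reserve(compute, network, gpu_demand, bandwidth_demand):
    compute_hold = compute.prepare(gpu_demand)
    network_hold = network.prepare(bandwidth_demand)
    if compute_hold and network_hold:
        compute.commit(compute_hold)
        network.commit(network_hold)
        return True                           # admitted: session holds both bindings
    for ctrl, hold in ((compute, compute_hold), (network, network_hold)):
        if hold:
            ctrl.abort(hold)                  # roll back so no resources leak
    return False                              # rejected with explicit failure semantics


admitted = co_reserve(ResourceController("edge-gpu", 4),
                      ResourceController("5g-qos-mbps", 100),
                      gpu_demand=2, bandwidth_demand=50)
print(admitted)  # -> True
```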

Enforceable AI Performance Through Network Resource Binding

Scientists are proposing a fundamental shift in how artificial intelligence operates within communication networks. For years, AI services have been treated as isolated applications, demanding bandwidth but offering little predictability in return. This approach is akin to shouting into the void and hoping for a timely response, a situation that is increasingly unacceptable as AI becomes interwoven with real-time applications.

Now, a new design called Network-Exposed AI-as-a-Service (NE-AIaaS) seeks to bind AI processing directly to network resources, guaranteeing performance and reliability. The real innovation isn’t simply about speed; it’s about control. By defining ‘AI Sessions’ (contractual agreements between the AI service, the network, and even the user), this system allows for enforceable quality of service.

Imagine a self-driving car negotiating a guaranteed latency for object recognition, or a surgeon receiving a consistent response time for AI-assisted diagnostics. Such guarantees have been elusive, hampered by the inherent unpredictability of shared network infrastructure.

Translating this concept into widespread deployment presents challenges. Beyond the technical hurdles, questions of data privacy and consent within these ‘AI Sessions’ require careful consideration and robust governance.

At present, the focus appears to be on the technical architecture, with less detail on the commercial and legal frameworks needed for broad adoption. However, this work signals a broader trend: a move towards intelligent networks that actively manage AI workloads. Beyond this specific implementation, we can anticipate a future where networks aren’t simply conduits for data, but active participants in the AI process itself.

For telecommunications companies, this represents an opportunity to move beyond being mere bit-carriers and become value-added service providers, offering guaranteed AI performance as a premium offering. Ultimately, the success of NE-AIaaS, or similar approaches, will depend on establishing trust and transparency in how AI interacts with the networks that underpin our increasingly connected world.

👉 More information
🗞 AI Sessions for Network-Exposed AI-as-a-Service
🧠 arXiv: https://arxiv.org/abs/2602.15288
Muhammad Rohail T.
