Reinforcement Learning Scheduler Cuts Kubernetes CPU Usage by 20%

Researchers are tackling the challenge of optimising resource allocation in Kubernetes, the dominant platform for container orchestration, particularly for demanding compute workloads. Hanlin Zhou, Huah Yong Chan, and Shun Yao Zhang, from the School of Computer Sciences, Universiti Sains Malaysia and the Xiamen Institute of Software Technology, alongside Meie Lin, Jingfei Ni, and colleagues, present a novel approach that uses reinforcement learning to intelligently schedule ‘pods’, the fundamental unit of deployment in Kubernetes. Their work demonstrates significant improvements over the default Kubernetes scheduler, reducing average CPU utilisation per cluster node by up to 20% with their SDQN-n model and paving the way for more efficient and sustainable data centres, a crucial step as energy consumption in cloud computing continues to rise.

Researchers selected average per-node CPU utilisation as the key performance metric, recognising its direct impact on CPU provisioning decisions for both cloud and on-premises infrastructure, as well as its influence on co-located service performance and power consumption. This work establishes a new benchmark for pod scheduling, demonstrating the potential of reinforcement learning to optimise resource allocation in dynamic containerised environments. Furthermore, the SDQN-n strategy’s innovative approach to pod consolidation significantly reduces overall CPU usage, enabling the potential shutdown of idle machines and promoting sustainable data centre operations.

The team’s contributions include the introduction of the SDQN framework, seamlessly integrating the DQN reinforcement-learning paradigm with Kubernetes’s scheduling pipeline, and the development of SDQN-n, which leverages reinforcement learning for intelligent pod consolidation. The researchers highlight the adaptability of their architectures, noting that the reinforcement-learning components can be easily tuned to accommodate the requirements of diverse future scenarios. This flexibility ensures the long-term viability and applicability of the proposed schedulers in evolving cloud computing landscapes. This research opens avenues for substantial improvements in resource management within Kubernetes clusters, offering a pathway to reduce operational costs, enhance performance, and minimise environmental impact.
By demonstrating superior resource savings compared to both default and alternative AI-driven approaches, the study underscores the potential of reinforcement-learning-driven scheduling to revolutionise container orchestration. The findings have direct implications for organisations deploying compute-intensive workloads in cloud or on-premises environments, providing a viable solution for optimising resource utilisation and building more sustainable infrastructure. The team’s work promises to advance the field of cloud computing by enabling more efficient and environmentally responsible data centres.

SDQN and SDQN-n Kubernetes Scheduler Development

To characterise the state of each node, the study used six key input parameters: CPU Usage Percentage, Memory Usage Percentage, Pod Utilization, Health Status, Node Uptime (hours), and Number of Running Pods, each calculated using specific formulas detailed in the work. CPU and memory usage were determined as ratios of real-time consumption to total capacity, while pod utilization reflected the current workload pressure on each node as a percentage of the maximum possible pods. Node health was assessed with a binary indicator, 1 for “Ready” status, 0 otherwise, and uptime was measured in hours from the node’s start time. This comprehensive input set enabled the models to accurately assess node conditions and make informed scheduling decisions.
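As a rough sketch of how these six inputs might be assembled into a state vector, under the assumption that raw node metrics are already available (the `NodeMetrics` container and field names here are illustrative, not the paper's code):

```python
from dataclasses import dataclass

@dataclass
class NodeMetrics:
    """Hypothetical raw metrics for one node; how they are collected
    (e.g. via the Kubernetes metrics API) is not specified here."""
    cpu_used: float        # cores currently consumed
    cpu_capacity: float    # total cores
    mem_used: float        # bytes currently consumed
    mem_capacity: float    # total bytes
    running_pods: int
    max_pods: int
    ready: bool            # Kubernetes "Ready" condition
    uptime_hours: float

def node_state(m: NodeMetrics) -> list[float]:
    """Build the 6-dimensional state vector described in the paper:
    CPU %, memory %, pod utilisation %, health (0/1), uptime (hours),
    and number of running pods."""
    return [
        100.0 * m.cpu_used / m.cpu_capacity,
        100.0 * m.mem_used / m.mem_capacity,
        100.0 * m.running_pods / m.max_pods,
        1.0 if m.ready else 0.0,
        m.uptime_hours,
        float(m.running_pods),
    ]
```

A node with half its CPU, memory, and pod capacity in use would yield `[50.0, 50.0, 50.0, 1.0, uptime, pods]`, which the Q-network then consumes directly.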

The SDQN algorithm employs a neural network to approximate the Q-function, estimating the expected return of each state-action pair. The team defined a reward function, detailed in Table 3 of the published work, designed to maintain CPU and memory utilisation within optimal ranges: utilisation between 40% and 70% earns +10 points, utilisation above 70% is penalised by -2 points per percentage point over the threshold, and values below 40% receive -10 points. Pod distribution and node uptime also contributed to the reward score, incentivising workload spread and stable node operation. The SDQN model itself consists of a 6-dimensional input layer, a single fully connected hidden layer mapping to 32 dimensions with ReLU activation, and a final fully connected output layer estimating the Q-values.
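The reward thresholds and the network shape described above can be sketched as follows. The thresholds come from the paper's Table 3; the NumPy forward pass is a minimal stand-in for whatever deep-learning framework the authors actually used, with the number of output actions left as a free parameter:

```python
import numpy as np

def utilisation_reward(pct: float) -> float:
    """Reward for one resource dimension, per the thresholds the paper
    reports: +10 inside the 40-70% band, -2 per percentage point above
    70%, and -10 below 40%."""
    if 40.0 <= pct <= 70.0:
        return 10.0
    if pct > 70.0:
        return -2.0 * (pct - 70.0)
    return -10.0

def q_forward(state, w1, b1, w2, b2):
    """Forward pass matching the described architecture: a 6-dimensional
    input, one fully connected hidden layer of width 32 with ReLU, and a
    linear output layer producing one Q-value per candidate action."""
    hidden = np.maximum(0.0, state @ w1 + b1)  # (6,) -> (32,), ReLU
    return hidden @ w2 + b2                    # (32,) -> (n_actions,)
```

With `w1` of shape (6, 32) and `w2` of shape (32, n_actions), `q_forward` returns one Q-value per schedulable node, and the scheduler picks the argmax.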

Further innovation came with SDQN-n, which builds upon SDQN by restricting pod placement to a limited number of nodes, specifically two, to amplify resource savings and promote energy efficiency. This constraint is reflected in the modified reward function, which penalises placement outside the top two candidate nodes with -50 points, encouraging consolidation. The training process for both models involved forward propagation to compute Q(s,a), followed by backpropagation using target rewards to update network weights, using the Adam optimiser with a learning rate of 0.001.
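A minimal sketch of the SDQN-n consolidation penalty and the Bellman target used in DQN-style training. The -50 penalty and the Adam learning rate of 0.001 are reported in the paper; the discount factor γ is an illustrative assumption, as its value is not quoted here:

```python
def consolidation_penalty(node_rank: int) -> float:
    """SDQN-n reward modification: placing a pod on any node outside the
    top two candidates (ranks 0 and 1) costs -50 points, pushing pods
    onto fewer machines."""
    return 0.0 if node_rank < 2 else -50.0

def td_target(reward: float, next_q_values, gamma: float = 0.9) -> float:
    """Bellman target r + gamma * max_a' Q(s', a'), the value the network
    is regressed toward during backpropagation (the paper trains with
    Adam at a learning rate of 0.001)."""
    return reward + gamma * max(next_q_values)
```

Each training step would compute `td_target` from the observed reward (including any consolidation penalty) and the next state's Q-values, then minimise the squared difference against Q(s,a).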

SDQN schedulers improve Kubernetes CPU resource utilisation

This consolidation strategy concentrates pods onto fewer nodes, maximising resource utilisation and minimising waste. Data shows that this approach not only lowers CPU load but also paves the way for more energy-efficient data centres by enabling the decommissioning of idle machines. The researchers note that the SDQN and SDQN-n architectures’ reinforcement-learning components are readily tunable, allowing adaptation to diverse future scenarios and workload requirements. Measurements confirm that the key performance metric, average per-node CPU utilisation, was consistently lower with the new schedulers, directly impacting CPU provisioning decisions for both cloud and on-premises servers.

The approach delivers substantial resource savings, potentially reducing power consumption and improving the scalability of containerised applications. Tests show that SDQN-n’s pod consolidation strategy is particularly effective, achieving a greater than 20% reduction in CPU usage by strategically placing compute-intensive pods. This work establishes a foundation for greener, more sustainable data centre operations and improved resource management in cloud computing environments.

SDQN-n lowers Kubernetes CPU usage significantly, improving cluster efficiency

The effectiveness of SDQN stems from reinforcement learning’s capacity to adapt to the real-time state of each node, strategically placing pods to minimise overall CPU usage. The authors acknowledge that further work is needed to broaden the models’ applicability across diverse workload types and cluster configurations. Future research will also focus on refining hyperparameters to enhance resource savings and scheduling robustness, and investigating the SDQN-n consolidation strategy as a blueprint for energy-efficient data centres.

👉 More information
🗞 A Kubernetes custom scheduler based on reinforcement learning for compute-intensive pods
🧠 ArXiv: https://arxiv.org/abs/2601.13579

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
