Quadrupedal robots present a promising solution for primary searches during the dangerous early stages of indoor fires. Baixiao Huang, Baiyu Huang, and Yu Hou demonstrate a new approach to improving how these robots navigate and ascend stairs, a critical capability for locating victims. Their research details a two-stage deep reinforcement learning framework, utilising the Unitree Go2 quadruped, that transfers stair-climbing skills learned on simulated pyramid terrain to a variety of realistic indoor staircase configurations, including straight, L-shaped, and spiral designs, which are also modelled in simulation. This work is significant because it addresses key challenges in robot-assisted search and rescue, specifically balancing navigation with locomotion and adapting to complex indoor environments using only local perception, paving the way for more robust and autonomous firefighting robots.
Optimising quadruped navigation and stair ascent with deep reinforcement learning requires transferring skills from abstract to realistic terrain
Scientists are deploying quadruped robots, often called robot dogs, to enhance primary search operations during the initial stages of indoor fires. A typical primary search demands swift and thorough victim location amidst hazardous conditions, alongside continuous monitoring of flammable materials. However, achieving robust situational awareness within complex indoor environments and enabling rapid stair climbing across diverse staircase configurations represent significant challenges for robot-assisted searches.
This work introduces a novel two-stage end-to-end deep reinforcement learning (RL) approach designed to simultaneously optimise both navigation and locomotion for these quadrupeds. Initially, the Unitree Go2 quadruped was trained to ascend stairs within the pyramid-stair terrain of NVIDIA’s Isaac Lab.
Subsequently, the same robot was trained, again in the Isaac Lab engine, to navigate a variety of realistic indoor staircases (straight, L-shaped, and spiral), starting from the policy learned in the first stage. This framework transfers stair-climbing skills from simplified, abstract terrain to complex, realistic indoor environments.
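The warm-start idea behind the two stages can be illustrated with a toy curriculum. In the sketch below, `StageConfig`, `Policy`, `train_stage`, and the terrain names are all hypothetical, and `train_stage` only applies small random weight changes in place of a real RL update (the article does not name the algorithm). The point is only that stage 2 continues from the stage-1 weights instead of starting from scratch.

```python
import random
from dataclasses import dataclass, field


@dataclass
class StageConfig:
    """Hypothetical description of one training stage."""
    name: str
    terrains: list[str]
    iterations: int


@dataclass
class Policy:
    """Toy stand-in for a policy network: a small weight vector plus bookkeeping."""
    weights: list[float] = field(default_factory=lambda: [0.0] * 4)
    stages_completed: list[str] = field(default_factory=list)
    terrains_seen: set[str] = field(default_factory=set)


def train_stage(policy: Policy, stage: StageConfig, seed: int = 0) -> Policy:
    """Placeholder for an RL training loop. A real implementation would roll out
    episodes in the simulator and apply policy updates; here each iteration
    samples a terrain and applies a small random weight change."""
    rng = random.Random(seed)
    for _ in range(stage.iterations):
        policy.terrains_seen.add(rng.choice(stage.terrains))
        policy.weights = [w + rng.uniform(-0.01, 0.01) for w in policy.weights]
    policy.stages_completed.append(stage.name)
    return policy


# Stage 1: abstract pyramid stairs. Stage 2: realistic indoor staircase types.
STAGES = [
    StageConfig("stage1_pyramid", ["pyramid_stairs"], iterations=100),
    StageConfig("stage2_indoor", ["straight", "l_shaped", "spiral"], iterations=100),
]


def run_curriculum(stages: list[StageConfig], seed: int = 0) -> Policy:
    policy = Policy()  # initialised once, before stage 1 only
    for i, stage in enumerate(stages):
        # Each stage continues from the previous stage's weights (warm start).
        policy = train_stage(policy, stage, seed=seed + i)
    return policy


policy = run_curriculum(STAGES)
print(policy.stages_completed)  # ['stage1_pyramid', 'stage2_indoor']
```

Re-initialising `Policy()` inside the loop would discard the stage-1 skills, which is exactly what the two-stage design avoids.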
The research specifically addresses the critical balance between navigation and locomotion, demonstrating how end-to-end RL methods can empower quadrupeds to adapt to varying stair geometries. A key contribution of this study is a centerline-based navigation formulation, which unifies the learning of navigation and locomotion without requiring complex hierarchical planning.
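The article does not spell out how the centerline enters the policy. One plausible reading, sketched below purely as an illustration, is that the staircase centerline is stored as a 2D polyline and the robot's command (a target point and heading) is taken a fixed look-ahead distance further along it than the robot's own projection. The function names, the 0.5 m look-ahead, and the planar simplification are assumptions, not the authors' formulation.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def project_onto_polyline(p: Point, line: Sequence[Point]) -> float:
    """Return the arc length s of the point on `line` closest to p."""
    best_s, best_d2, s_start = 0.0, math.inf, 0.0
    for (ax, ay), (bx, by) in zip(line, line[1:]):
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        seg_len = math.sqrt(seg_len2)
        # Parameter t in [0, 1] of the closest point on segment AB.
        t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg_len2))
        cx, cy = ax + t * dx, ay + t * dy
        d2 = (p[0] - cx) ** 2 + (p[1] - cy) ** 2
        if d2 < best_d2:
            best_d2, best_s = d2, s_start + t * seg_len
        s_start += seg_len
    return best_s


def point_at(line: Sequence[Point], s: float) -> tuple[Point, float]:
    """Return the point at arc length s along `line` and the segment heading there."""
    segments = list(zip(line, line[1:]))
    for i, ((ax, ay), (bx, by)) in enumerate(segments):
        seg_len = math.hypot(bx - ax, by - ay)
        if s <= seg_len or i == len(segments) - 1:
            # Clamp to the end of the last segment if s runs past the centerline.
            t = 0.0 if seg_len == 0 else max(0.0, min(s / seg_len, 1.0))
            return (ax + t * (bx - ax), ay + t * (by - ay)), math.atan2(by - ay, bx - ax)
        s -= seg_len
    raise ValueError("centerline needs at least two points")


def centerline_command(robot_xy: Point, centerline: Sequence[Point], lookahead: float = 0.5):
    """Hypothetical command: target point `lookahead` metres ahead of the robot's
    projection on the centerline, plus the centerline heading at that point."""
    s = project_onto_polyline(robot_xy, centerline)
    return point_at(centerline, s + lookahead)


# L-shaped centerline: 2 m along +x, then a left turn and 2 m along +y.
target, heading = centerline_command((1.9, 0.5), [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)])
```

For a robot already at the landing, `(1.9, 0.5)`, the command points up the second flight: target `(2.0, 1.0)` and heading π/2. Because the command is derived from the geometry alone, no separate path planner is involved.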
Using local height-map perception alone, the policy generalises successfully across diverse staircase designs. An empirical analysis assesses the system's success rate and efficiency and identifies failure modes as stair difficulty increases, offering insight into the approach's limitations and possible improvements. This work represents a substantial step towards deploying autonomous, agile robots to improve safety and effectiveness in firefighting.
Quadruped stair climbing via unified navigation and locomotion learning in simulated environments yields robust, adaptable ascent
A two-stage end-to-end deep reinforcement learning (RL) framework underpinned the development of enhanced stair-climbing capabilities for quadruped robots. The Unitree Go2 quadruped served as the robotic platform and was first trained on the simulated pyramid-stair terrain of NVIDIA’s Isaac Lab.
This first stage established foundational stair-climbing skills before the robot progressed to more complex scenarios. The learned policy was then carried into a second training stage on realistic indoor staircases, also built in the Isaac Lab engine. These staircases came in three configurations (straight, L-shaped, and spiral), reflecting the architectural diversity encountered during primary searches.
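To make the three configurations concrete, the sketch below builds hypothetical plan-view centerlines for a straight flight, an L-shaped flight with a 90-degree turn, and a spiral arc. All dimensions (3 m run, two 2 m flights, 1 m radius, 270-degree sweep) and function names are illustrative assumptions, not the geometry used in the paper's Isaac Lab scenes.

```python
import math


def straight_centerline(run: float = 3.0) -> list[tuple[float, float]]:
    """Single flight along +x."""
    return [(0.0, 0.0), (run, 0.0)]


def l_shaped_centerline(run1: float = 2.0, run2: float = 2.0) -> list[tuple[float, float]]:
    """Two straight flights joined by a 90-degree left turn at a landing."""
    return [(0.0, 0.0), (run1, 0.0), (run1, run2)]


def spiral_centerline(radius: float = 1.0, sweep_deg: float = 270.0, n: int = 19) -> list[tuple[float, float]]:
    """n points on a circular arc around the origin, counter-clockwise from angle 0."""
    sweep = math.radians(sweep_deg)
    return [
        (radius * math.cos(sweep * k / (n - 1)), radius * math.sin(sweep * k / (n - 1)))
        for k in range(n)
    ]


def polyline_length(line: list[tuple[float, float]]) -> float:
    """Total length of the centerline, summed segment by segment."""
    return sum(math.hypot(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(line, line[1:]))


STAIR_CENTERLINES = {
    "straight": straight_centerline(),
    "l_shaped": l_shaped_centerline(),
    "spiral": spiral_centerline(),
}
```

The spiral is approximated by short chords, so its polyline length is slightly less than the true arc length, radius times sweep. The finer the sampling `n`, the smaller that gap.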
The research addressed the challenge of unifying navigation and locomotion through a centerline-based navigation formulation, which removes the need for hierarchical planning. Navigation and gait control were therefore optimised together in a single end-to-end policy rather than in separate planning and control layers, improving the robot's overall efficiency and adaptability on stairs.
Local height-map perception provided the sensory input for the RL policy, allowing the quadruped to navigate and climb stairs without relying on global maps or extensive prior knowledge. The policy generalised across the diverse staircase types, highlighting the robustness of the learned skills.
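A local height map of this kind is commonly a grid of terrain heights sampled around the robot and rotated with its heading. The sketch below assumes a 1.6 m by 1.0 m grid at 0.1 m resolution and a toy analytic staircase in place of simulator ray casts; these values, the terrain function, and the function names are hypothetical, not the paper's sensor configuration.

```python
import math
from typing import Callable


def local_height_map(
    terrain_height: Callable[[float, float], float],
    base_xyz: tuple[float, float, float],
    yaw: float,
    length: float = 1.6,
    width: float = 1.0,
    resolution: float = 0.1,
) -> list[list[float]]:
    """Sample terrain on a grid centred on the robot and aligned with its yaw.
    Each entry is (base height - terrain height), so the map does not depend
    on the robot's absolute elevation."""
    bx, by, bz = base_xyz
    nx = int(round(length / resolution)) + 1
    ny = int(round(width / resolution)) + 1
    c, s = math.cos(yaw), math.sin(yaw)
    rows = []
    for i in range(nx):
        lx = -length / 2 + i * resolution  # forward offset in the robot frame
        row = []
        for j in range(ny):
            ly = -width / 2 + j * resolution  # lateral offset in the robot frame
            wx = bx + c * lx - s * ly  # rotate the sample point into the world frame
            wy = by + s * lx + c * ly
            row.append(bz - terrain_height(wx, wy))
        rows.append(row)
    return rows


def straight_stairs_height(x: float, y: float, rise: float = 0.15, tread: float = 0.3) -> float:
    """Toy terrain: flat floor for x < 0, then steps climbing along +x (y unused)."""
    return rise * (math.floor(x / tread) + 1) if x >= 0 else 0.0


# Robot standing 0.45 m before the first step, facing the stairs.
hmap = local_height_map(straight_stairs_height, (-0.45, 0.0, 0.35), yaw=0.0)
```

Rows run from behind the robot to in front of it. Because the grid rotates with the yaw, the same staircase produces the same map whenever the robot faces it the same way, which helps a policy trained on one layout transfer to another.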
Empirical analysis then assessed success rate and efficiency and identified failure modes as stair difficulty increased, revealing both the system's limitations and directions for improvement across varying conditions.
Staircase geometry significantly impacts robotic quadruped ascent performance and stability
Success rates in stair-climbing tasks varied considerably with staircase geometry and difficulty level. Across straight, L-shaped, and spiral staircases, success rates declined as the difficulty level increased from 1 to 6. Even on straight stairs, success at level 6 was lower than at level 1.
L-shaped stairs exhibited a more pronounced decline in success rate with increasing levels, and spiral stairs demonstrated the greatest reduction in successful climbs. A notable decrease in success rate occurred between levels 5 and 6 across all staircase types, potentially due to the quadrupeds hesitating to complete the climb at the highest difficulty.
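Success rate per staircase type and difficulty level is simply the fraction of successful episodes in each bucket. The sketch below assumes each evaluation episode is logged as a `(stair_type, level, succeeded)` tuple. The article does not define the exact success criterion (for example, reaching the top within a time limit without falling), so that judgement is left to whatever produces the log. The example data are dummy values, not results from the paper.

```python
from collections import defaultdict


def success_rates(episodes):
    """episodes: iterable of (stair_type, level, succeeded) tuples.
    Returns {(stair_type, level): fraction of successful episodes}."""
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, total episodes]
    for stair_type, level, ok in episodes:
        counts[(stair_type, level)][0] += int(ok)
        counts[(stair_type, level)][1] += 1
    return {key: s / n for key, (s, n) in counts.items()}


# Dummy log for illustration only.
log = [("spiral", 6, True), ("spiral", 6, False), ("straight", 1, True)]
rates = success_rates(log)
```

Grouping by both keys makes it easy to compare, say, the level-5 and level-6 buckets for each staircase type.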
Mean linear velocity generally decreased as the difficulty level increased, indicating slower travel along the staircase. Climb rate, by contrast, tended to increase with level, suggesting that the robots adapted to keep gaining height despite the added complexity. Position errors, measured as the distance between the robot’s position and the commanded position, increased with level for all stair types.
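The article does not define mean linear velocity or climb rate precisely. Under the assumed definitions below, mean horizontal speed is path length in the ground plane divided by episode duration, and climb rate is net height gained divided by duration. These definitions show how the two metrics can move in opposite directions: a robot can travel more slowly along the stairs while still gaining height faster per second. The function name and the fixed-timestep `(x, y, z)` log format are hypothetical.

```python
import math


def velocity_metrics(positions, dt):
    """positions: list of (x, y, z) base positions sampled every dt seconds.
    Returns (mean horizontal speed in m/s, climb rate in m/s)."""
    if len(positions) < 2:
        raise ValueError("need at least two samples")
    duration = dt * (len(positions) - 1)
    # Horizontal path length: sum of planar distances between consecutive samples.
    path = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0, _), (x1, y1, _) in zip(positions, positions[1:])
    )
    # Climb rate: net vertical gain over the whole episode.
    climb = (positions[-1][2] - positions[0][2]) / duration
    return path / duration, climb


speed, climb = velocity_metrics([(0.0, 0.0, 0.0), (0.3, 0.0, 0.15), (0.6, 0.0, 0.3)], dt=0.5)
```

Here the robot covers 0.6 m horizontally and 0.3 m vertically in one second, giving roughly 0.6 m/s and 0.3 m/s.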
The smallest position errors occurred on straight stairs, while L-shaped and spiral stairs produced markedly higher errors, particularly between levels 5 and 6. Heading errors, defined as the angular difference between the robot’s yaw and the commanded yaw, also increased with level. Straight stairs again yielded the lowest heading errors, with substantial increases on L-shaped and spiral staircases at higher levels.
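Position error and heading error follow directly from the definitions above. The one subtlety is yaw wrap-around: a robot at 170 degrees commanded to face -170 degrees is only 20 degrees off, not 340. The sketch below wraps the difference into [-π, π) before taking its magnitude. The per-episode averaging and the `(x, y, yaw)` pose format are assumptions.

```python
import math
from statistics import fmean


def position_error(robot_xy, commanded_xy):
    """Euclidean distance between the robot and the commanded position, in metres."""
    return math.dist(robot_xy, commanded_xy)


def heading_error(robot_yaw, commanded_yaw):
    """Smallest absolute angle between two yaws, in radians, within [0, pi]."""
    diff = (commanded_yaw - robot_yaw + math.pi) % (2 * math.pi) - math.pi
    return abs(diff)


def episode_tracking_errors(robot_poses, commands):
    """robot_poses, commands: equal-length lists of (x, y, yaw).
    Returns (mean position error, mean heading error) over the episode."""
    pos = [position_error(p[:2], c[:2]) for p, c in zip(robot_poses, commands)]
    head = [heading_error(p[2], c[2]) for p, c in zip(robot_poses, commands)]
    return fmean(pos), fmean(head)
```

Without the wrap, a spiral staircase, where the commanded heading keeps rotating and eventually crosses ±180 degrees, would produce large spurious heading errors.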
Mean motor power, averaged across the robot's 12 motors, increased with staircase complexity and difficulty. At level 6, however, power dropped slightly, possibly reflecting a more conservative gait adopted to avoid falls or collisions.
Comparing a model trained solely on pyramid terrain (stage 1) with models trained on specific stair types (stage 2) showed substantial gains from the second training stage. The stage 1 model achieved the highest success rate on straight stairs but performed poorly on L-shaped stairs, often heading directly towards the goal and colliding with the structure.
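The mean motor power metric can be sketched as the average absolute mechanical power |τ·q̇| over the Go2's 12 joint motors and all timesteps. The article does not state whether the paper uses mechanical or electrical power, or whether negative (regenerative) power is counted as absolute, signed, or clipped. Taking the absolute value here is an assumption that treats braking work as effort. The function name and log format are hypothetical.

```python
def mean_motor_power(torques, joint_velocities, n_motors=12):
    """Average |tau * qdot| over all motors and timesteps (watts if tau is in
    N*m and qdot in rad/s). Each argument is a list of per-timestep lists."""
    if len(torques) != len(joint_velocities) or not torques:
        raise ValueError("need matching, non-empty torque and velocity logs")
    total = 0.0
    for tau_t, qd_t in zip(torques, joint_velocities):
        if len(tau_t) != n_motors or len(qd_t) != n_motors:
            raise ValueError(f"expected {n_motors} motors per timestep")
        # Absolute mechanical power of each motor at this timestep (an assumption).
        total += sum(abs(tau * qd) for tau, qd in zip(tau_t, qd_t))
    return total / (n_motors * len(torques))


power = mean_motor_power([[1.0] * 12, [2.0] * 12], [[3.0] * 12, [-1.0] * 12])
```

With a signed sum instead, negative work during controlled step-downs would offset positive work, which is one reason the choice of definition can change how a level-6 dip in power should be interpreted.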
Simultaneous optimisation of navigation and stair ascent for quadrupedal robots presents significant challenges in motion planning and control
Quadruped robots are increasingly valuable for initial searches within burning buildings, where they must navigate efficiently and monitor potentially flammable materials. Deploying them successfully requires overcoming challenges in situational awareness within complex indoor spaces and in ascending stairs quickly and reliably.
Researchers developed a two-stage deep reinforcement learning approach to simultaneously optimise both the navigation and locomotion of quadrupedal robots, specifically the Unitree Go2 model. The initial training phase focused on stair climbing within a simplified, pyramid-like terrain, followed by a second phase utilising more realistic indoor staircase models, including straight, L-shaped, and spiral designs.
This work demonstrates that stair-climbing skills learned on abstract terrain transfer successfully to complex indoor scenarios, with both stages carried out in simulation. A key aspect of the methodology is a centerline-based navigation formulation, which allows navigation and locomotion to be learned together without complex hierarchical planning.
The system achieved high success rates, agile motion, and small position errors, especially at lower difficulty levels, using only local height-map perception to understand its surroundings. The authors acknowledge two current limitations: only ascent was tested, and stair structures were assumed to be intact.
Future work will extend the algorithms to handle both ascent and descent and will test the robot in more challenging and damaged environments. These findings point toward improved robot-assisted primary searches in indoor fires. By integrating navigation and locomotion through end-to-end reinforcement learning, the system shows how quadrupeds can adapt to diverse and complex stair geometries. This advance contributes to the growing field of robotic assistance for firefighters, helping them assess hazardous situations and locate potential victims quickly and safely.
👉 More information
🗞 Training and Simulation of Quadrupedal Robot in Adaptive Stair Climbing for Indoor Firefighting: An End-to-End Reinforcement Learning Approach
🧠 ArXiv: https://arxiv.org/abs/2602.03087
