Researchers are tackling the challenge of enabling humanoid robots to climb platforms exceeding their leg length, a significant hurdle in current locomotion research. Yikai Wang, Tingxuan Leng, and Changyi Lin of Carnegie Mellon University, together with Shir Simon and Bingqing Chen of the Bosch Center for Artificial Intelligence, Jonathan Francis (affiliated with both institutions), and Ding Zhao of Carnegie Mellon University, present APEX, a novel system for perceptive, climbing-based high-platform traversal. The work is significant because it moves beyond jumping-based solutions, focusing instead on contact-rich climbing behaviours and a progress-based reward to achieve stable, safe navigation. Through LiDAR-based policies and a strategy for bridging the simulation-to-reality gap, the team demonstrates zero-shot traversal of 0.8-metre platforms, approximately 114% of the robot’s leg length, on a 29-DoF Unitree G1 humanoid, a substantial step towards more versatile and practical robotic mobility.
Scientists have developed a robotic system, APEX, capable of autonomously traversing platforms up to 0.8 metres in height, approximately 114% of its leg length, overcoming a longstanding limitation in humanoid locomotion. This achievement enables a humanoid robot to confidently navigate obstacles significantly taller than its legs, opening possibilities for deployment in more complex and realistic environments. APEX combines perceptive abilities with a repertoire of coordinated, full-body manoeuvres, including climbing up and down vertical edges, walking or crawling across surfaces, and dynamically reconfiguring posture between standing and lying positions.

Central to the system’s success is a novel “ratchet progress reward” that encourages efficient learning of contact-rich movements, prioritising consistent progress towards goals while maintaining safety and stability. This approach avoids the pitfalls of traditional reinforcement learning rewards, which often lead to unstable or impractical solutions. The researchers addressed the challenge of transferring skills learned in simulation to the real world through a dual strategy: modelling potential mapping inaccuracies during training and refining elevation maps during deployment, minimising the perception gap between the virtual and physical environments.

The system consolidates six individual skills (climb-up, climb-down, walking, crawling, stand-up, and lie-down) into a single, unified policy, allowing the robot to autonomously select the most appropriate behaviour based on its surroundings and user commands. Experiments on a 29-DoF Unitree G1 humanoid demonstrate robust, zero-shot sim-to-real performance, with smooth transitions between skills and adaptation to varying platform heights and initial robot poses. The climb-down manoeuvre reached a 99.9% success rate, with an average computation time of 754 ± 241 milliseconds per attempt.
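The ratchet mechanism described above can be sketched in a few lines: the reward pays out only when the best-so-far task completion improves, and applies a small penalty otherwise, so the agent cannot collect reward by oscillating back and forth. This is an illustrative reconstruction from the description in this article, not the paper’s exact formulation; the function name, the penalty term, and the progress scale are assumptions.

```python
def ratchet_progress_reward(progress, best_so_far, penalty=0.1):
    """One step of a 'ratchet' progress reward (illustrative sketch).

    `progress` is a scalar task-completion measure in [0, 1], e.g. the
    fraction of the distance to the platform goal covered. Reward is paid
    only for improvements over the best progress seen so far; any step
    that fails to improve on that benchmark incurs a small penalty.
    """
    if progress > best_so_far:
        reward = progress - best_so_far  # pay only for new ground gained
        best_so_far = progress           # the ratchet clicks forward
    else:
        reward = -penalty                # no improvement: small penalty
    return reward, best_so_far

# Example rollout: the agent advances, slips back, then advances again.
best = 0.0
total = 0.0
for p in [0.2, 0.5, 0.4, 0.5, 0.7]:
    r, best = ratchet_progress_reward(p, best)
    total += r
```

Note how the slip from 0.5 back to 0.4 (and the return to 0.5) earns no reward at all: only the net best-so-far of 0.7 matters, which is what makes the supervision velocity-free.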
Stand-up and lie-down skills both achieved 100% success rates, requiring 632 ± 222 and 576 ± 125 milliseconds respectively. These near-flawless results across the individual full-body manoeuvres validate the effectiveness of contact-force regularisation during training and demonstrate robust skill acquisition. Evaluation of the climb-up policy across platform heights from 0.6 to 0.8 metres and approach angles from −45° to 45° consistently yielded high success rates. The robot adapted its whole-body strategy, reorienting its torso and using full-body motion to initiate climbing, even at more extreme approach angles of up to ±65°, outside the original training distribution. This adaptability is further demonstrated by successful climbing onto a platform covered with a soft vinyl-foam mat, which introduces unseen compliance and friction properties.

The distilled policy exhibits context-aware motor strategies, dynamically switching to climb-up when approaching the platform and selecting different lead legs depending on the approach angle. During a disturbance test, the robot recovered from a heavy kick while approaching the platform, adjusting its gait and pivoting leg to maintain balance and initiate climbing, highlighting the transfer of robustness from the teacher policies into a unified, context-dependent policy. Symmetry augmentation ensured balanced behaviour during climb-up, with the lead leg selected dynamically based on the robot’s heading, which is critical for expanding the feasible workspace and improving performance.

LiDAR-based elevation mapping underpins the perception system, providing detailed geometric information about the surrounding environment. A 16-beam LiDAR sensor, scanning at 15 Hz, was mounted on the robot’s torso to generate a point cloud representing the terrain.
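The point-cloud-to-elevation-map step can be approximated as simple grid binning: project each LiDAR return into a robot-centred 2-D grid and keep the highest point per cell. A minimal sketch, assuming a 5 cm resolution and NaN for cells with no returns; the real pipeline’s outlier removal and sensor-frame transforms are omitted, and the function name and extent are assumptions.

```python
import numpy as np

def point_cloud_to_elevation_map(points, resolution=0.05, extent=2.0):
    """Bin a LiDAR point cloud (N x 3 array of x/y/z in metres, robot-centred)
    into a 2-D elevation grid at `resolution` metres per cell, keeping the
    highest z value per cell. Cells with no returns stay NaN."""
    n = int(2 * extent / resolution)  # grid is n x n cells
    grid = np.full((n, n), np.nan)
    ix = ((points[:, 0] + extent) / resolution).astype(int)
    iy = ((points[:, 1] + extent) / resolution).astype(int)
    inside = (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n)
    for x, y, z in zip(ix[inside], iy[inside], points[inside, 2]):
        if np.isnan(grid[x, y]) or z > grid[x, y]:
            grid[x, y] = z  # keep the maximum height per cell
    return grid

# Two returns in the same cell (only the 0.8 m one survives) plus one more.
pts = np.array([[0.0, 0.0, 0.1], [0.0, 0.0, 0.8], [1.0, 1.0, 0.5]])
emap = point_cloud_to_elevation_map(pts)
```

Taking the per-cell maximum is one common convention for elevation maps; it makes platform edges stand out sharply, which is exactly the geometry a climb-up policy needs to perceive.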
Raw point cloud data underwent initial processing to remove noise and outliers, followed by conversion into an elevation map at a resolution of 5 centimetres per pixel. To mitigate mapping artifacts inherent in LiDAR data, the researchers incorporated these imperfections directly into training, digitally recreating common LiDAR errors within the simulated environment. A filtering and inpainting algorithm was then applied to the elevation maps during deployment on the physical robot, smoothing residual noise and filling minor gaps in the data. This dual strategy of simulation augmentation and real-world refinement minimises the sim-to-real perception gap.

The core of the locomotion system is a reinforcement learning framework built around the generalised “ratchet progress reward”: it tracks the best-so-far task completion, rewards genuine progress towards the goal, and penalises steps that do not improve on that benchmark. This dense, velocity-free supervision encourages efficient exploration and facilitates learning of contact-rich manoeuvres, crucial for stable and safe traversal.

Scientists have long sought to build robots capable of navigating the complex, unpredictable spaces humans inhabit, and this advance enables a robot to climb and traverse obstacles significantly taller than its own leg length. The “ratchet progress” reward is noteworthy because it prioritises consistent, incremental improvement over flashy, high-speed manoeuvres, fostering a more robust and safe approach to learning. However, the reliance on LiDAR introduces a potential limitation: the sensors can be expensive and are susceptible to environmental conditions such as dust or bright sunlight. Future work will likely explore integrating other sensor modalities, such as vision, to create a more resilient and adaptable system.
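The deployment-time gap-filling can be illustrated with a generic inpainting pass: iteratively assign each missing (NaN) cell the mean of its valid four-neighbours until no gaps remain. The article does not specify APEX’s actual filtering and inpainting algorithm, so this is a stand-in sketch under that assumption, not the paper’s implementation.

```python
import numpy as np

def inpaint_elevation_map(grid, max_iters=10):
    """Fill NaN gaps in an elevation map by repeatedly replacing each
    missing cell with the mean of its valid 4-neighbours. A generic
    stand-in for the (unspecified) deployment-time inpainting step."""
    g = grid.astype(float).copy()
    for _ in range(max_iters):
        nan_mask = np.isnan(g)
        if not nan_mask.any():
            break  # no gaps left to fill
        padded = np.pad(g, 1, constant_values=np.nan)
        valid = ~np.isnan(padded)
        vals = np.where(valid, padded, 0.0)
        # Sum and count of valid 4-neighbours for every cell.
        s = (vals[:-2, 1:-1] + vals[2:, 1:-1]
             + vals[1:-1, :-2] + vals[1:-1, 2:])
        c = (valid[:-2, 1:-1].astype(int) + valid[2:, 1:-1]
             + valid[1:-1, :-2] + valid[1:-1, 2:])
        fill = nan_mask & (c > 0)
        g[fill] = s[fill] / c[fill]  # mean of available neighbours
    return g

# A single dropped cell surrounded by a flat 1.0 m platform gets filled.
grid = np.array([[1.0, 1.0, 1.0],
                 [1.0, np.nan, 1.0],
                 [1.0, 1.0, 1.0]])
filled = inpaint_elevation_map(grid)
```

Growing values inward from gap borders like this handles the small occlusion shadows a torso-mounted LiDAR leaves behind platform edges, while leaving measured cells untouched.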
Scaling this approach to more dynamic and cluttered environments, or to robots with different morphologies, will undoubtedly present new challenges, but ultimately, this work represents a significant step towards creating machines that can assist and collaborate with humans in increasingly complex and unpredictable settings.
👉 More information
🗞 APEX: Learning Adaptive High-Platform Traversal for Humanoid Robots
🧠 ArXiv: https://arxiv.org/abs/2602.11143
