Researchers are increasingly exploring reinforcement learning as a means to control robots in the challenging environment of space, and a team led by Kenneth Stewart, Samantha Chapin, and Roxana Leontie from the U.S. Naval Research Laboratory now presents the first successful demonstration of this technology on orbit. They trained a deep neural network to autonomously control NASA’s Astrobee robot aboard the International Space Station, replacing its standard control systems and enabling complex navigation in microgravity. This achievement validates a new training pipeline that bridges the gap between simulated environments and the realities of spaceflight, utilizing advanced simulation to accelerate learning. The successful transfer of this terrestrial training directly to a space-based application marks a significant step towards rapid adaptation and responsiveness for future missions focused on In-Space Servicing, Assembly, and Manufacturing (ISAM).
The overarching goal is to develop robust, autonomous control systems for in-space operations, including inspection, assembly, and servicing. A key challenge is bridging the gap between simulated training environments and the complexities of real-world spaceflight. The research demonstrates the successful implementation of RL algorithms, particularly Proximal Policy Optimization (PPO), to control the Astrobee robot both in realistic simulations and on the physical robot in a laboratory setting.
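PPO works by optimizing a clipped surrogate objective that keeps each policy update close to the policy that collected the data. The following is a minimal PyTorch sketch of that standard objective, not the team’s actual implementation; the tensor names are illustrative.

```python
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (Schulman et al., 2017).

    new_log_probs: log pi(a|s) under the current policy
    old_log_probs: log pi(a|s) under the rollout (data-collecting) policy
    advantages:    advantage estimates, e.g. from GAE
    """
    ratio = torch.exp(new_log_probs - old_log_probs)   # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Maximize the surrogate, so return its negative mean for a minimizer.
    return -torch.min(unclipped, clipped).mean()
```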
A crucial technique employed is curriculum learning, where the robot begins with simpler tasks and gradually progresses to more complex ones, improving learning efficiency. Domain randomization, where simulation parameters are varied, played a key role in improving robustness. The work focuses on achieving precise six-degree-of-freedom control, enabling complex maneuvers and manipulation. Scientists also explored adaptive control techniques to handle uncertainties and disturbances in the spacecraft’s dynamics, and developed algorithms for autonomous collision avoidance, essential for safe operation in cluttered environments.
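As a concrete illustration, a curriculum stage can raise task difficulty and widen the domain-randomization ranges together. The sketch below is hypothetical: the parameter names, the ten-stage schedule, and the ranges (including the roughly 9 kg nominal mass) are assumptions for illustration, not values from the paper.

```python
import random
from dataclasses import dataclass

@dataclass
class EpisodeConfig:
    goal_distance_m: float  # how far the commanded pose is from the start
    mass_kg: float          # randomized robot mass
    thrust_noise: float     # multiplicative noise applied to actuator output

def sample_episode(stage: int, max_stage: int = 10) -> EpisodeConfig:
    """Curriculum + domain randomization: later stages increase task
    difficulty and widen the randomized physics ranges together.
    All names, ranges, and the stage schedule are illustrative."""
    difficulty = min(stage / max_stage, 1.0)  # ramps from 0 to 1
    return EpisodeConfig(
        goal_distance_m=0.5 + 2.5 * difficulty,
        mass_kg=random.uniform(9.0 - 2.0 * difficulty,
                               9.0 + 2.0 * difficulty),
        thrust_noise=random.uniform(0.0, 0.1 * difficulty),
    )
```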
The research also touched on the potential for using RL to coordinate multiple spacecraft in formation and for visual servoing to guide precise manipulation and assembly tasks. The primary methodology is reinforcement learning, with deep neural networks approximating the policy and value functions. Generalized Advantage Estimation was used to reduce variance in the policy gradient updates, visual servoing leveraged camera feedback to guide the robot’s movements, and safe reinforcement learning techniques kept the robot operating within safe boundaries. Ongoing challenges include improving the transfer of algorithms from simulation to the real world, ensuring robustness to disturbances and sensor noise, and prioritizing safety to prevent collisions; future work will focus on scaling the algorithms to more complex tasks and integrating them with other spacecraft systems. The core of the achievement was training the RL policy within NVIDIA’s Omniverse Isaac Lab physics simulator, a high-fidelity environment capable of running thousands of parallel, randomized simulations to maximize the robot’s experience.
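Generalized Advantage Estimation blends temporal-difference residuals δ_t = r_t + γ·V(s_{t+1}) − V(s_t) into an exponentially weighted sum Â_t = Σ_l (γλ)^l · δ_{t+l}, trading bias against variance through λ. The NumPy sketch below is a generic reference implementation, independent of the team’s codebase:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation (Schulman et al., 2015).

    rewards: r_0 .. r_{T-1} for one rollout
    values:  V(s_0) .. V(s_T), one extra bootstrap value at the end
    Returns advantage estimates A_0 .. A_{T-1}.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        gae = delta + gamma * lam * gae                         # discounted sum
        advantages[t] = gae
    return advantages
```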
To bridge the gap between simulation and the real world, the team implemented a curriculum learning approach that gradually introduced variations within the simulated environments, training the RL policy to cope with unexpected changes and with discrepancies between simulation and the actual ISS environment. Following simulation training, the policy was validated within the NASA Ames Astrobee simulator using Gazebo and ROS Noetic, demonstrating its ability to seamlessly replace the robot’s existing control system. Preliminary terrestrial testing then took place at the NASA Ames Granite Lab, a facility that mimics zero-gravity conditions using air bearings; there, the RL policy’s commands drove the Astrobee’s fan-based propulsion system, allowing direct performance comparison against the baseline controller. Finally, the trained policy was deployed and tested in the actual microgravity environment of the ISS, marking the first demonstration of RL-based control for a free-flying space robot.
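In the Gazebo/ROS Noetic stage, a learned controller has to sit in the loop between the robot’s state estimate and its actuation commands. Below is a minimal ROS Noetic sketch of that wiring; the topic names and the policy interface are placeholders, not Astrobee’s actual flight-software interfaces.

```python
#!/usr/bin/env python3
# Minimal ROS Noetic wiring for a learned controller: subscribe to a pose
# estimate, query the policy, publish a 6-DOF force/torque command.
# Topic names and the policy interface are illustrative placeholders.
import rospy
from geometry_msgs.msg import PoseStamped, Wrench

class RLController:
    def __init__(self, policy):
        self.policy = policy  # callable: pose -> (force_xyz, torque_xyz)
        self.pose = None
        rospy.Subscriber("/robot/pose", PoseStamped, self._on_pose)
        self.cmd_pub = rospy.Publisher("/robot/wrench_cmd", Wrench, queue_size=1)

    def _on_pose(self, msg):
        self.pose = msg  # cache the latest state estimate

    def spin(self, hz=10):
        rate = rospy.Rate(hz)
        while not rospy.is_shutdown():
            if self.pose is not None:
                force, torque = self.policy(self.pose)
                cmd = Wrench()
                cmd.force.x, cmd.force.y, cmd.force.z = force
                cmd.torque.x, cmd.torque.y, cmd.torque.z = torque
                self.cmd_pub.publish(cmd)
            rate.sleep()

if __name__ == "__main__":
    rospy.init_node("rl_controller")
    # Zero-action stand-in so the node runs without a trained network.
    RLController(lambda pose: ((0.0, 0.0, 0.0), (0.0, 0.0, 0.0))).spin()
```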
This achievement represents a significant step towards enabling complex in-space operations, such as servicing, assembly, and manufacturing, with greater autonomy and responsiveness to changing mission needs. During ground testing the policy also proved robust to variations in the robot’s mass, suggesting wider applicability of the approach. While the study focused on validating the training pipeline, the authors suggest that more complex policies could be generated for a range of in-space tasks using this methodology.
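That mass-robustness claim can be pictured as a sweep: run the same fixed controller while varying the true mass, which the controller never sees, and check the residual error. The self-contained point-mass simulation below is a stand-in for that kind of test; the gains, masses, and dynamics are illustrative assumptions, not the paper’s setup.

```python
def run_episode(kp, kd, mass_kg, steps=500, dt=0.05):
    """Point-mass stand-in for the free-flyer: drive position x to zero
    with a fixed controller while the true mass (unknown to the
    controller) varies. Illustrative only; not the paper's test setup."""
    x, v = 1.0, 0.0                       # start 1 m from the goal
    for _ in range(steps):
        force = -kp * x - kd * v          # fixed policy, tuned for ~9 kg
        v += (force / mass_kg) * dt       # F = m*a, integrated with Euler
        x += v * dt
    return abs(x)                         # residual position error (m)

for m in (7.0, 9.0, 11.0):                # nominal and perturbed masses (assumed)
    print(f"mass={m:4.1f} kg -> final error {run_episode(2.0, 6.0, m):.4f} m")
```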
👉 More information
🗞 Crossing the Sim2Real Gap Between Simulation and Ground Testing to Space Deployment of Autonomous Free-flyer Control
🧠 arXiv: https://arxiv.org/abs/2512.03736
