Accurate localisation remains a fundamental challenge in planetary robotics and a prerequisite for the increasingly ambitious autonomous missions of the future. Lachlan Holden, Feras Dayoub, and Alberto Candela, from the AI for Space Group at The University of Adelaide and the Jet Propulsion Laboratory, present research that tackles this problem with a novel approach to rover localisation using data from both ground and aerial perspectives. Their work enables a rover to pinpoint its location within an aerial map using limited-view ground images, a task complicated by the scarcity of labelled real-world data for training machine learning models. The research is significant because it leverages vision foundation models and synthetic data to overcome this data limitation, supported by a newly created dataset of real and synthetic rover trajectories captured in a planetary analogue environment. The team, which also includes David Harvey and Tat-Jun Chin, demonstrates that this cross-view localisation method, combined with particle filters, provides accurate position estimation even across complex terrain.
Rover Localisation via Aerial Mapping and Machine Learning
Building on the achievements of the Ingenuity helicopter and numerous planetary orbiters, future missions will require enhanced capabilities to support their increased scale and scope. This paper investigates the use of machine learning for rover localisation within a local aerial map, using limited-field-of-view monocular ground-view RGB images as input. A significant challenge is the scarcity of real space data with the ground-truth position labels needed for effective training; to address it, this work proposes a novel method employing cross-view-localising dual-encoder deep neural networks. The approach leverages semantic segmentation to identify potential rover locations within the aerial map, reducing the search space and improving computational efficiency.
By learning cross-view features, the dual-encoder network correlates ground-based observations with aerial map data, even under challenging conditions. This deep learning architecture, designed specifically for cross-view localisation, allows robust and accurate rover localisation without relying on precise pre-existing maps or extensive ground-truth data. The network extracts meaningful features from both ground and aerial imagery, enabling accurate rover pose estimation. The method demonstrates improved performance over existing localisation techniques, particularly in scenarios with limited sensor data and challenging environmental conditions, offering a practical route to autonomous navigation and exploration in future planetary missions.
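For readers curious how such a dual-encoder might be wired up, the sketch below shows a minimal PyTorch formulation: separate encoders embed ground and aerial images into a shared space, and a contrastive objective pulls matched pairs together. The toy encoders and the InfoNCE-style loss are illustrative assumptions, not the authors' TransGeo-based architecture or training recipe.

```python
# Minimal sketch of a dual-encoder cross-view matcher (illustrative, not the
# authors' exact architecture). Ground and aerial images are embedded into a
# shared space; matching is done by cosine similarity of the embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallEncoder(nn.Module):
    """Toy convolutional encoder standing in for a transformer branch."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.features(x).flatten(1)
        return F.normalize(self.proj(z), dim=-1)      # unit-norm embedding

class DualEncoder(nn.Module):
    """Separate encoders for ground and aerial views sharing an embedding space."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.ground_encoder = SmallEncoder(embed_dim)
        self.aerial_encoder = SmallEncoder(embed_dim)

    def forward(self, ground: torch.Tensor, aerial: torch.Tensor):
        return self.ground_encoder(ground), self.aerial_encoder(aerial)

def infonce_loss(g: torch.Tensor, a: torch.Tensor, temperature: float = 0.07):
    """Contrastive loss pulling matched ground/aerial pairs together."""
    logits = g @ a.t() / temperature                  # (B, B) similarity matrix
    targets = torch.arange(g.size(0), device=g.device)
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    model = DualEncoder()
    ground = torch.randn(4, 3, 128, 128)   # limited-FOV ground-view RGB crops
    aerial = torch.randn(4, 3, 128, 128)   # aerial map patches at candidate poses
    g_emb, a_emb = model(ground, aerial)
    print(infonce_loss(g_emb, a_emb).item())
```

At inference time the aerial encoder can be run once over all candidate map patches, so that localising each new ground image reduces to a single forward pass plus a similarity search.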
Rover Localisation via Cross-View Image Matching
This research details a system for absolute rover localisation using cross-view image matching techniques. The core innovation is a dual-encoder network designed to localise a rover within a local aerial image, addressing the challenge of monocular RGB images with limited horizontal field-of-view. The method explicitly addresses the domain gap between synthetic and real-world data through synthetic data generation and semantic segmentation techniques utilising vision foundation models like LLMDet and SAM 2. The system successfully localises on real data without requiring real labelled data during training, making it practical for space missions where annotated datasets are scarce.
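The masking idea can be pictured as a detect-then-segment step: a text-prompted detector proposes regions for terrain classes of interest and a segmenter turns them into masks that are applied identically to synthetic and real images, shrinking the appearance gap between the two domains. The sketch below is a hedged, generic version of that step; the `detector` and `segmenter` callables, the `mask_image` helper, and the prompt strings are placeholders rather than the actual LLMDet or SAM 2 interfaces.

```python
# Hedged sketch of a detect-then-segment masking step, in the spirit of the
# paper's use of vision foundation models (open-vocabulary detection followed
# by promptable segmentation). Real model APIs are not reproduced here.
from typing import Callable, Optional, Sequence, Tuple
import numpy as np

Box = Tuple[int, int, int, int]                       # (x0, y0, x1, y1)

def boxes_to_mask(shape: Tuple[int, int], boxes: Sequence[Box]) -> np.ndarray:
    """Fallback segmenter: fill each detection box as a coarse mask."""
    mask = np.zeros(shape, dtype=bool)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = True
    return mask

def mask_image(image: np.ndarray,
               prompts: Sequence[str],
               detector: Callable[[np.ndarray, Sequence[str]], Sequence[Box]],
               segmenter: Optional[Callable[[np.ndarray, Sequence[Box]], np.ndarray]] = None
               ) -> np.ndarray:
    """Keep only detected/segmented regions so that synthetic and real images
    present the network with the same reduced appearance domain."""
    boxes = detector(image, prompts)
    mask = segmenter(image, boxes) if segmenter else boxes_to_mask(image.shape[:2], boxes)
    masked = image.copy()
    masked[~mask] = 0                                 # zero out everything else
    return masked

if __name__ == "__main__":
    img = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
    # Toy detector returning one fixed box; a real pipeline would wrap the
    # detection and segmentation foundation models here.
    toy_detector = lambda image, prompts: [(10, 10, 40, 40)]
    out = mask_image(img, prompts=["rock"], detector=toy_detector)
    print(out.shape, out[0, 0], out[20, 20])
```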
A new dataset was also created, comprising labelled synthetic image pairs alongside georeferenced aerial images with ground-truth-localised rover trajectories. The proposed network builds on a transformer-based cross-view architecture (TransGeo), with LLMDet and SAM 2 providing the object detection and segmentation used in the pipeline. Experiments were conducted on an RTX Ti GPU, with each ground-view image processed in approximately fifteen seconds. Future work will focus on reducing network complexity to improve inference times on space-grade hardware and on integrating this absolute localisation method with traditional odometry for enhanced navigation. The research demonstrates the applicability of deep learning to absolute rover localisation, overcoming challenges of data scarcity and domain adaptation.
Rover Localisation via Aerial and Ground Vision
Scientists have developed a method for rovers to accurately determine their position using limited-field-of-view monocular ground-view RGB images and local aerial maps. The research addresses the challenge of scarce ground-truth data for training machine learning algorithms, proposing a novel cross-view-localising dual-encoder neural network. This work leverages semantic segmentation with vision foundation models and extensive synthetic data to overcome the domain gap between simulated and real-world images, paving the way for more autonomous planetary exploration. The team constructed a new cross-view dataset comprising real-world rover trajectories and a high-volume synthetic dataset, both with corresponding ground-truth localisation data captured in a planetary analogue facility.
Validation results reveal that a network trained on synthetic data achieves a 100% top-20% matching rate on synthetic validation data but struggles with real data, achieving only 46.9%. However, fine-tuning on real data significantly improves performance, reaching a 99.4% top-20% matching rate. Furthermore, a network trained on masked synthetic images achieves a 99.7% top-20% matching rate on masked synthetic validation data and 80.2% on ground-truth-masked images, confirming successful domain adaptation. Using particle filters for state estimation, the system was evaluated on six real-world rover trajectories, demonstrating the potential for advanced autonomy in future ground-aerial robotic teams for planetary missions.
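A plausible reading of the top-20% matching rate quoted above, and the one sketched below, is that a ground-view query counts as matched when its true aerial patch ranks within the most similar 20% of all candidate patches; the exact evaluation protocol is defined in the paper, so treat this as an interpretation rather than the authors' code.

```python
# Sketch of a top-k% matching-rate computation under the assumption stated in
# the lead-in: a query is a hit if its true aerial patch ranks within the top
# k% of candidates by embedding similarity.
import numpy as np

def top_k_percent_rate(ground_emb: np.ndarray,
                       aerial_emb: np.ndarray,
                       k_percent: float = 20.0) -> float:
    """ground_emb, aerial_emb: (N, D) unit-norm embeddings; row i of each is a
    matched pair. Returns the fraction of queries whose true match is ranked
    inside the top k% most similar aerial candidates."""
    sims = ground_emb @ aerial_emb.T                  # (N, N) cosine similarities
    n = sims.shape[1]
    cutoff = max(1, int(np.ceil(n * k_percent / 100.0)))
    hits = 0
    for i in range(sims.shape[0]):
        # rank of the true match among all candidates (1 = most similar)
        rank = 1 + np.sum(sims[i] > sims[i, i])
        hits += int(rank <= cutoff)
    return hits / sims.shape[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = rng.normal(size=(100, 64))
    e /= np.linalg.norm(e, axis=1, keepdims=True)
    noisy = e + 0.1 * rng.normal(size=e.shape)
    noisy /= np.linalg.norm(noisy, axis=1, keepdims=True)
    print(top_k_percent_rate(e, noisy))               # should be close to 1.0
```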
Rover Localisation via Cross-View Image Matching
This research demonstrates a successful application of a dual-encoder, cross-view localisation network enabling a planetary rover to determine its position within a local aerial map. The work addresses the challenge of utilising limited field-of-view monocular RGB images, a common constraint for ground-based robotic platforms, and bridges the gap between synthetic and real-world imagery through semantic segmentation and vision foundation models. Accurate position estimation was achieved using particle filters. A significant contribution of this study is the creation of new datasets, comprising both labelled synthetic image pairs and real-world rover trajectories with corresponding aerial imagery, captured within a planetary analogue environment. Further optimisation is needed to reduce network complexity and improve inference times for deployment on space-qualified hardware. Future research should focus on integrating this absolute localisation method with traditional odometry-based relative localisation techniques to enhance overall rover autonomy and navigational capabilities in planetary exploration.
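One way such a fusion could look in practice is a particle filter whose prediction step is driven by odometry increments and whose update step re-weights particles by the cross-view similarity evaluated at each particle's position in the aerial map. The sketch below is a generic formulation under those assumptions, with a hypothetical similarity callable standing in for the learned matcher; it is not the paper's implementation.

```python
# Minimal particle-filter sketch fusing odometry (prediction) with cross-view
# matching scores (update). Generic formulation under stated assumptions;
# `similarity_at` is a hypothetical lookup of the learned ground/aerial
# similarity at a map position.
import numpy as np

rng = np.random.default_rng(0)

def predict(particles: np.ndarray, odom_delta: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Move each (x, y) particle by the odometry increment plus Gaussian noise."""
    return particles + odom_delta + rng.normal(scale=sigma, size=particles.shape)

def update(particles: np.ndarray, weights: np.ndarray, similarity_at) -> np.ndarray:
    """Re-weight particles by the cross-view similarity at their positions."""
    scores = np.array([similarity_at(p) for p in particles])
    weights = weights * np.exp(scores)      # treat similarity as a log-likelihood proxy
    return weights / weights.sum()

def resample(particles: np.ndarray, weights: np.ndarray):
    """Resample when the effective sample size collapses."""
    n = len(particles)
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights

def estimate(particles: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted-mean position estimate in aerial-map coordinates."""
    return np.average(particles, axis=0, weights=weights)

if __name__ == "__main__":
    particles = rng.uniform(0, 50, size=(500, 2))     # spread over a 50 m map
    weights = np.full(len(particles), 1.0 / len(particles))
    true_pos = np.array([25.0, 25.0])
    sim = lambda p: -0.1 * np.linalg.norm(p - true_pos)   # toy similarity field
    for _ in range(20):
        particles = predict(particles, odom_delta=np.zeros(2))
        weights = update(particles, weights, sim)
        particles, weights = resample(particles, weights)
    print(estimate(particles, weights))               # should approach [25, 25]
```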
👉 More information
🗞 Vision Foundation Models for Domain Generalisable Cross-View Localisation in Planetary Ground-Aerial Robotic Teams
🧠 ArXiv: https://arxiv.org/abs/2601.09107
