SA-FARI Dataset Enables Multi-Animal Tracking across 99 Species in 11,609 Videos over 10 Years

Automated analysis of wildlife footage represents a crucial tool for conservation efforts, yet progress in multi-animal tracking currently suffers from a lack of comprehensive datasets. Dante Francisco Wasmuht, Otto Brookes, and Maximillian Schall, along with colleagues including Pablo Palencia, Chris Beirne, and Tilo Burghardt, address this challenge by introducing SA-FARI, a new large-scale resource for training and benchmarking multi-animal tracking systems. This dataset comprises over 11,000 camera trap videos, collected across four continents and spanning 99 species, and features extensive annotations including bounding boxes, segmentation masks, and species labels for over 16,000 individual animals. SA-FARI establishes a new foundation for developing generalizable tracking algorithms, offering researchers the means to move beyond species-specific solutions and advance the field of wildlife monitoring through computer vision.

Segmenting Animals in Video Footage

Researchers have introduced SA-FARI, a new dataset designed to significantly advance research in animal recognition, identification, segmentation, and tracking within video footage. Leveraging the Segment Anything model (SAM), this resource addresses critical challenges in wildlife monitoring and conservation by providing a comprehensive platform for developing and evaluating advanced computer vision algorithms. This work focuses on automatically identifying and outlining individual animals in videos, tracking their movements, recognizing species, and even tracking animals described by general phrases, rather than predefined categories. This research builds upon advancements in computer vision applied to wildlife conservation, encompassing segmentation models, tracking algorithms, and pose estimation.

SA-FARI expands upon these efforts by providing a large-scale, diverse dataset that enables researchers to develop more robust and generalizable algorithms. The team evaluated algorithm performance using metrics such as the Identity F1-score (IDF1) and Higher Order Tracking Accuracy (HOTA), ensuring a comprehensive assessment of tracking capabilities. SA-FARI distinguishes itself through its focus on realistic video footage, its support for open-vocabulary tracking, and its large scale and geographic diversity, making it a valuable resource for researchers seeking to understand animal populations in the wild.
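To give a rough feel for what the Identity F1-score rewards, here is a minimal sketch. It assumes identities are already aligned between ground truth and predictions, so labels are directly comparable; the full IDF1 metric additionally searches for the optimal bijective identity mapping, and HOTA goes further by combining detection and association accuracy. This is an illustrative simplification, not SA-FARI's actual evaluation code.

```python
def idf1(gt_frames, pred_frames):
    """Simplified Identity F1-score (IDF1).

    gt_frames, pred_frames: dicts mapping frame index -> set of identity
    labels present in that frame. Assumes identities are already aligned
    between ground truth and predictions; the full metric instead finds
    the bijective identity mapping that maximizes the score.
    """
    idtp = idfp = idfn = 0  # identity true pos., false pos., false neg.
    for frame in set(gt_frames) | set(pred_frames):
        gt = gt_frames.get(frame, set())
        pred = pred_frames.get(frame, set())
        idtp += len(gt & pred)   # same identity present in both
        idfn += len(gt - pred)   # ground-truth identity missed
        idfp += len(pred - gt)   # spurious predicted identity
    return 2 * idtp / (2 * idtp + idfp + idfn)

# Two animals over two frames; the tracker loses "b" in frame 0 and
# hallucinates "c" in frame 1.
gt = {0: {"a", "b"}, 1: {"a", "b"}}
pred = {0: {"a"}, 1: {"a", "b", "c"}}
print(idf1(gt, pred))  # → 0.75
```

A perfect tracker scores 1.0; identity switches, missed animals, and spurious detections all pull the score down.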

Large-Scale Wild Animal Tracking Dataset Created

Scientists have assembled SA-FARI, a comprehensive dataset designed to advance multi-animal tracking in challenging wild settings. The team collected 11,609 camera trap videos spanning approximately 10 years, from 2014 to 2024, across 741 locations on four continents, representing 99 distinct species categories. Exhaustive annotation of this footage culminated in roughly 46 hours of densely annotated video containing 16,224 unique animal identities. The annotation process involved creating 942,702 individual bounding boxes, detailed segmentation masks, and accurate species labels for each animal observed.
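To make that annotation structure concrete, the sketch below shows one hypothetical frame record. The field names, species, and RLE placeholder are illustrative assumptions, not the published SA-FARI schema; the actual release may organize boxes, masks, and identities differently.

```python
# Hypothetical frame-level annotation record (field names are assumptions,
# not the published SA-FARI schema). Each animal carries a persistent
# identity plus a bounding box, segmentation mask, and species label.
record = {
    "video_id": "cam_0421_2019_06_03",
    "frame_index": 17,
    "annotations": [
        {
            "masklet_id": 3,                 # identity persists across frames
            "species": "leopard",
            "bbox": [412, 188, 96, 150],     # x, y, width, height (pixels)
            "segmentation": "<RLE-encoded mask>",
        },
        {
            "masklet_id": 4,
            "species": "leopard",
            "bbox": [250, 201, 88, 140],
            "segmentation": "<RLE-encoded mask>",
        },
    ],
}

def unique_identities(frame_records):
    """Count distinct masklet identities across a sequence of frame records."""
    return len({a["masklet_id"] for r in frame_records for a in r["annotations"]})

print(unique_identities([record]))  # → 2
```

Aggregating such records across all 11,609 videos is how per-identity statistics like the 16,224 total identities would be derived.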

Researchers also recorded anonymized camera trap locations, enabling analysis of spatial distribution and habitat use. Comparative analyses were conducted against state-of-the-art vision-language models, including SAM 3, using both species-specific and generic animal prompts. This work addresses a critical gap in existing datasets, which typically lack the scale, species diversity, and geographic coverage necessary for developing generalizable tracking systems.

SA-FARI Dataset Enables Multi-Animal Tracking

The research team introduced SA-FARI, a dataset designed to advance multi-animal tracking (MAT) in wildlife conservation, addressing a critical need for comprehensive training data. The resource comprises 11,609 camera trap videos, collected over approximately 10 years (2014-2024) from 741 locations across four continents and representing 99 distinct species categories. Exhaustive annotation yielded approximately 46 hours of densely labeled footage containing 16,224 masklet identities, annotated with 942,702 individual bounding boxes along with segmentation masks and species labels. The team also made anonymized camera trap locations publicly available alongside the video data.

Experiments demonstrate that SA-FARI surpasses existing datasets in both scale and diversity, offering a significantly larger number of species and independent locations than any previous resource. The dataset uniquely provides a substantial collection of high-quality, manually verified segmentation masks, enabling precise tracking of individual animals. Results show that training models on SA-FARI delivers over 20-point gains in HOTA metrics, a key measure of tracking performance, underscoring the value of the dataset’s large scale and diverse annotations. The dataset includes data from a wider range of ecological contexts and collection strategies, offering a more comprehensive foundation for developing generalizable MAT models.

SA-FARI Benchmarks Expose Limits of Current Tracking Models

The team presents SA-FARI, a new large-scale dataset designed to advance multi-animal tracking in wild environments. By consolidating over 11,000 camera trap videos from more than 700 locations, SA-FARI offers unprecedented scale and diversity, encompassing 99 species categories and a decade of observations. Crucially, the dataset includes exhaustively annotated footage, providing over 16,000 reliable, high-quality identifications of individual animals through time. This work addresses a significant gap in the field, as previous datasets have been limited by narrow geographic focus, few species represented, and less reliable annotation methods.

Results demonstrate that even state-of-the-art models perform poorly without access to large, richly annotated collections like SA-FARI, highlighting the challenges inherent in real-world wildlife data. Fine-tuning a leading model with the SA-FARI training data resulted in a threefold improvement in performance, underscoring the dataset’s value. Future work should prioritize integrating additional ecoregions to broaden species representation and mitigate geographic bias, and expanding the data to include modalities such as animal body pose, depth information, and natural language descriptions. By providing a rigorous benchmark, this research aims to accelerate the development of robust tracking systems for effective biodiversity monitoring and conservation.

👉 More information
🗞 The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification
🧠 ArXiv: https://arxiv.org/abs/2511.15622

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.

Latest Posts by Rohail T.:

Energy Scaling Laws Predict Diffusion Model GPU Consumption, Revealing 0.9 Dominance of Denoising Operations

November 25, 2025
Fluid Antenna System Enables UAV-to-Ground Communication under Double-Shadowing Fading Channels

November 25, 2025
Horn-Schunck Optical Flow Computation with Bilinear Interpolation Improves Motion Estimation Accuracy

November 25, 2025