Training soldiers for the complexities of urban warfare demands rigorous practice and ingrained muscle memory, particularly for critical drills like Enter and Clear the Room, which require rapid threat assessment and precise teamwork. Surya Rayala, Marcos Quinones-Grueiro, and Naveeduddin Mohammed, from Vanderbilt University, along with Ashwin T S, Benjamin Goldberg and Randall Spain from US Army DEVCOM Soldier Center, and their colleagues, now present a significant advance in evaluating performance within synthetic training environments. Their research introduces a novel video-based system that automatically assesses cognitive, psychomotor, and teamwork skills during these drills, moving beyond costly sensors or subjective human observation. The team’s approach uses computer vision to track movement and gaze, then translates this data into quantifiable metrics that capture individual and team performance, ultimately providing actionable insights for improved training and more effective after-action reviews. This breakthrough promises to enhance the scalability and objectivity of military training evaluations, offering a pathway to more consistently prepare soldiers for real-world challenges.
Effective urban warfare training requires situational awareness and muscle memory, developed through repeated practice in realistic yet controlled environments. A key drill, Enter and Clear the Room (ECR), demands threat assessment, coordination, and the securing of confined spaces. The military uses Synthetic Training Environments (STEs), which offer scalable, controlled settings for repeated exercises, but automatic performance assessment has traditionally been difficult, particularly when the goal is objective evaluation of cognitive, psychomotor, and teamwork skills.
Automated Team Performance Assessment via Video
Scientists have developed a video-based system to automatically assess performance in complex training exercises, specifically the military drill, Enter and Clear the Room (ECR). This work addresses the challenge of objectively evaluating cognitive, psychomotor, and teamwork skills without relying on expensive sensors or subjective human observation, delivering a scalable and accurate assessment solution. The method involves extracting skeletal data, gaze vectors, and movement trajectories directly from standard video recordings of live training drills. The team successfully analyzed footage from 19 squads executing the ECR scenario, focusing on a room with armed enemy combatants.
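To make the idea of turning skeletal data into a movement trajectory concrete, here is a minimal Python sketch. It is not the authors' pipeline: it assumes a pose estimator has already produced per-frame keypoints and that those keypoints have been projected into floor-plane coordinates in metres. The keypoint names (`left_ankle`, `right_ankle`) and the helper functions are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
# Hypothetical per-frame pose: keypoint name -> (x, y) on the floor plane, in metres.
Pose = Dict[str, Point]


def floor_position(pose: Pose) -> Point:
    """Approximate where a person stands as the midpoint between both ankles."""
    (lx, ly), (rx, ry) = pose["left_ankle"], pose["right_ankle"]
    return ((lx + rx) / 2.0, (ly + ry) / 2.0)


def trajectory(poses: List[Pose]) -> List[Point]:
    """Turn a sequence of per-frame poses into a floor-plane trajectory."""
    return [floor_position(p) for p in poses]


def path_length(points: List[Point]) -> float:
    """Total distance travelled along the trajectory, in metres."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Two illustrative frames for one soldier (made-up coordinates).
poses = [
    {"left_ankle": (0.0, 0.0), "right_ankle": (2.0, 0.0)},
    {"left_ankle": (3.0, 4.0), "right_ankle": (5.0, 4.0)},
]
track = trajectory(poses)
```

Using the ankle midpoint is only one plausible proxy for floor position; a real system would also need to handle occlusion, identity tracking across frames, and camera calibration, none of which are shown here.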
This analysis generated ten distinct performance metrics, including Entrance Vectors, Entrance Hesitation, Stay Along Wall, threat clearance, and floor coverage. These metrics feed into an extended Cognitive Task Analysis (CTA) hierarchy, which combines them into overall performance scores for both teamwork and cognitive skills. Analysis of videos from four squads sampled over three years confirmed the robustness of the approach, even under varying lighting conditions and camera angles. The result is a set of actionable, domain-specific metrics that capture both individual and team performance during the ECR drill, providing a foundation for more effective training and after-action reviews. The results also suggest that the metrics could be integrated with existing training systems to give trainees intuitive feedback.
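The article names the metrics but not their formulas, so the following Python sketch shows only one plausible way that three of them could be computed from a floor-plane trajectory. Everything here is an assumption made for illustration: the rectangular room with one corner at the origin, the speed and distance thresholds, the grid cell size, and the function names are not taken from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def entrance_hesitation(track: List[Point], fps: float, min_speed: float = 0.5) -> float:
    """Seconds in the track during which speed stays below min_speed (m/s).

    Assumes the track has already been cut to the window around the doorway.
    """
    slow_steps = sum(
        1 for a, b in zip(track, track[1:]) if math.dist(a, b) * fps < min_speed
    )
    return slow_steps / fps


def stay_along_wall(track: List[Point], width: float, depth: float,
                    max_dist: float = 0.5) -> float:
    """Fraction of frames spent within max_dist metres of the nearest wall."""
    if not track:
        return 0.0
    near = sum(1 for x, y in track if min(x, width - x, y, depth - y) <= max_dist)
    return near / len(track)


def floor_coverage(tracks: List[List[Point]], width: float, depth: float,
                   cell: float = 1.0) -> float:
    """Fraction of grid cells visited by at least one team member."""
    cols, rows = math.ceil(width / cell), math.ceil(depth / cell)
    visited = set()
    for track in tracks:
        for x, y in track:
            # Clamp so that points exactly on the far walls fall in the last cell.
            col = min(int(x // cell), cols - 1)
            row = min(int(y // cell), rows - 1)
            visited.add((col, row))
    return len(visited) / (cols * rows)


# Made-up trajectory for one soldier in a 4 m x 4 m room, sampled at 1 frame/s.
track = [(0.2, 0.0), (0.2, 0.1), (0.2, 1.1), (0.2, 2.1), (2.0, 2.1)]
hesitation = entrance_hesitation(track, fps=1.0)
wall_fraction = stay_along_wall(track, width=4.0, depth=4.0)
coverage = floor_coverage([track], width=4.0, depth=4.0)
```

In this toy example the soldier pauses for one frame at the door, hugs the left wall for four of five frames, and visits four of the sixteen one-metre cells. Real thresholds would need to be set with instructors and validated against ground truth.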
Video Analysis Assesses Military Room Clearing Skills
This research presents a new video-based pipeline for assessing performance in complex training scenarios, specifically urban warfare drills. By employing computer vision techniques, the team extracts detailed data, including skeletal movements, gaze direction, and positioning, from standard video recordings, eliminating the need for expensive sensor systems. This innovative approach allows for objective measurement of critical skills such as psychomotor fluency, threat assessment, and team coordination, offering a more scalable and accurate method for evaluating training exercises. The team further enhanced an existing framework for understanding teamwork by integrating newly developed metrics into a Cognitive Task Analysis, demonstrating the ability to generate comprehensive performance scores. Validation through a case study of real-world drills confirms the practical value of the extracted data and the effectiveness of the pipeline in capturing both individual and team performance dynamics. The resulting insights can be readily visualized through interactive dashboards, supporting detailed after-action reviews and improved training management.
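One simple way to picture how low-level metrics could roll up through a task hierarchy into skill-level scores is a weighted average applied recursively. The Python sketch below illustrates that idea only. The grouping of metrics under "teamwork" and "cognitive", the weights, and the normalised scores are all hypothetical, and the paper's actual aggregation method may differ.

```python
from typing import Dict, List, Tuple, Union

# A node is either a metric name (leaf) or a list of (weight, child) pairs.
Node = Union[str, List[Tuple[float, "Node"]]]


def rollup(node: Node, scores: Dict[str, float]) -> float:
    """Weighted average of child scores, applied recursively down to the leaves."""
    if isinstance(node, str):
        return scores[node]
    total_weight = sum(weight for weight, _ in node)
    return sum(weight * rollup(child, scores) for weight, child in node) / total_weight


# Hypothetical metric scores, already normalised to the range [0, 1].
scores = {
    "floor_coverage": 0.5,
    "threat_clearance": 1.0,
    "entrance_hesitation": 0.75,
    "stay_along_wall": 0.0,
}

# Hypothetical hierarchy: which metric feeds which skill is an assumption.
teamwork: Node = [(1.0, "floor_coverage"), (1.0, "threat_clearance")]
cognitive: Node = [(2.0, "entrance_hesitation"), (1.0, "stay_along_wall")]
overall: Node = [(1.0, teamwork), (1.0, cognitive)]

teamwork_score = rollup(teamwork, scores)
cognitive_score = rollup(cognitive, scores)
overall_score = rollup(overall, scores)
```

Keeping the hierarchy as plain data makes it easy to show a trainee which leaf metric pulled a skill score down, which is the kind of drill-down an after-action review dashboard would need.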
Video Pipeline Measures Urban Warfare Performance
Scientists have demonstrated the feasibility of using computer vision to automatically assess team performance in complex scenarios. The system combines multiple data sources, including video recordings, to improve the accuracy and robustness of the assessment. Hierarchical Bayesian models provide a nuanced understanding of team dynamics and individual contributions, delivering actionable insights for training improvement. The authors acknowledge limitations in tracking accuracy and the need for robust ground-truth validation, and they note that the current work focuses on a single type of scenario. Future research will concentrate on expanding the analysis to incorporate three-dimensional video data and on extending the system's applicability to larger-scale tactical exercises, paving the way for more comprehensive and scalable training evaluation within synthetic environments.
👉 More information
🗞 Video-Based Performance Evaluation for ECR Drills in Synthetic Training Environments
🧠 ArXiv: https://arxiv.org/abs/2512.23819
