Drones Learn to Fly More Safely and Perform Complex Manoeuvres Without Prior Training

Researchers are tackling the challenge of building agile quadrotor flight controllers that operate reliably in unpredictable real-world conditions. Yunfan Ren, Zhiyuan Zhu, Jiaxu Xing, and Davide Scaramuzza, all from the University of Zurich, present a self-adaptive framework that moves beyond reliance on extensive simulation and precise system identification. The work is significant because it lets quadrotors learn and improve during flight, overcoming the limits imposed by conservative safety margins and the risk of operating outside known parameters. By combining Adaptive Temporal Scaling (ATS) with online residual learning and a new backpropagation method, the team evolves a base policy from a peak speed of 1.9 m/s to 7.3 m/s within a single flight, pointing to sustained performance gains in demanding aerial environments.

Real-time adaptive control more than triples quadrotor flight speed

Researchers have developed a self-adaptive control framework that lets quadrotors more than triple their peak flight speed within about 100 seconds of flight, without offline simulation-to-real transfer or precise system identification. The framework addresses a critical limitation in agile robotics, where fixed policies often struggle with real-world uncertainties and changing hardware conditions.

The work demonstrates a quadrotor progressing from a conservative peak speed of 1.9 m/s to 7.3 m/s through continuous, on-policy learning in real-world flight. Complementing ATS, the temporal scaling component, is a hybrid dynamics model that combines a simple nominal model with online residual learning to capture complex real-world behaviours efficiently. The researchers further introduce Real-world Anchored Short-horizon Backpropagation Through Time (RASH-BPTT), a method for optimising the control policy using the learned hybrid model and limited in-flight data.

Extensive experiments confirm the system’s ability to reliably execute agile manoeuvres near actuator saturation, a clear gain in both performance and robustness. This approach moves beyond merely compensating for modelling errors, establishing real-world adaptation as a mechanism for sustained performance improvement in demanding flight conditions.

This innovation promises to unlock the full potential of agile quadrotors, enabling applications requiring high-speed, precise control in dynamic and unpredictable environments. The framework’s ability to continuously learn and adapt opens doors for improved performance in tasks such as tracking, line inspection, and landmark navigation, all while operating safely and efficiently. The research implemented ATS to incentivise exploration of physical capability boundaries, contrasting with fixed-task baselines that lack this dynamic adjustment.
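The article does not spell out how ATS adjusts the reference, so the following Python sketch only illustrates the general idea of temporal scaling: replaying a fixed path faster raises the speed the vehicle must reach. The circular path, the function names, the gain, and the update rule are all assumptions made for illustration, and the 0.3 m error budget is simply borrowed from the tracking bound quoted in this article. It is not the authors' method.

```python
import math

def circle_reference(s, radius=2.0):
    """Point on a circular path at path parameter s (radians); a hypothetical test path."""
    return (radius * math.cos(s), radius * math.sin(s))

def peak_speed(time_scale, radius=2.0, dt=1e-3, steps=6284):
    """Finite-difference peak speed when the path is replayed at `time_scale` x real time."""
    peak = 0.0
    prev = circle_reference(0.0, radius)
    for k in range(1, steps + 1):
        cur = circle_reference(time_scale * k * dt, radius)
        peak = max(peak, math.hypot(cur[0] - prev[0], cur[1] - prev[1]) / dt)
        prev = cur
    return peak

def adapt_time_scale(time_scale, tracking_error, error_budget=0.3, gain=0.5,
                     lowest=0.1, highest=10.0):
    """Assumed rule: speed up while tracking error is under budget, slow down otherwise."""
    margin = (error_budget - tracking_error) / error_budget
    return min(highest, max(lowest, time_scale * (1.0 + gain * margin)))
```

In this toy setting, replaying the 2 m radius circle at twice real time doubles the required peak speed from about 2 m/s to about 4 m/s, while the path geometry stays the same. The adaptation rule raises the time scale when the measured tracking error is well inside the budget and lowers it when the error exceeds the budget.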

A hybrid dynamics model was then developed, augmenting a simple nominal model with online residual learning to enhance learning efficiency. This hybrid model forms the basis for Real-world Anchored Short-horizon Backpropagation Through Time (RASH-BPTT), a technique employed to achieve efficient and robust in-flight policy updates.
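To make the idea of a nominal model plus an online residual concrete, here is a minimal Python sketch on a one-dimensional velocity system. Everything in it is an assumption for illustration: the affine residual fitted by least squares over a sliding window stands in for whatever residual learner the researchers use, and the drag and wind values in `real_step` are invented stand-ins for the real vehicle.

```python
from collections import deque
import math

DT = 0.02  # control period in seconds (illustrative value)

def nominal_accel(v, u):
    """Nominal model: commanded acceleration only, no drag or wind."""
    return u

class ResidualModel:
    """Affine residual a_res(v) = w0 + w1 * v, refit on a sliding window of flight data."""

    def __init__(self, window=200):
        self.samples = deque(maxlen=window)
        self.w0, self.w1 = 0.0, 0.0

    def observe(self, v, u, v_next):
        # Residual = measured acceleration minus the nominal model's prediction.
        measured_accel = (v_next - v) / DT
        self.samples.append((v, measured_accel - nominal_accel(v, u)))
        self._refit()

    def _refit(self):
        n = len(self.samples)
        mean_v = sum(v for v, _ in self.samples) / n
        mean_r = sum(r for _, r in self.samples) / n
        var_v = sum((v - mean_v) ** 2 for v, _ in self.samples)
        if var_v < 1e-9:  # too little excitation to identify the slope
            self.w0, self.w1 = mean_r, 0.0
            return
        cov = sum((v - mean_v) * (r - mean_r) for v, r in self.samples)
        self.w1 = cov / var_v
        self.w0 = mean_r - self.w1 * mean_v

    def predict(self, v):
        return self.w0 + self.w1 * v

def hybrid_step(v, u, residual):
    """One-step prediction: nominal model corrected by the learned residual."""
    return v + DT * (nominal_accel(v, u) + residual.predict(v))

def real_step(v, u, drag=0.8, wind=-0.5):
    """Invented stand-in for the real vehicle, with effects the nominal model ignores."""
    return v + DT * (u - drag * v + wind)

def run_demo(steps=500):
    """Fly a sinusoidal command and learn the residual from the observed transitions."""
    residual = ResidualModel()
    v = 0.0
    for k in range(steps):
        u = 3.0 * math.sin(0.05 * k)
        v_next = real_step(v, u)
        residual.observe(v, u, v_next)
        v = v_next
    return residual
```

After `run_demo()`, the fitted weights approach the invented drag and wind terms, so `hybrid_step` predicts the stand-in vehicle far more closely than the nominal model alone. The design point is that the nominal model carries the known physics and the residual only has to learn the remaining mismatch, which takes far less data than learning the full dynamics.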

RASH-BPTT optimises the policy on the fly, converting limited flight data into sustained performance gains. Extensive experiments on a quadrotor platform validated the framework in real-world conditions. The system started with a conservative base policy that reached a peak speed of 1.9 m/s and evolved it to 7.3 m/s within approximately 100 seconds of flight time.

This adaptation occurred through continuous, on-policy improvement directly in the real world, eliminating the need for precise system identification or offline Sim2Real transfer. The methodology prioritises exploiting platform limits rather than focusing solely on disturbance rejection, enabling agile, high-performance behaviours. The quadrotor reliably executed agile manoeuvres near actuator saturation limits throughout the testing period, demonstrating the robustness of the adaptive framework.

Real-time adaptation enhances quadrotor speed and wind-resilient inspection performance

Using the self-adaptive control framework, the quadrotor’s peak speed evolved from 1.9 m/s to 7.3 m/s within approximately 100 seconds of flight time. This performance gain was achieved without relying on precise system identification or offline simulation-to-real transfer. The system improves agility in aggressive flight regimes through real-world adaptation, rather than simply compensating for modelling errors.

Beyond increased speed, the quadrotor completed a multi-point inspection mission 42% faster under strong wind disturbances after only 2 minutes of real-world training. The research details a hybrid dynamics model in which a neural residual network learns to compensate for discrepancies between the nominal model and reality. This model underpins Real-world Anchored Short-horizon Backpropagation Through Time (RASH-BPTT), enabling efficient and robust in-flight policy updates.

Simulation rollouts are initialised using recent real-world state estimates, mitigating compounding prediction errors and enhancing stability. Analytical gradients derived from closed-loop sensitivity maximise agility while enforcing safety constraints via a barrier function.
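The two ideas in this paragraph, anchoring short rollouts at measured states and differentiating through the closed loop with a barrier term, can be sketched in a few lines of Python. This is a toy under stated assumptions, not the authors' RASH-BPTT implementation: the one-dimensional model, the single-gain policy `u = theta * (v_ref - v)`, the tracking loss used as a proxy for agility, the log barrier on control magnitude, and every constant are hypothetical.

```python
import math

DT = 0.02     # control period in seconds (illustrative)
U_MAX = 6.0   # assumed actuator limit
DRAG = 0.8    # assumed to come from a learned dynamics model

def model_step(v, u):
    """One step of the (assumed) learned model."""
    return v + DT * (u - DRAG * v)

def rollout_loss_and_grad(theta, v_anchor, v_ref, horizon=10, mu=0.05):
    """Loss and d(loss)/d(theta) for one short rollout started at a measured state."""
    v, s = v_anchor, 0.0  # s = dv/dtheta, the closed-loop sensitivity; zero at the anchor
    loss, grad = 0.0, 0.0
    for _ in range(horizon):
        err = v_ref - v
        u = theta * err
        du = err - theta * s  # du/dtheta, including the dependence through v
        slack = U_MAX ** 2 - u ** 2
        if slack <= 0.0:
            return math.inf, 0.0  # rollout left the barrier's domain
        loss += -mu * math.log(slack)       # log barrier keeps |u| below U_MAX
        grad += mu * 2.0 * u * du / slack
        v = model_step(v, u)
        s = s + DT * (du - DRAG * s)        # propagate the sensitivity through the model
        loss += (v_ref - v) ** 2            # tracking cost
        grad += -2.0 * (v_ref - v) * s
    return loss, grad

def policy_update(theta, anchors, v_ref, lr=0.5):
    """One gradient step averaged over rollouts anchored at recent measured states."""
    grads = [rollout_loss_and_grad(theta, a, v_ref)[1] for a in anchors]
    step = lr * sum(grads) / len(grads)
    for _ in range(30):  # backtrack until every rollout stays inside the barrier's domain
        losses = [rollout_loss_and_grad(theta - step, a, v_ref)[0] for a in anchors]
        if all(math.isfinite(x) for x in losses):
            return theta - step
        step *= 0.5
    return theta
```

Because each rollout restarts from a measured state and runs for only a few steps, model error has little time to compound. The sensitivity `s` is what makes the gradient analytical: it tracks how the predicted state changes with the policy parameter at every step, so no finite differencing over whole rollouts is needed.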

Real-time adaptation unlocks quadrotor agility through online learning

Researchers have developed a self-adaptive control framework enabling quadrotors to evolve from conservative operation to their agility limits in real time. The system achieves substantial performance improvements without requiring prior system identification or extensive pre-training in simulation. A reported speedup by a factor of 2.6 was achieved while keeping the tracking error bounded below 0.3 m, confirming the effectiveness of residual learning in compensating for external disturbances and allowing safe exploration of performance limits.

The findings indicate that real-world adaptation is not simply a correction for modelling errors, but a mechanism for sustained performance enhancement in challenging flight conditions. The authors acknowledge one limitation: temporal scaling is currently restricted to fixed path geometries. Future research will focus on jointly optimising path shape and timing while incorporating obstacle avoidance constraints. Further work will also explore active exploration strategies that improve model robustness by intentionally probing high-uncertainty regions while ensuring stability, and will integrate perception-aware control for more complex scenarios.

👉 More information
🗞 Learning Agile Quadrotor Flight in the Real World
🧠 ArXiv: https://arxiv.org/abs/2602.10111

Rohail T.
