Rapid progress in large language models is unlocking the growing potential of human-drone interaction. Yizhan Feng, Hichem Snoussi, and Jing Teng, of the University of Technology of Troyes and North China Electric Power University, together with Abel Cherouat, Tian Wang, and colleagues, present a novel method for controlling drone operations through natural language. Their research integrates a refined CodeT5 model with the AirSim drone simulator, allowing users to issue commands and access drone status with greater ease. This work is significant because it lowers the barrier to entry for drone operation, enabling efficient execution of multi-task scenarios within complex, simulated environments. By translating natural language into executable code, the team demonstrates improved task efficiency and understanding, paving the way for wider application of drone technologies.
Drone Control via Language and Simulation
The system leverages a large dataset of command-execution pairs generated by ChatGPT, combined with developer-written drone code, to fine-tune CodeT5 for automated translation of natural language into executable code for drone tasks. Experimental results demonstrate superior task execution efficiency and command understanding capabilities in simulated environments. Future work will extend the model’s functionality in a modular manner, enhancing adaptability to complex scenarios and driving the application of drone technologies in real-world environments.
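The fine-tuning step described above treats each command-execution pair as a sequence-to-sequence example. A minimal sketch of how such pairs might be formatted for training is shown below; the task prefix, example commands, and AirSim calls are illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Pair:
    command: str  # natural-language instruction
    code: str     # target Python snippet for the drone

def to_features(pair: Pair, prefix: str = "translate command to code: "):
    """Format one command/code pair as a seq2seq (input, target) tuple."""
    return (prefix + pair.command, pair.code)

# Hypothetical examples in the style of the ChatGPT-generated corpus.
pairs = [
    Pair("take off", "client.takeoffAsync().join()"),
    Pair("land the drone", "client.landAsync().join()"),
]

features = [to_features(p) for p in pairs]

# Fine-tuning itself would then use an open-source CodeT5 checkpoint,
# e.g. with Hugging Face transformers (not run here):
#   from transformers import AutoTokenizer, T5ForConditionalGeneration
#   tok = AutoTokenizer.from_pretrained("Salesforce/codet5-small")
#   model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-small")
#   ...tokenize `features` and train with a Seq2SeqTrainer.
```

The prefix mirrors the T5 convention of prepending a task description to each input, so a single model can be reused for other translation-style tasks.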
Drones are increasingly integrated into daily life, with applications in environmental monitoring, communication, search and rescue, package delivery, and wireless network provisioning. While advancements have been made in integrating artificial intelligence with unmanned aerial vehicles, developing a general-purpose drone system capable of multi-task operations remains a significant challenge. This requires understanding real-world physics, environmental dynamics, and the physical actions needed for task execution. Recent advancements in natural language processing and large language models offer a potential solution. Large language models, such as OpenAI’s ChatGPT, have demonstrated proficiency in understanding, generating, and translating human-like data due to extensive training on large-scale datasets. These models possess robust general knowledge and reasoning skills, and extending their capabilities to drones holds revolutionary potential.
For applications requiring a combination of natural language and domain-specific terminology, fine-tuning LLMs on specialized datasets proves highly effective, particularly in tasks demanding advanced comprehension. Inspired by the ability of LLMs to automate code generation, the research aims to leverage pre-training techniques on drone code corpora to enable automated code generation for UAV tasks. The goal is to democratise drone operations, empowering non-specialists to execute complex tasks without technical expertise. To address the limitations of closed-source systems, the research focuses on an open-source, adjustable language model, CodeT5, fine-tuning it with a training corpus of command-code pairs generated by ChatGPT together with UAV developer code. This enhances the model's controllability and transferability, supporting incremental data training and application to various drone brands and models.
AirSim, based on the Unreal Engine, offers advanced features including realistic physical and visual simulations, dynamic integration of Unity game plugins, and seamless integration into Unreal environments, alongside a wealth of developer-contributed modules. The developed system enables users to control drones using natural language commands or prompts. The fine-tuning training dataset comprises a general set of simple commands, maintained as a JSON file mapping natural language to Python code snippets, and a complex task operation component utilising developer-provided code. The team leveraged ChatGPT's generative capabilities to produce a large number of command-code pairs for basic drone operations. The overall workflow maps natural language inputs to programming language outputs. CodeT5's performance in code translation and syntactic correctness makes it suitable for drone-specific data, generating executable code directly from natural language commands for deployment in the AirSim simulation environment.
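The JSON mapping of natural language to Python snippets might look like the miniature below. The field names, commands, and AirSim client calls are assumptions for illustration; the paper does not publish the dataset schema.

```python
import json

# Hypothetical miniature of the command-to-code JSON file described above.
DATASET = """
[
  {"command": "take off", "code": "client.takeoffAsync().join()"},
  {"command": "land the drone", "code": "client.landAsync().join()"},
  {"command": "hover in place", "code": "client.hoverAsync().join()"}
]
"""

pairs = json.loads(DATASET)

def lookup(command, pairs):
    """Exact-match retrieval over the dataset. The real system instead
    generates code with the fine-tuned CodeT5 model, so unseen phrasings
    can still be translated."""
    for p in pairs:
        if p["command"] == command.strip().lower():
            return p["code"]
    return None
```

Keeping the simple-command set in a flat JSON file makes it easy to extend incrementally, which matches the paper's emphasis on incremental data training.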
The system integrates a Python-based conversational agent with a C++ interface for the AirSim simulator, establishing an asynchronous connection and transmitting validated commands to the simulated drone. To assess command generation accuracy, a series of test scenarios were designed, encompassing various drone actions and real-world application requirements. The evaluation criteria included syntactic correctness, effectiveness, and consistency with predefined command formats, validated by comparing the execution results of manually written standard commands with those of the generated commands. The trained model translates natural language instructions into executable code lines, which are transmitted to the drone for execution, returning relevant outputs such as images, status information, and video data. The presented system uses fine-tuned CodeT5 to enable natural language-driven control of drones within a simulation environment.
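The syntactic-correctness check applied before transmitting a generated command could be sketched as follows. The whitelist of client methods is an assumption for illustration; the paper does not specify its validation logic.

```python
import ast

# Hypothetical whitelist of permitted drone-client methods.
ALLOWED_CALLS = {
    "takeoffAsync", "landAsync", "hoverAsync",
    "moveToPositionAsync", "moveByVelocityAsync", "armDisarm",
}

def validate(code: str) -> bool:
    """Return True only if `code` parses as valid Python and every call
    is a whitelisted client method (or a chained .join()); otherwise the
    command is rejected before being sent to the simulator."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            if not isinstance(func, ast.Attribute):
                return False  # bare calls like eval(...) are rejected
            if func.attr not in ALLOWED_CALLS | {"join"}:
                return False
    return True
```

Screening generated code this way guards against both malformed output from the model and calls outside the drone API, which matters when model output is executed directly.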
By creating a specialized dataset, CodeT5 was fine-tuned to handle both simple and complex drone tasks, ensuring adaptability across different UAV platforms. The system prioritises efficiency, reducing latency for real-time task execution and minimising drone idle time. Through integration with the AirSim simulation environment, the system benefits from advanced physical simulations for realistic drone control testing. The approach demonstrates progress in AI-driven drone control systems and sets the stage for further research into more complex, real-world applications of natural language interfaces for UAV operations. Future work will focus on expanding task categories and applying the system to real-world UAVs. This work has been partially funded by the BPI DreamScanner project.
The research team successfully demonstrated automated translation from natural language into executable drone code, paving the way for more intuitive and accessible drone control systems. Experiments within the AirSim simulator reveal the system’s capacity to construct visually realistic and dynamic environments, accurately mimicking real-world scenarios for drone applications. By combining a substantial dataset of natural language and program code pairs, generated by ChatGPT, with existing developer-written drone code, the team fine-tuned the CodeT5 model to enhance its performance. This training process resulted in a system capable of understanding and executing a diverse range of commands with increased efficiency. The work demonstrates a substantial advancement in the ability to translate user intent into actionable drone behaviour.
The researchers leveraged the advanced features of AirSim, including its realistic physics, visual simulations, and compatibility with a variety of drone types, to create challenging and representative test scenarios. This approach allows for the flexible construction of complex environments, exceeding the capabilities of more rudimentary simulators and enabling more robust testing of the system’s performance. The team’s work expands upon existing applications, moving beyond single-task inspection scenarios to encompass a broader range of functional commands and applications. Furthermore, the use of an open-source, adjustable language model like CodeT5 addresses limitations found in closed-source systems such as ChatGPT.
This allows for domain-specific fine-tuning with UAV developer code, enhancing both controllability and transferability to different drone models. The breakthrough delivers a system that can be incrementally improved with additional data and adapted to various drone platforms, promising a versatile and scalable solution for future drone applications. The team intends to extend the model’s functionality in a modular fashion, further enhancing its adaptability and driving the application of drone technologies.
👉 More information
🗞 Large Language Models to Enhance Multi-task Drone Operations in Simulated Environments
🧠 ArXiv: https://arxiv.org/abs/2601.08405
