Robot Brains Get a Boost with Open-Source AI Planning Tool

Researchers are tackling the challenge of translating natural language instructions into actions for robots, a crucial step towards more intuitive human-robot interaction. Riccardo Andrea Izzo, Gianluca Bardaro, and Matteo Matteucci, all from Politecnico di Milano, present BTGenBot-2, a new open-source small language model designed to generate executable behavior trees from simple task descriptions and robot action lists. This work is significant because, unlike computationally intensive alternatives, it offers a lightweight and efficient solution that can be deployed on resource-constrained robots, and it introduces the first standardized benchmark for this type of task generation. Evaluations demonstrate that BTGenBot-2 outperforms larger models such as GPT-5 and Claude Opus 4.1, achieving high success rates with significantly faster inference speeds than its predecessor.

This innovation addresses critical limitations hindering the widespread adoption of large language models in robotics, namely the prevalence of closed-source systems and computationally intensive processes unsuitable for real-world deployment.

Unlike previous approaches, BTGenBot-2 facilitates zero-shot behavior tree generation, incorporates error recovery mechanisms for both inference and runtime, and maintains a lightweight profile ideal for robots with limited resources. Researchers at the Politecnico di Milano developed this model alongside a novel standardized benchmark comprising 52 navigation and manipulation tasks implemented within NVIDIA Isaac Sim.

Extensive evaluations demonstrate that BTGenBot-2 consistently surpasses the performance of established models including GPT-5, Claude Opus 4.1, and larger open-source alternatives across both functional and non-functional metrics. Specifically, the model achieves average success rates of 90.38% in zero-shot scenarios and an impressive 98.07% in one-shot settings, all while delivering up to 16x faster inference speeds compared to its predecessor, BTGenBot.

The core of this achievement lies in a compute-efficient fine-tuning process applied to a pretrained small language model, using a custom synthetic dataset of 5,204 instruction-following examples. BTGenBot-2’s ability to generate directly executable behavior trees, compatible with the Robot Operating System 2 (ROS2), represents a significant step towards more accessible and robust robotic task planning.
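The article does not reproduce one of the model's generated trees, but a BehaviorTree.CPP-style XML tree of the kind BTGenBot-2 emits can be sketched as follows. Note that the task and the action names (`MoveTo`, `Pick`, `Place`) are illustrative placeholders, not the paper's actual primitives; the snippet simply shows the XML shape and how a consumer might extract the action sequence.

```python
import xml.etree.ElementTree as ET

# Hypothetical BehaviorTree.CPP-style tree for a simple delivery task.
# Action names are assumptions for illustration only.
BT_XML = """\
<root BTCPP_format="4" main_tree_to_execute="MainTree">
  <BehaviorTree ID="MainTree">
    <Sequence name="deliver_cup">
      <Action ID="MoveTo" target="cup"/>
      <Action ID="Pick" object="cup"/>
      <Action ID="MoveTo" target="table"/>
      <Action ID="Place" object="cup"/>
    </Sequence>
  </BehaviorTree>
</root>
"""

def action_sequence(xml_text):
    """Parse the tree and return the Action node IDs in document order."""
    root = ET.fromstring(xml_text)
    return [node.get("ID") for node in root.iter("Action")]

print(action_sequence(BT_XML))  # ['MoveTo', 'Pick', 'MoveTo', 'Place']
```

Because the output is plain XML, it can be handed directly to the BehaviorTree.CPP runtime used in the ROS2 ecosystem without an intermediate translation step.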

This work not only introduces a powerful new tool for roboticists but also establishes a foundation for reproducible research through the public release of the dataset, model weights, codebase, and benchmark. Furthermore, the inclusion of novel failure detection mechanisms enhances the robustness of the generated behavior trees, allowing for error recovery during both the planning and execution phases. By addressing the need for open-source, efficient, and standardized solutions, BTGenBot-2 paves the way for deploying advanced robotic capabilities on a wider range of platforms and in more dynamic environments.

Developing and evaluating BTGenBot-2 for natural language to behavior tree translation is a crucial step forward

A 1B-parameter small language model, BTGenBot-2, was developed to directly translate natural language task descriptions and robot action primitives into executable behavior trees in XML format. The research addresses two limitations of existing robot learning methods: the prevalence of closed-source or computationally intensive systems that hinder real-world deployment, and the lack of a standardized representation for robotic task generation.

BTGenBot-2 was built upon a pretrained small language model and further fine-tuned using compute-efficient methods on a custom synthetic instruction-following dataset. This study created a dataset of 5,204 pairs of executable behavior trees and corresponding natural language task descriptions, providing a foundation for generalizable behavior tree generation.
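The article does not show the released dataset's schema, but each of the 5,204 examples pairs a task description and an action list with a target tree. A minimal sketch of how such a pair might be assembled into an instruction-following training example (the field names, prompt wording, and sample task are assumptions, not the actual dataset format):

```python
def format_example(task, actions, target_bt_xml):
    """Assemble one hypothetical training pair: the prompt lists the
    robot's available action primitives and the task; the completion
    is the target executable behavior tree in XML."""
    prompt = (
        "Available actions: " + ", ".join(actions) + "\n"
        "Task: " + task + "\n"
        "Behavior tree:\n"
    )
    return {"prompt": prompt, "completion": target_bt_xml}

pair = format_example(
    task="Patrol the corridor, then return to the dock.",
    actions=["MoveTo", "Rotate", "Dock"],
    target_bt_xml="<root><BehaviorTree ID='MainTree'/></root>",
)
print(pair["prompt"].splitlines()[0])  # Available actions: MoveTo, Rotate, Dock
```

Conditioning the prompt on the robot's action list is what lets a single fine-tuned model generalize across platforms with different primitives.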

Unlike previous approaches, BTGenBot-2 facilitates zero-shot behavior tree generation and incorporates error recovery mechanisms for both inference and runtime, all while maintaining a lightweight design suitable for robots with limited computational resources. The model’s performance was evaluated using a newly introduced benchmark comprising 52 navigation and manipulation tasks implemented within NVIDIA Isaac Sim, categorized into three levels of difficulty.

Extensive evaluations demonstrated that BTGenBot-2 consistently surpassed the performance of GPT-5, Claude Opus 4.1, and larger open-source models across functional and non-functional metrics. Specifically, the model achieved average success rates of 90.38% in zero-shot and 98.07% in one-shot scenarios, alongside inference speeds up to 16x faster than its predecessor, BTGenBot. Two novel failure detection mechanisms were designed to improve robustness by handling errors during both inference and runtime, enhancing the reliability of the generated behavior trees.
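The article does not detail how the two failure detection mechanisms work, but an inference-time check of the general kind described, rejecting malformed trees or trees that call unavailable actions and retrying generation, can be sketched as follows. The validation rules and retry policy here are assumptions for illustration, not the paper's exact design.

```python
import xml.etree.ElementTree as ET

def is_valid_tree(xml_text, allowed_actions):
    """Inference-time validation (assumed rules): the candidate must be
    well-formed XML and use only action primitives the robot provides."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    used = {node.get("ID") for node in root.iter("Action")}
    return used.issubset(allowed_actions)

def generate_with_recovery(generate, prompt, allowed_actions, max_attempts=3):
    """Retry generation until a candidate tree passes validation."""
    for _ in range(max_attempts):
        candidate = generate(prompt)
        if is_valid_tree(candidate, allowed_actions):
            return candidate
    raise RuntimeError("no valid behavior tree after retries")
```

A runtime counterpart would work analogously: monitor the return status of executing tree nodes and trigger replanning when a node reports failure.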

High-performance behavior tree generation from language using a small language model is a challenging but promising area

Researchers present BTGenBot-2, a 1B-parameter open-source small language model achieving an average success rate of 90.38% in zero-shot behavior tree (BT) generation and 98.07% in one-shot scenarios. The model directly converts natural language task descriptions and robot action primitives into executable behavior trees formatted in XML.

This work introduces the first standardized benchmark comprising 52 navigation and manipulation tasks within Isaac Sim, used to rigorously evaluate the model’s performance. BTGenBot-2 demonstrates up to 16x faster inference speeds compared to the previous BTGenBot model, enabling deployment on resource-constrained robots.

The study details a new synthetic instruction-following dataset of 5,204 pairs of executable BTs and natural language task descriptions, designed to facilitate generalizable BT generation. This dataset addresses a gap in existing planning datasets and provides a foundation for further research. The research highlights two novel failure detection mechanisms implemented within BTGenBot-2, significantly enhancing robustness during both inference and runtime.

Extensive evaluations show that BTGenBot-2 consistently outperforms GPT-5, Claude Opus 4.1, and larger open-source models across both functional and non-functional metrics. The team publicly releases the model weights, codebase, benchmark, and dataset to promote dissemination and reproducibility within the robotics community.

BTGenBot-2 is validated extensively in simulation and on real robots, establishing it as a strong BT generator compatible with the Robot Operating System 2 (ROS2) ecosystem. The BehaviorTree.CPP library, a standard component within ROS2, is utilized due to its inclusion in the Navigation2 stack. This work argues for the development of open-source and efficient models built on open-source LLMs, enabling local deployment without reliance on external APIs.

A compact language model demonstrates superior performance for robotic behavior tree generation

Researchers have developed BTGenBot-2, a 1B-parameter open-source small language model capable of converting natural language task descriptions into executable behavior trees in XML. This model addresses challenges in robot learning by offering a lightweight, computationally efficient solution for task planning, unlike many existing closed-source or intensive methods.

BTGenBot-2 enables zero-shot behavior tree generation and incorporates error recovery mechanisms for both inference and runtime, making it suitable for robots with limited computational resources. The system was evaluated using a newly introduced standardized benchmark comprising 52 navigation and manipulation tasks within NVIDIA Isaac Sim, and also tested on a 6-DoF robotic arm.

Extensive evaluations demonstrated that BTGenBot-2 consistently outperforms larger models like GPT-5 and Claude Opus 4.1 in both functional and non-functional metrics, achieving average success rates of 90.38% in zero-shot and 98.07% in one-shot scenarios, with up to 16x faster inference speeds compared to its predecessor. These findings highlight the potential of small language models for bridging the gap between natural language instructions and practical robotic execution.

The authors acknowledge that the benchmark includes tasks whose complexity exceeds the constraints of their laboratory setup, so the simulation results reported in their tables offer a more comprehensive evaluation of the model’s capabilities than the hardware experiments alone. Future research could focus on expanding the model’s capabilities to handle even more complex tasks and diverse robotic platforms, potentially exploring integration with other robotic systems and sensors to enhance its adaptability and robustness in real-world environments.


👉 More information
🗞 BTGenBot-2: Efficient Behavior Tree Generation with Small Language Models
🧠 ArXiv: https://arxiv.org/abs/2602.01870

Quantum Strategist

While other quantum journalists focus on technical breakthroughs, Regina is tracking the money flows, policy decisions, and international dynamics that will actually determine whether quantum computing changes the world or becomes an expensive academic curiosity. She's spent enough time in government meetings to know that the most important quantum developments often happen in budget committees and international trade negotiations, not just research labs.

Latest Posts by Quantum Strategist:

Single-Photon Detector Flaws Unravelled, Paving the Way for Faster Data Transmission

February 3, 2026
Autonomous Quantum Error Correction Achieves Passive Stability in Two Dimensions

January 29, 2026
Photon-Graviton Scattering Achieves Novel Gravitational Wave Detection Via Quantum Interference

January 29, 2026