IRL-DAL Planning Achieves 96% Success Rate and 0.05 Collisions per 1,000 Steps for Safer Autonomous Driving

Researchers are tackling the complex challenge of safe and reliable autonomous driving with a new trajectory planning framework. Seyed Ahmad Hosseini Miangoleh, Amin Jalal Aghdasian, and Farzaneh Abdollahi, all from the Department of Electrical Engineering at Amirkabir University of Technology, present IRL-DAL, an inverse reinforcement learning approach that combines expert imitation with adaptive planning and a novel safety supervisor. This work is significant because it achieves a 96% success rate in simulated environments, alongside a substantial reduction in collisions, establishing a new benchmark for autonomous navigation and promising more robust performance in challenging and unpredictable real-world scenarios.

Diffusion-based inverse reinforcement learning for safer autonomous driving offers promising results

Scientists have developed a novel inverse reinforcement learning framework, termed IRL-DAL, to significantly enhance autonomous vehicle navigation. The research introduces a diffusion-based adaptive lookahead planner designed to achieve safer and more robust driving capabilities. Training commenced with imitation learning from an expert finite state machine controller, establishing a stable foundation for subsequent learning phases.
Environment reward terms were then combined with an inverse reinforcement learning discriminator signal, aligning the vehicle’s actions with desired expert-level goals. Reinforcement learning was subsequently applied using a hybrid reward system, combining broad environmental feedback with targeted rewards derived from the inverse reinforcement learning discriminator.
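
A minimal sketch of how such a hybrid reward can be composed, assuming a GAIL-style discriminator; the blending weight and network sizes below are illustrative choices rather than the paper's values.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """GAIL-style discriminator scoring (state, action) pairs as expert-like."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def hybrid_reward(env_reward, disc, state, action, lambda_irl=0.5):
    """Blend the simulator's reward with the IRL discriminator signal.

    -log(1 - sigmoid(D(s, a))) grows as the agent's behaviour becomes
    indistinguishable from the expert's; lambda_irl is an assumed weight.
    """
    with torch.no_grad():
        d = torch.sigmoid(disc(state, action))
        irl_reward = -torch.log(1.0 - d + 1e-8)
    return env_reward + lambda_irl * irl_reward.squeeze(-1)
```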

A conditional diffusion model functions as a safety supervisor, meticulously planning safe paths that maintain lane position, avoid obstacles, and ensure smooth vehicle movement. Crucially, a learnable adaptive mask improves the perception system by dynamically shifting visual attention based on vehicle speed and the presence of nearby hazards.
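
To make the planner idea concrete, here is a minimal DDPM-style reverse-sampling loop that denoises random noise into a short trajectory conditioned on a state embedding. The network, noise schedule, and horizon are placeholders; the authors' conditional planner is likely far more elaborate.

```python
import torch
import torch.nn as nn

class DenoiseNet(nn.Module):
    """Predicts the noise in a trajectory, conditioned on state and timestep."""
    def __init__(self, horizon=16, traj_dim=2, cond_dim=32, hidden=256):
        super().__init__()
        self.horizon, self.traj_dim = horizon, traj_dim
        self.net = nn.Sequential(
            nn.Linear(horizon * traj_dim + cond_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * traj_dim),
        )

    def forward(self, traj, cond, t):
        x = torch.cat([traj.flatten(1), cond, t.float().unsqueeze(-1)], dim=-1)
        return self.net(x).view(-1, self.horizon, self.traj_dim)

@torch.no_grad()
def sample_trajectory(model, cond, steps=50):
    """Reverse diffusion: start from noise, iteratively denoise into a plan."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas, alpha_bar = 1.0 - betas, torch.cumprod(1.0 - betas, dim=0)
    traj = torch.randn(cond.shape[0], model.horizon, model.traj_dim)
    for t in reversed(range(steps)):
        eps = model(traj, cond, torch.full((cond.shape[0],), t))
        mean = (traj - betas[t] / torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(traj) if t > 0 else torch.zeros_like(traj)
        traj = mean + torch.sqrt(betas[t]) * noise
    return traj  # (batch, horizon, 2) waypoints in the ego frame
```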

Following the initial imitation phase, the driving policy underwent fine-tuning using Proximal Policy Optimization, a widely used policy-gradient algorithm. Extensive training was conducted within the Webots simulator, employing a two-stage curriculum to progressively challenge the autonomous agent.
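
The article does not spell out the curriculum, so the sketch below only illustrates the general pattern: PPO fine-tuning that switches from an easier stage to a harder one. The stage parameters and the injected make_env, collect_rollout, and ppo_update callables are hypothetical.

```python
def train_with_curriculum(policy, make_env, collect_rollout, ppo_update):
    """Fine-tune an imitation-initialised policy with PPO over two stages.

    make_env, collect_rollout, and ppo_update stand in for the simulator
    binding and a standard clipped-surrogate PPO step; the stage settings
    below are invented for illustration.
    """
    stages = [
        {"traffic_density": 0.0, "steps": 25_000},  # stage 1: basic lane keeping
        {"traffic_density": 0.3, "steps": 25_000},  # stage 2: dynamic obstacles
    ]
    for stage in stages:
        env = make_env(traffic_density=stage["traffic_density"])
        steps = 0
        while steps < stage["steps"]:
            rollout = collect_rollout(env, policy)  # trajectories under current policy
            ppo_update(policy, rollout)             # one PPO improvement step
            steps += len(rollout)
```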

The team achieved a remarkable 96% success rate in navigation tasks, while simultaneously reducing collisions to just 0.05 per 1,000 steps, establishing a new benchmark for safe autonomous driving performance. By implementing this innovative approach, the agent demonstrates not only proficient lane keeping but also the ability to expertly handle unsafe conditions, substantially increasing overall robustness.

The researchers have made the code publicly available, facilitating further research and development in this critical field. This work opens new avenues for creating autonomous systems capable of navigating complex and dynamic environments with a level of safety and reliability comparable to human drivers.

Framework development and autonomous vehicle policy refinement underpin the reported gains

Scientists developed a novel inverse reinforcement learning framework, IRL-DAL, to enhance autonomous vehicle safety and robustness. The study began by training the agent through imitation of a finite state machine (FSM) controller, establishing a stable foundation for subsequent learning. Environment reward terms were then combined with an IRL discriminator signal, aligning the agent’s behaviour with expert driving goals.

Reinforcement learning was subsequently performed, employing a hybrid reward system that combined environmental feedback with targeted IRL rewards to refine the policy. A conditional planner functioned as a safety supervisor, ensuring safe path planning by maintaining lane position, avoiding obstacles, and promoting smooth vehicle movements.
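
One plausible reading of the safety supervisor is a runtime check that a candidate path keeps its lane, clears obstacles, and stays smooth before it is executed; the thresholds in this sketch are invented for illustration.

```python
import numpy as np

def is_path_safe(path, lane_center_y, obstacles,
                 lane_half_width=1.5, min_clearance=2.0, max_jerk=0.8):
    """Check an (N, 2) array of waypoints against three safety criteria.
    All thresholds are illustrative, not the paper's values."""
    # 1. Lane keeping: bound the lateral deviation from the lane centreline.
    if np.any(np.abs(path[:, 1] - lane_center_y) > lane_half_width):
        return False
    # 2. Obstacle clearance: minimum distance to every obstacle point.
    for obs in obstacles:
        if np.min(np.linalg.norm(path - obs, axis=1)) < min_clearance:
            return False
    # 3. Smoothness: bound the third difference of positions (a jerk proxy).
    if len(path) >= 4 and np.max(np.abs(np.diff(path, n=3, axis=0))) > max_jerk:
        return False
    return True
```

A supervisor built around such a check would substitute the planner's path whenever the policy's proposed motion fails it.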

Researchers engineered a learnable adaptive mask (LAM) to improve perception, dynamically shifting visual attention based on vehicle speed and the presence of nearby hazards. Following FSM-based imitation, the policy underwent fine-tuning with Proximal Policy Optimization (PPO), leveraging a two-stage curriculum within the Webots simulator.
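
A rough sketch of what a learnable adaptive mask could look like: a small network maps vehicle speed and the distance to the nearest hazard to a spatial weighting over the visual features. The architecture here is an assumption; the paper's LAM may differ.

```python
import torch
import torch.nn as nn

class LearnableAdaptiveMask(nn.Module):
    """Modulate a (B, C, 64, 64) feature map with a mask conditioned on
    speed and nearest-hazard distance; the layout is illustrative."""
    def __init__(self, grid: int = 64):
        super().__init__()
        self.grid = grid
        self.net = nn.Sequential(
            nn.Linear(2, 64), nn.ReLU(),
            nn.Linear(64, grid * grid), nn.Sigmoid(),  # per-cell weights in (0, 1)
        )

    def forward(self, features, speed, hazard_dist):
        cond = torch.stack([speed, hazard_dist], dim=-1)       # (B, 2)
        mask = self.net(cond).view(-1, 1, self.grid, self.grid)
        return features * mask  # attention shifts with speed and nearby hazards
```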

Experiments achieved a 96% success rate, demonstrating significant progress in safe navigation, while simultaneously reducing collisions to 0.05 per 1,000 steps. This performance establishes a new benchmark for autonomous vehicle safety. The approach enables the agent to not only maintain lane discipline but also effectively handle unsafe conditions at an expert level, increasing overall robustness.

The team applied the methodology to challenges in highly dynamic environments, focusing on reliable obstacle avoidance, a critical factor for real-world deployment. To facilitate further research, the authors have made their code publicly available.

Improved autonomous navigation via inverse reinforcement learning and adaptive lookahead planning enables robust and efficient pathfinding

Scientists achieved a 96% success rate in autonomous vehicle navigation using a novel inverse reinforcement learning framework incorporating an adaptive lookahead planner (IRL-DAL). The research, conducted within the Webots simulator, demonstrably reduced collisions to 0.05 per 1,000 steps, establishing a new benchmark for safe autonomous driving.

Experiments began with imitation learning from a finite state machine (FSM) controller, providing a stable foundation for subsequent reinforcement learning. The agent was then trained through a two-stage curriculum, combining environmental feedback with targeted inverse reinforcement learning rewards.

Data shows a significant improvement in mean reward, increasing from 85.2 with baseline PPO and uniform sampling to 180.7 with the complete IRL-DAL framework. This more-than-twofold increase was accompanied by a reduction in collisions, falling from 0.63 to 0.05 per 1,000 steps. The system utilises a 4-channel visual tensor of size 64×64, combined with a 180-beam LiDAR scan, to build a comprehensive environmental understanding.
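
To make that observation format concrete, the sketch below fuses the 4-channel 64×64 tensor and the 180-beam scan into a single embedding; only the input shapes come from the text, and the layer sizes are assumed.

```python
import torch
import torch.nn as nn

class ObservationEncoder(nn.Module):
    """Fuse a 4x64x64 visual tensor with a 180-beam LiDAR scan.
    Input shapes match the article; layer sizes are assumptions."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),   # 64 -> 15
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # 15 -> 6
            nn.Flatten(),
        )
        self.lidar = nn.Sequential(nn.Linear(180, 128), nn.ReLU())
        self.fuse = nn.Linear(64 * 6 * 6 + 128, embed_dim)

    def forward(self, image, scan):
        # image: (B, 4, 64, 64); scan: (B, 180)
        return self.fuse(torch.cat([self.cnn(image), self.lidar(scan)], dim=-1))
```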

Further analysis revealed that incorporating the FSM replay buffer boosted the mean reward by 41%, while the diffusion planner contributed a 29% increase. The learnable adaptive mask (LAM) and safety-aware energy controller (SAEC) further refined performance, resulting in a 2.45m average displacement error (ADE) and a 5.1m final displacement error (FDE).
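
For reference, ADE and FDE are the standard trajectory metrics: the mean and the final Euclidean distance between predicted and ground-truth waypoints. A straightforward computation, assuming trajectories as (N, 2) arrays:

```python
import numpy as np

def ade_fde(predicted: np.ndarray, ground_truth: np.ndarray):
    """Average (ADE) and final (FDE) displacement error for (N, 2) paths."""
    errors = np.linalg.norm(predicted - ground_truth, axis=1)
    return errors.mean(), errors[-1]

# Example: a prediction drifting laterally by 0.1 m per waypoint.
pred = np.stack([np.arange(5.0), 0.1 * np.arange(5.0)], axis=1)
truth = np.stack([np.arange(5.0), np.zeros(5)], axis=1)
ade, fde = ade_fde(pred, truth)  # ade = 0.2, fde = 0.4
```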

Tests show the system handles unsafe conditions at an expert level, enhancing robustness and achieving smoother lane changes and collision avoidance. Algorithm 1 details the integrated training pipeline, encompassing FSM-aware replay, behaviour cloning, generative adversarial imitation learning, Proximal Policy Optimization, the diffusion-based adaptive lookahead planner, and the safety-aware energy controller.
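
Algorithm 1 itself is not reproduced in the article, but its stage list suggests a pipeline along these lines; every callable in the sketch is a placeholder for the corresponding component, not the authors' implementation.

```python
def train_irl_dal(stages: dict, total_steps: int = 50_000):
    """Skeleton mirroring the stage list attributed to Algorithm 1.
    `stages` maps stage names to user-supplied callables."""
    buffer = stages["collect_fsm_demos"]()              # FSM-aware replay seeding
    stages["behaviour_clone"](buffer)                   # warm-start the policy
    for _ in range(total_steps):
        rollout = stages["rollout"]()                   # run the current policy
        stages["train_discriminator"](buffer, rollout)  # GAIL discriminator step
        stages["ppo_update"](rollout)                   # PPO on the hybrid reward
        plan = stages["diffusion_plan"](rollout)        # adaptive lookahead plan
        stages["saec_supervise"](plan)                  # energy-guided safety gate
```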

The model’s performance was evaluated across various architectural variants, with quantitative results presented in Table I, demonstrating the contribution of each component to overall success. The training procedure, lasting a total of 50,000 steps, leveraged a partitioned replay buffer to prioritise rare safety-critical events.
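
A partitioned buffer that over-samples rare safety-critical transitions might look like the following sketch; the partition sizes and the 50/50 sampling mix are assumptions, not the paper's settings.

```python
import random
from collections import deque

class PartitionedReplayBuffer:
    """Keep safety-critical transitions in their own partition and
    over-sample them at training time; ratios are illustrative."""
    def __init__(self, capacity: int = 100_000, critical_fraction: float = 0.5):
        self.normal = deque(maxlen=capacity)
        self.critical = deque(maxlen=capacity // 10)  # rare events persist longer
        self.critical_fraction = critical_fraction

    def add(self, transition, is_safety_critical: bool):
        (self.critical if is_safety_critical else self.normal).append(transition)

    def sample(self, batch_size: int):
        k = min(int(batch_size * self.critical_fraction), len(self.critical))
        batch = random.sample(list(self.critical), k)
        batch += random.sample(list(self.normal), min(batch_size - k, len(self.normal)))
        return batch
```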

IRL-DAL achieves robust autonomous driving via imitation, reinforcement and diffusion planning, demonstrating state-of-the-art performance

Scientists have developed a new inverse reinforcement learning framework, termed IRL-DAL, designed to enhance the safety and adaptability of autonomous vehicles. The framework integrates imitation learning from a finite state machine controller with reinforcement learning and a diffusion-based safety planner.

This combination allows the agent to learn a driving policy resembling that of an expert, while maintaining stability and responding safely to challenging scenarios. The research demonstrates a 96% success rate in simulated driving tasks, alongside a significant reduction in collisions to 0.05 per 1,000 steps, establishing a new benchmark in safe autonomous navigation.

Ablation studies confirmed the contribution of each component, including FSM-aware replay for rare event coverage, the diffusion planner for failure reduction, and a learnable adaptive mask (LAM) with the safety-aware energy controller (SAEC) for improved safety and performance. The authors acknowledge that the current evaluation is limited to the Webots simulator; future work could explore real-world testing and more complex environments. Further research directions include investigating the generalisability of the LAM to different sensor modalities and expanding the framework to handle more intricate traffic interactions.

👉 More information
🗞 IRL-DAL: Safe and Adaptive Trajectory Planning for Autonomous Driving via Energy-Guided Diffusion Models
🧠 ArXiv: https://arxiv.org/abs/2601.23266

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
