MIT researchers have developed an efficient approach for training more reliable reinforcement learning models, focusing on complex tasks that involve variability. The technique could make AI systems better at tasks such as intelligently controlling traffic in congested cities, improving safety and sustainability.
Led by senior author Cathy Wu, the Thomas D. and Virginia W. Cabot Career Development Associate Professor in Civil and Environmental Engineering, the team introduced an algorithm that strategically selects the best tasks for training an AI agent to effectively perform all tasks in a collection of related tasks. By focusing on a smaller number of tasks that contribute the most to the algorithm’s overall effectiveness, this method maximizes performance while keeping the training cost low.
The researchers found that their technique was between five and 50 times more efficient than standard approaches on an array of simulated tasks. The research will be presented at the Conference on Neural Information Processing Systems.
Efficient Training of Reliable AI Agents for Complex Tasks
The Challenge of Training AI Systems
Teaching an AI system to make good decisions is a difficult task. Reinforcement learning models, which underlie these AI decision-making systems, often fail when faced with even small variations in the tasks they are trained to perform. For instance, a model might struggle to control a set of intersections with different speed limits, numbers of lanes, or traffic patterns.
A More Efficient Algorithm for Training AI Agents
To boost the reliability of reinforcement learning models on complex tasks with variability, the MIT researchers introduced a more efficient training algorithm. It strategically selects a small subset of tasks, such as the few intersections that contribute most to overall effectiveness, so that an agent trained on them can effectively perform every task in the collection. This maximizes performance while keeping the training cost low.
Finding a Middle Ground
To train an algorithm to control traffic lights at many intersections in a city, an engineer would typically choose between two main approaches. One approach is to train one algorithm for each intersection independently, using only that intersection’s data. The other approach is to train a larger algorithm using data from all intersections and then apply it to each one. However, both approaches have their downsides. Training a separate algorithm for each task requires an enormous amount of data and computation, while training one algorithm for all tasks often leads to subpar performance.
Model-Based Transfer Learning (MBTL)
The researchers sought a sweet spot between these two approaches. They developed an algorithm called Model-Based Transfer Learning (MBTL), which chooses a subset of tasks and trains one algorithm for each task independently. MBTL leverages a common trick from the reinforcement learning field called zero-shot transfer learning, in which an already trained model is applied to a new task without being further trained.
The MBTL algorithm has two pieces. First, it models how well each algorithm would perform if it were trained independently on a single task. Second, it models how much each algorithm's performance would degrade if it were transferred to each of the other tasks, a quantity known as generalization performance. Explicitly modeling generalization performance allows MBTL to estimate the value of training on a new task.
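The idea above can be sketched in code. The following is a minimal, hypothetical illustration of MBTL-style greedy task selection, not the authors' implementation: the function names, the `train_perf` estimates, and the `gen_gap` model of how performance degrades under transfer are all illustrative assumptions.

```python
def select_training_tasks(tasks, train_perf, gen_gap, budget):
    """Greedily pick `budget` source tasks to maximize the estimated
    total performance across all tasks under zero-shot transfer.

    tasks      : list of task identifiers
    train_perf : dict mapping task -> estimated performance if a model
                 were trained directly on that task
    gen_gap    : function (source, target) -> estimated performance drop
                 when a model trained on `source` is applied to `target`
    budget     : number of tasks we can afford to train on
    """
    selected = []
    # best[t] = best estimated zero-shot performance on task t so far
    best = {t: float("-inf") for t in tasks}

    for _ in range(budget):
        def total_if_added(candidate):
            # Estimated total performance if `candidate` joins the set:
            # each task is covered by whichever trained model serves it best.
            return sum(
                max(best[t], train_perf[candidate] - gen_gap(candidate, t))
                for t in tasks
            )

        # Pick the unselected task whose addition raises the total the most.
        candidate = max(
            (t for t in tasks if t not in selected), key=total_if_added
        )
        selected.append(candidate)
        for t in tasks:
            best[t] = max(best[t], train_perf[candidate] - gen_gap(candidate, t))
    return selected
```

For example, if tasks are intersections indexed by speed limit and the generalization gap grows with the difference between speed limits, the sketch tends to pick tasks spread across the range, so every intersection is close to some trained model.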
Reducing Training Costs
When the researchers tested this technique on simulated tasks, including controlling traffic signals, managing real-time speed advisories, and executing several classic control tasks, it was five to 50 times more efficient than standard approaches. This means they could arrive at the same solution with far less training data. With a 50x efficiency boost, for instance, MBTL could train on just two tasks and match the performance of a standard method that uses data from 100 tasks.
Future Directions
In the future, the researchers plan to design MBTL algorithms that can extend to more complex problems, such as high-dimensional task spaces. They are also interested in applying their approach to real-world problems, especially in next-generation mobility systems. The research is funded, in part, by a National Science Foundation CAREER Award, the Kwanjeong Educational Foundation PhD Scholarship Program, and an Amazon Robotics PhD Fellowship.
