Researchers at MIT have developed a new technique, inspired by large language models like GPT-4, for training general-purpose robots. The method pools diverse data from multiple sources into a single system that can teach a wide range of robots many different tasks.
Because it requires far less task-specific data, this approach can be faster and less expensive than traditional techniques. Led by Lirui Wang, an electrical engineering and computer science graduate student, the team developed a new architecture called Heterogeneous Pretrained Transformers (HPT) that unifies data from varied modalities and domains. In their experiments, the technique outperformed training from scratch by more than 20 percent in both simulation and the real world.
A Novel Approach to Training General-Purpose Robots
Inspired by large language models, researchers at MIT have developed a training technique that pools diverse data to teach robots new skills. The approach could make it significantly easier for robots to learn from large pools of data and to adapt to new tasks and environments.
The Limitations of Current Robotic Policies
Current robotic policies are typically trained using imitation learning, in which a human demonstrates actions or teleoperates a robot to generate data. However, this method has several limitations. First, it requires a large amount of task-specific data, which is time-consuming and expensive to collect. Second, robots trained this way often fail when their environment or task changes, because they cannot adapt to new situations.
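At its core, imitation learning is supervised learning on demonstration data: the policy is fit to predict the expert's action from the observation. The toy sketch below illustrates that idea with a linear policy fit by least squares; the data, dimensions, and linear model are illustrative assumptions, not the setup used in the MIT work.

```python
import numpy as np

# Toy behavior-cloning sketch: fit a policy that maps observations
# to the actions a demonstrator took. All shapes and data here are
# synthetic and purely illustrative.
rng = np.random.default_rng(0)

obs_dim, act_dim, n_demos = 8, 2, 500
true_W = rng.normal(size=(obs_dim, act_dim))  # hypothetical "expert" mapping

# Demonstration data: observations paired with slightly noisy expert actions.
observations = rng.normal(size=(n_demos, obs_dim))
expert_actions = observations @ true_W + 0.01 * rng.normal(size=(n_demos, act_dim))

# Imitation learning as supervised regression: least-squares policy fit.
W_policy, *_ = np.linalg.lstsq(observations, expert_actions, rcond=None)

def policy(obs):
    """Predict an action for a new observation."""
    return obs @ W_policy

# On held-out observations, the cloned policy closely tracks the expert --
# but only within the distribution of situations it was trained on.
test_obs = rng.normal(size=(10, obs_dim))
error = np.abs(policy(test_obs) - test_obs @ true_W).max()
```

The final comment is the crux of the article's point: a policy cloned this way works well near its training data but has no mechanism for adapting when the task or environment shifts.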
Drawing Inspiration from Large Language Models
To develop a better approach, the researchers drew inspiration from large language models like GPT-4. These models are pretrained using an enormous amount of diverse language data and then fine-tuned by feeding them a small amount of task-specific data. This pretraining on a vast amount of data helps the models adapt to perform well on a variety of tasks.
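The pretrain-then-fine-tune recipe can be sketched in a few lines: a large pretrained component is reused frozen, and only a small task-specific piece is fit on a handful of examples. In this sketch the "pretrained trunk" is just a fixed random projection standing in for a real pretrained model, and all names and sizes are illustrative assumptions.

```python
import numpy as np

# Sketch of pretrain-then-fine-tune: reuse a frozen feature extractor,
# train only a small head on limited task data.
rng = np.random.default_rng(1)

in_dim, feat_dim, out_dim = 16, 32, 3
W_pretrained = rng.normal(size=(in_dim, feat_dim))  # frozen stand-in for a pretrained trunk

def features(x):
    # Frozen trunk: no task-specific training happens here.
    return np.tanh(x @ W_pretrained)

# Fine-tuning: fit only the small head on a few task-specific examples.
x_task = rng.normal(size=(20, in_dim))
y_task = rng.normal(size=(20, out_dim))
W_head, *_ = np.linalg.lstsq(features(x_task), y_task, rcond=None)

def finetuned_model(x):
    return features(x) @ W_head

pred = finetuned_model(x_task)
```

The design choice this illustrates: because the trunk already encodes broadly useful structure from pretraining, the fine-tuning step needs only a small amount of task-specific data.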
The MIT researchers developed a new architecture called Heterogeneous Pretrained Transformers (HPT) that unifies data from varied modalities and domains. At the middle of the architecture sits a transformer, the same type of machine-learning model that forms the backbone of large language models, which processes both vision and proprioception inputs.
The researchers align data from vision and proprioception into the same type of input, called a token, which the transformer can process. Each input is represented with the same fixed number of tokens. Then the transformer maps all inputs into one shared space, growing into a huge, pretrained model as it processes and learns from more data.
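The tokenization step above can be sketched concretely: each modality gets its own small projection ("stem") that maps its raw input, whatever its size, into the same fixed number of tokens in a shared embedding space. The dimensions and the linear projections below are illustrative assumptions, not HPT's actual design.

```python
import numpy as np

# Sketch: project heterogeneous inputs (vision, proprioception) into the
# same token space, each represented by the same fixed number of tokens.
rng = np.random.default_rng(0)

d_model = 64           # shared embedding width (illustrative)
tokens_per_input = 4   # fixed token count per modality input (illustrative)

def make_tokenizer(input_dim):
    """Return a modality-specific stem mapping a raw vector to fixed-count tokens."""
    W = rng.normal(size=(input_dim, tokens_per_input * d_model)) / np.sqrt(input_dim)
    def tokenize(x):
        return (x @ W).reshape(tokens_per_input, d_model)
    return tokenize

# Different modalities have very different raw sizes...
tokenize_vision = make_tokenizer(input_dim=512)   # e.g. flattened image features
tokenize_proprio = make_tokenizer(input_dim=14)   # e.g. joint angles + velocities

vision_tokens = tokenize_vision(rng.normal(size=512))
proprio_tokens = tokenize_proprio(rng.normal(size=14))

# ...but both come out as tokens of identical shape, so they can be
# concatenated into one sequence for a shared transformer trunk.
sequence = np.concatenate([vision_tokens, proprio_tokens], axis=0)
```

Once everything is a token of the same shape, a single shared model can consume data from any robot or sensor suite, which is what lets the pretrained trunk keep growing as more heterogeneous data arrives.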
Enabling Dexterous Motions
One of the biggest challenges of developing HPT was building the massive dataset to pretrain the transformer, which included 52 datasets with more than 200,000 robot trajectories in four categories. The researchers also needed to develop an efficient way to turn raw proprioception signals from an array of sensors into data the transformer could handle.
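One practical wrinkle behind that last point: different robots report proprioception through different numbers of sensors at different rates, so the raw streams must be put into a uniform shape before tokenization. The sketch below resamples each 1-D stream to a fixed length and normalizes it; this specific preprocessing is an assumption for illustration, not necessarily what HPT does.

```python
import numpy as np

# Sketch: standardize raw proprioception streams of varying length
# (different sensor rates / episode lengths) to one fixed shape.
TARGET_LEN = 32  # illustrative fixed length

def standardize_stream(raw, target_len=TARGET_LEN):
    """Linearly resample a 1-D sensor stream to target_len samples, then z-normalize."""
    raw = np.asarray(raw, dtype=float)
    old_t = np.linspace(0.0, 1.0, num=len(raw))
    new_t = np.linspace(0.0, 1.0, num=target_len)
    resampled = np.interp(new_t, old_t, raw)
    std = resampled.std()
    return (resampled - resampled.mean()) / (std if std > 0 else 1.0)

# Two robots with different sampling rates produce different-length streams,
# but both come out with the same fixed shape, ready for tokenization.
fast_robot = standardize_stream(np.sin(np.linspace(0.0, 6.28, 200)))
slow_robot = standardize_stream(np.sin(np.linspace(0.0, 6.28, 17)))
```

After this step, every robot's proprioception looks the same to the downstream model, which is what makes pooling 52 datasets from heterogeneous hardware feasible.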
Results and Future Directions
When they tested HPT, it improved robot performance by more than 20 percent on simulation and real-world tasks, compared with training from scratch each time. Even when the task was very different from the pretraining data, HPT still improved performance.
In the future, the researchers want to study how data diversity could boost the performance of HPT. They also want to enhance HPT so it can process unlabeled data like GPT-4 and other large language models. The ultimate goal is to have a universal robot brain that you could download and use for your robot without any training at all.
This work was funded, in part, by the Amazon Greater Boston Tech Initiative and the Toyota Research Institute.
