NVIDIA presented more than 60 talks, posters, and workshops at the NeurIPS 2022 conference. Two of its research papers, one on diffusion-based generative AI models and the other on training generalist AI agents, won NeurIPS 2022 awards for their contributions to AI and machine learning.
Synthetic Data Production
Among the subjects covered in the company's work is synthetic data production, whether for images, text, or video. Other subjects include reinforcement learning, data collection and augmentation, weather models, and federated learning. Synthetic data is often more economical to produce than real data.
Real car collision data, for example, is more expensive for an automotive company to gather than synthetic data. And because synthetic data does not include any traceable information from the real data, it can also benefit healthcare and pharmaceutical companies.
Reimagining The Design Of Diffusion-Based Generative Models
NVIDIA researchers were honored with an Outstanding Main Track Paper award for their work analyzing the design of diffusion models and proposing adjustments that can significantly increase the efficiency and quality of these models.
The idea behind diffusion modeling is that if a model can learn how noise systematically destroys the information in data, it should be able to reverse the process and recover the data from the noise.
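The forward (noising) half of that idea can be sketched in a few lines. This is a minimal illustration rather than the researchers' implementation: the sine-wave "data", the noise levels, and the function names are all assumptions chosen for the example. It shows only how the signal is progressively buried as the noise level rises. The reverse direction is the part a trained model would have to learn.

```python
import numpy as np

def add_noise(x, sigma, rng):
    """Forward step: corrupt clean data x with Gaussian noise of std sigma."""
    return x + sigma * rng.standard_normal(x.shape)

def snr_db(clean, noisy):
    """Signal-to-noise ratio, in decibels, of a noisy copy of the clean data."""
    noise_power = np.mean((noisy - clean) ** 2)
    return 10.0 * np.log10(np.mean(clean ** 2) / noise_power)

rng = np.random.default_rng(0)

# Toy "data": one period of a sine wave (an assumption for illustration).
x = np.sin(np.linspace(0.0, 2.0 * np.pi, 1000))

# Hypothetical noise levels, from mild to overwhelming.
sigmas = [0.1, 1.0, 10.0]
snrs = [snr_db(x, add_noise(x, sigma, rng)) for sigma in sigmas]

for sigma, snr in zip(sigmas, snrs):
    print(f"sigma={sigma:5.1f}  SNR={snr:6.1f} dB")
```

At the largest noise level the sample is almost pure noise, which is the starting point a generative diffusion model denoises step by step.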
Their study breaks a diffusion model into modular components, helping developers identify the operations they can adjust to improve the model's overall performance. The researchers show that their adjustments achieve record scores on a metric that evaluates the quality of AI-generated images.
Training Generalist AI Agents In A Minecraft-Based Simulation Suite
While researchers have long trained autonomous AI agents in game environments such as StarCraft, Dota, and Go, these agents are often specialists in only a few skills. So NVIDIA researchers turned to Minecraft, the world's most popular game, to create a scalable training framework for a generalist agent capable of performing a wide range of open-ended activities.
MineDojo allows an AI agent to learn Minecraft’s flexible gameplay by using a large online library of over 7,000 wiki pages, millions of Reddit discussions, and 300,000 hours of recorded playtime. The NeurIPS committee recognized the effort with an Outstanding Datasets and Benchmarks Paper Award.
As a proof of concept, the MineDojo researchers developed MineCLIP, a large-scale foundation model that learns to associate YouTube footage of Minecraft gameplay with the video's transcript, in which the player generally narrates the onscreen activity. Using MineCLIP, the team was able to train a reinforcement learning agent that could tackle many tasks in Minecraft without human involvement.
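One way a video-text model of this kind could guide an agent is to score how well a gameplay clip matches a task description and to use that score as a reward signal. The sketch below assumes that setup for illustration only. The three-dimensional embeddings are made-up stand-ins for the outputs of a trained video encoder and text encoder, and `similarity_reward` is a hypothetical helper, not part of MineCLIP's interface.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_reward(video_embedding, task_embedding):
    """Hypothetical reward: higher when the clip matches the task text."""
    return cosine_similarity(video_embedding, task_embedding)

# Made-up embeddings standing in for encoder outputs (assumption).
task = np.array([1.0, 0.0, 0.0])           # text prompt describing the task
on_task_clip = np.array([0.9, 0.1, 0.0])   # footage that matches the prompt
off_task_clip = np.array([0.0, 1.0, 0.2])  # footage of something unrelated

print("on-task reward: ", similarity_reward(on_task_clip, task))
print("off-task reward:", similarity_reward(off_task_clip, task))
```

Under this assumption, no human has to label whether the agent succeeded: the similarity score itself tells the learner which behavior looks more like the described task.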
Creating Complex 3D Shapes To Populate Virtual Worlds
Also on display at NeurIPS is GET3D, a generative AI model that synthesizes 3D shapes based on the category of 2D images it is trained on, such as buildings or animals. The AI-generated objects have high-fidelity textures and sophisticated geometric details, and they are created in a triangle mesh format commonly used in graphics software. Users can easily import the shapes into 3D renderers and game engines for additional tweaking.
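To make the triangle mesh format concrete, the sketch below stores a shape as a list of vertices plus a list of triangles that index into it, then serializes it as Wavefront OBJ text, a plain-text format many renderers and game engines can import. The tetrahedron and the `mesh_to_obj` helper are illustrative assumptions and are unrelated to GET3D's own output code. Texture data is omitted.

```python
def mesh_to_obj(vertices, faces):
    """Serialize a triangle mesh as Wavefront OBJ text.

    vertices: list of (x, y, z) coordinates.
    faces: list of (a, b, c) zero-based indices into vertices.
    """
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    # OBJ face indices are one-based, so shift each index by one.
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in faces]
    return "\n".join(lines) + "\n"

# A tetrahedron: four vertices and four triangular faces (toy example).
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]

obj_text = mesh_to_obj(vertices, faces)
print(obj_text)
```

Writing `obj_text` to a file with an `.obj` extension would give a mesh that standard 3D tools can open.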
“In generative AI, we are not only advancing our theoretical understanding of the underlying models but are also making practical contributions that will reduce the effort of creating realistic virtual worlds and simulations,” said Jan Kautz, vice president of learning and perception research at NVIDIA.
GET3D (so named because it can Generate Explicit Textured 3D meshes) was trained on NVIDIA A100 Tensor Core GPUs using around 1 million 2D images of 3D shapes captured from various camera angles. When inference is performed on a single NVIDIA GPU, the model can create around 20 objects per second.
The AI-generated objects can be used to populate 3D representations of buildings, outdoor areas, or entire cities: digital settings built for the gaming, robotics, architecture, and social media sectors. A related study, based on a more realistic shading model that takes advantage of NVIDIA RTX GPU-accelerated ray tracing, will also be presented as a poster at NeurIPS.
Improving The Factual Accuracy Of Language Model-Generated Text
Factual accuracy refers to whether the information presented in generated text is correct. Language models are not inherently factually accurate, because they simply generate text based on the patterns they have learned from their training data. It therefore remains up to the user to verify the information in the text.
One NVIDIA paper examines this crucial problem with pre-trained language models: the factual correctness of AI-generated text.
In this work, NVIDIA researchers propose strategies to address this limitation, which must be overcome before such models can be used in real-world applications. They created the first automated benchmark to assess the factual correctness of language models for open-ended text generation.
In the process, they found that larger language models with billions of parameters were more factually accurate than smaller ones. The researchers also proposed a new training approach, factuality-enhanced training, and a novel sampling algorithm, which together help language models generate accurate text, and they showed a drop in the proportion of factual mistakes from 33% to roughly 15%.
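The article does not describe the sampling algorithm itself. As a rough illustration of how a sampling rule can trade diversity for accuracy, the sketch below applies standard nucleus (top-p) filtering with a threshold that shrinks as generation proceeds, so later tokens are drawn from a narrower, more conservative set. The decay schedule, the function names, and every number here are assumptions for the example, not the researchers' published method.

```python
import numpy as np

def top_p_filter(probs, p):
    """Keep the smallest set of most-likely tokens whose total probability
    reaches p, zero out the rest, and renormalize."""
    order = np.argsort(probs)[::-1]            # token ids, most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = min(int(np.searchsorted(cumulative, p)) + 1, len(probs))
    kept = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[kept] = probs[kept]
    return filtered / filtered.sum()

def decayed_p(step, p0=0.9, decay=0.9, floor=0.3):
    """Hypothetical schedule: shrink the threshold each step, down to a floor."""
    return max(floor, p0 * decay ** step)

# Toy next-token distribution over four tokens (assumption).
probs = np.array([0.5, 0.3, 0.15, 0.05])

for step in (0, 4, 8):
    p = decayed_p(step)
    candidates = np.count_nonzero(top_p_filter(probs, p))
    print(f"step={step}  p={p:.3f}  candidate tokens={candidates}")
```

With these toy numbers the candidate set narrows from three tokens to one as the threshold decays, which is the general mechanism by which a stricter sampling rule can reduce the chance of picking an unlikely, possibly incorrect continuation.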