Google’s Gemma Optimized for NVIDIA GPUs: A Collaborative Effort
NVIDIA and Google have recently announced a joint effort to optimize Google’s new open language models, Gemma, across all NVIDIA AI platforms. Gemma is a family of state-of-the-art lightweight open models, available in 2-billion- and 7-billion-parameter versions, that can be run anywhere, reducing costs and speeding up innovative work for domain-specific use cases. The collaboration improves Gemma’s performance on NVIDIA GPUs across environments: data centers, the cloud, and local workstations with NVIDIA RTX GPUs or PCs with GeForce RTX GPUs.
Gemma was optimized with NVIDIA TensorRT-LLM, an open-source library for accelerating large language model inference on NVIDIA GPUs. The collaboration lets developers target the installed base of more than 100 million NVIDIA RTX GPUs in high-performance AI PCs worldwide.
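As a rough illustration of what TensorRT-LLM-accelerated inference looks like from Python, here is a minimal sketch using the library’s high-level LLM API. The `LLM` and `SamplingParams` classes, their arguments, and the `google/gemma-2b` model ID are assumptions based on recent TensorRT-LLM releases, not details from the announcement; other releases instead build an engine with checkpoint-conversion scripts and the `trtllm-build` tool.

```python
# Minimal sketch (not from the announcement): Gemma inference through
# TensorRT-LLM's high-level Python API. Assumes a recent tensorrt_llm release
# that exposes LLM/SamplingParams and access to the gated "google/gemma-2b"
# checkpoint on Hugging Face; exact names and arguments may vary by version.
from tensorrt_llm import LLM, SamplingParams

# Build (or load a cached) TensorRT engine for the model and set up the runtime.
llm = LLM(model="google/gemma-2b")

prompts = ["Explain in one sentence what TensorRT-LLM does."]
sampling = SamplingParams(temperature=0.8, max_tokens=64)

# Run optimized generation on the local NVIDIA GPU.
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```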
Gemma on NVIDIA GPUs: Cloud and Local Applications
Developers can run Gemma on NVIDIA GPUs not only locally but also in the cloud, including on Google Cloud’s A3 instances based on the H100 Tensor Core GPU and, soon, on NVIDIA’s H200 Tensor Core GPUs, which feature 141 GB of HBM3e memory at 4.8 terabytes per second and which Google will deploy this year.
Running Gemma locally has several advantages. Results arrive faster because the model runs directly on the device, and user data stays private: it remains on the device, never needs to be shared with a third party, and no internet connection is required.
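As a concrete, hedged example of local execution, the snippet below loads Gemma 2B on a local NVIDIA GPU with the Hugging Face Transformers library rather than TensorRT-LLM. The `google/gemma-2b` checkpoint is gated and requires accepting Google’s license terms before download.

```python
# Minimal sketch: running Gemma 2B locally on an NVIDIA GPU with Hugging Face
# Transformers (an alternative path to TensorRT-LLM). "google/gemma-2b" is a
# gated checkpoint, so a Hugging Face token with the license accepted is needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision fits comfortably on an RTX GPU
    device_map="cuda",
)

# Everything below runs on the local device; no data leaves the machine.
inputs = tokenizer("Explain retrieval-augmented generation in one sentence.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```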
NVIDIA’s Ecosystem of Tools: Enhancing Gemma’s Performance
Enterprise developers can further enhance Gemma’s performance by leveraging NVIDIA’s rich ecosystem of tools. This includes NVIDIA AI Enterprise with the NeMo framework and TensorRT-LLM, which can be used to fine-tune Gemma and deploy the optimized model in their production applications.
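The announcement points to NeMo and TensorRT-LLM for this fine-tune-and-deploy workflow; as a lightweight stand-in (not NVIDIA’s tooling), the sketch below fine-tunes Gemma 2B with LoRA adapters using the Hugging Face PEFT library. The dataset, hyperparameters, and target module names are illustrative assumptions.

```python
# Hedged stand-in for the NeMo fine-tuning workflow: parameter-efficient LoRA
# fine-tuning of Gemma 2B with Hugging Face PEFT. Dataset, hyperparameters, and
# target module names are illustrative; bf16 assumes an Ampere-or-newer GPU.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "google/gemma-2b"  # gated checkpoint; requires accepted license
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda")

# Attach low-rank adapters; only these small matrices are trained.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

# A small instruction dataset, used purely as an example.
data = load_dataset("databricks/databricks-dolly-15k", split="train[:1000]")
data = data.map(
    lambda ex: tokenizer(ex["instruction"] + "\n" + ex["response"],
                         truncation=True, max_length=512),
    remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="gemma-2b-lora", num_train_epochs=1,
                           per_device_train_batch_size=1, bf16=True),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

In a production setting, the tuned model would then be converted to an optimized TensorRT-LLM engine for serving, as described above.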
Additional resources are available to help developers rev up inference for Gemma, including several Gemma model checkpoints and an FP8-quantized version of the model, all optimized with TensorRT-LLM.
Gemma and Chat with RTX: A New User Experience
NVIDIA is also planning to add support for Gemma to its tech demo, Chat with RTX. This application uses retrieval-augmented generation and TensorRT-LLM software to provide users with generative AI capabilities on their local, RTX-powered Windows PCs.
Chat with RTX lets users personalize a chatbot with their own data by connecting local files on an RTX PC to a large language model (a toy sketch of the underlying retrieval-augmented pattern appears below). Because the model runs locally, the experience is fast, secure, and personalized. Users can also try Gemma 2B and Gemma 7B directly from their browser on the NVIDIA AI Playground.
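To make the retrieval-augmented generation idea concrete, here is a toy sketch of the pattern: relevant chunks from local text files are retrieved and prepended to the prompt before it is sent to a locally running model. This illustrates the general technique only, not Chat with RTX’s actual implementation; the file layout and the TF-IDF retriever are assumptions.

```python
# Toy sketch of retrieval-augmented generation over local files (NOT the
# Chat with RTX implementation): retrieve the most relevant local text chunks
# with TF-IDF and prepend them to the prompt for a locally running LLM.
from pathlib import Path
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve(question: str, doc_dir: str, top_k: int = 2) -> list[str]:
    """Return the local text chunks most relevant to the question."""
    chunks = [p.read_text() for p in Path(doc_dir).glob("*.txt")]
    vectorizer = TfidfVectorizer().fit(chunks + [question])
    scores = cosine_similarity(vectorizer.transform([question]),
                               vectorizer.transform(chunks))[0]
    return [chunks[i] for i in scores.argsort()[::-1][:top_k]]

def build_prompt(question: str, doc_dir: str) -> str:
    """Ground the question in the user's own documents."""
    context = "\n\n".join(retrieve(question, doc_dir))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

# The resulting prompt would then be passed to a locally running Gemma model,
# for example via one of the snippets shown earlier; no data leaves the PC.
```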
