Google’s Gemma optimized for NVIDIA GPUs, bringing Gemma to Chat with RTX.


February 22, 2024 by our News Team

NVIDIA and Google have joined forces to optimize Gemma, a lightweight language model, for efficient use on NVIDIA AI platforms, including in the cloud and on local devices, enabling developers to tap into the vast installed base of NVIDIA GPUs and accelerate innovation in natural language processing.

  • The collaboration between NVIDIA and Google optimizes Gemma's language models to run efficiently across NVIDIA AI platforms, from the cloud to local PCs.
  • NVIDIA TensorRT-LLM optimizes large language model inference on NVIDIA GPUs, making Gemma more cost-effective for developers to run.
  • The availability of Gemma on NVIDIA GPUs in the cloud, including Google Cloud's A3 instances and future H200 Tensor Core GPUs, allows for high-performance AI computing and opens up new possibilities for cloud-based AI applications.


Today, NVIDIA and Google announced a collaboration that brings optimizations to Gemma, Google’s family of lightweight language models. Gemma, available in 2 billion- and 7 billion-parameter versions, can now run efficiently on NVIDIA AI platforms, making it more accessible and cost-effective for domain-specific use cases.

The teams at NVIDIA and Google have worked closely together to enhance the performance of Gemma using NVIDIA TensorRT-LLM. This open-source library optimizes large language model inference when running on NVIDIA GPUs, whether it’s in data centers, the cloud, or on PCs equipped with NVIDIA RTX GPUs. This means that developers can tap into the vast installed base of over 100 million NVIDIA RTX GPUs worldwide, enabling high-performance AI computing.

Gemma can also be deployed on NVIDIA GPUs in the cloud, including Google Cloud’s A3 instances powered by the H100 Tensor Core GPU. In the near future, developers will have access to NVIDIA’s H200 Tensor Core GPUs, which offer 141 GB of HBM3E memory at 4.8 terabytes per second.

For enterprise developers, NVIDIA offers a rich ecosystem of tools, including NVIDIA AI Enterprise with the NeMo framework and TensorRT-LLM. These resources allow developers to fine-tune Gemma and seamlessly integrate the optimized model into their production applications.

Developers can learn more about how TensorRT-LLM speeds up inference for Gemma and can access several Gemma model checkpoints, including an FP8-quantized version, all of which have been optimized with TensorRT-LLM.
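To give a rough sense of why an FP8-quantized checkpoint matters, the sketch below estimates the memory needed for model weights alone at 16-bit and 8-bit precision. It assumes the nominal parameter counts of 2 billion and 7 billion named in the article (the exact counts may differ) and ignores activations, the KV cache, and runtime overhead. The function and variable names are illustrative and not part of TensorRT-LLM.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Nominal sizes taken from the article; actual parameter counts are an assumption.
NOMINAL_SIZES = {"Gemma 2B": 2e9, "Gemma 7B": 7e9}

for name, params in NOMINAL_SIZES.items():
    half = weight_memory_gb(params, 16)   # 16-bit weights (e.g. FP16 or BF16)
    fp8 = weight_memory_gb(params, 8)     # 8-bit weights (FP8)
    print(f"{name}: ~{half:.0f} GB at 16-bit, ~{fp8:.0f} GB at FP8")
```

Under these assumptions, halving the bits per weight halves the weight footprint, which is one reason a quantized checkpoint can be easier to fit into a GPU's memory.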

Gemma is also coming to Chat with RTX, an NVIDIA tech demo that brings generative AI capabilities to local Windows PCs powered by RTX GPUs. Using retrieval-augmented generation and TensorRT-LLM software, Chat with RTX lets users personalize a chatbot with data from their own local files. Because processing happens locally, results arrive quickly and user data stays on the device. With Chat with RTX, users can run Gemma without relying on cloud-based LLM services or sharing sensitive information with third parties.
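The article does not describe how Chat with RTX implements retrieval-augmented generation, so the following is only a toy sketch of the general idea: find the local text most relevant to a question, then place it in the prompt that would be handed to a language model. The word-count similarity, the function names, and the sample notes are all assumptions made for illustration, and no model is actually called.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Combine the retrieved context and the question into one prompt."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical stand-ins for a user's local files.
notes = [
    "The quarterly budget review is scheduled for March 14.",
    "Grandma's lasagna recipe uses ricotta and fresh basil.",
    "The GPU driver was updated to fix a rendering bug.",
]

prompt = build_prompt("When is the budget review?", notes)
print(prompt)
```

A real system would more likely use learned embeddings and a vector index instead of raw word counts, but the flow is the same: retrieve locally, then generate locally, so the user's files never leave the machine.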

The collaboration between NVIDIA and Google brings Gemma’s remarkable language models to the forefront, making them more accessible and efficient across various AI platforms. With these optimizations, developers and users alike can unlock new possibilities and accelerate innovation in the realm of natural language processing.



Background Information


About Google: Google, founded by Larry Page and Sergey Brin in 1998, is a multinational technology company known for its internet-related services and products. Initially known for its search engine, Google has since expanded into online advertising, cloud computing, software development, and hardware devices. Its influential products include Google Search, Android OS, Google Maps, and Google Drive. The company's commitment to research and development has led to advancements in artificial intelligence and machine learning.


About NVIDIA: NVIDIA has established itself as a leader in client computing, continuously pushing the boundaries of innovation in graphics and AI technologies. NVIDIA's client computing business focuses on delivering solutions that power everything from gaming and creative workloads to enterprise applications. Known for its GeForce graphics cards, the company has set industry standards in high-performance gaming for realistic visuals, fluid frame rates, and immersive experiences. Complementing its gaming expertise, NVIDIA's Quadro and NVIDIA RTX graphics cards cater to professionals in design, content creation, and scientific fields, enabling real-time ray tracing and AI-driven workflows. By integrating graphics, AI, and software, NVIDIA continues to shape the landscape of client computing.


Technology Explained


GPU: GPU stands for Graphics Processing Unit and is a specialized type of processor designed to handle graphics-intensive tasks. It is used in the computer industry to render images, videos, and 3D graphics. GPUs are used in gaming consoles, PCs, and mobile devices to provide a smooth and immersive gaming experience. They are also used in the medical field to create 3D models of organs and tissues, and in the automotive industry to create virtual prototypes of cars. GPUs are also used in the field of artificial intelligence to process large amounts of data and create complex models. GPUs are becoming increasingly important in the computer industry as they are able to process large amounts of data quickly and efficiently.


HBM3E: HBM3E is an extended version of HBM3, a generation of high-bandwidth memory (HBM), a type of stacked DRAM widely used in artificial intelligence (AI) accelerators. HBM3E offers faster data transfer rates, higher density, and better power efficiency than previous HBM versions. It is produced by memory makers including SK Hynix, Samsung, and Micron, with mass production expected in 2024. SK Hynix has cited a speed of up to 1.15 TB/s per stack, and announced stacks offer capacities of 24 GB and 36 GB. HBM3E is suitable for AI systems that require large amounts of data processing, such as deep learning, machine learning, and computer vision.


LLM: A Large Language Model (LLM) is an artificial intelligence system, typically built on the transformer architecture (GPT-3.5 is one well-known example), designed to comprehend and produce human-like text at scale. LLMs perform a wide range of natural language understanding and generation tasks, including answering questions, generating creative content, and delivering context-aware responses to textual inputs. These models are trained on vast datasets to capture the nuances of language, making them useful for applications like chatbots, content generation, and language translation.




