NVIDIA Teams Up with Anyscale, Paving the Way for Exciting Collaborative Innovations

September 18, 2023

NVIDIA and Anyscale are joining forces to accelerate the development of large language models (LLMs) to supersonic speeds, offering developers speed, savings, and efficiency for generative AI development and deployment through various application integrations.

  • Integration of NVIDIA AI into Ray open source and the Anyscale Platform
  • Significant improvements in generative AI development and efficiency
  • NVIDIA TensorRT-LLM open-source software enhances LLM performance and efficiency

NVIDIA and Anyscale are collaborating to accelerate the development of large language models (LLMs). At the Ray Summit developers conference, Anyscale, known for its open-source unified compute framework, Ray, announced that it will integrate NVIDIA AI into Ray open source and the Anyscale Platform. The integration will also extend to Anyscale Endpoints, a new service that simplifies embedding LLMs in applications using popular open-source models.

By combining NVIDIA AI with Ray and the Anyscale Platform, developers can expect significant improvements in generative AI development and efficiency, as well as enhanced security for production AI. These integrations support a wide range of models, from proprietary ones to open models like Code Llama, Falcon, Llama 2, SDXL, and more. Developers can deploy open-source NVIDIA software with Ray or choose NVIDIA AI Enterprise software on the Anyscale Platform for a fully supported and secure production deployment.

Ray and the Anyscale Platform are widely adopted by developers working on advanced LLMs for generative AI applications, such as intelligent chatbots, coding copilots, and powerful search and summarization tools. The collaboration between NVIDIA and Anyscale aims to deliver speed, savings, and efficiency for generative AI development and deployment through various application integrations.

One of the key products is NVIDIA TensorRT-LLM, open-source software that improves LLM performance and efficiency, which lowers costs. TensorRT-LLM enables parallel inference across multiple GPUs, delivering up to 8 times higher performance on NVIDIA H100 Tensor Core GPUs compared to previous-generation GPUs. It also supports custom GPU kernels and optimizations for popular LLMs and features an easy-to-use Python interface.
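
To give a rough sense of what running inference in parallel across multiple GPUs can mean, the sketch below splits one layer's weight matrix into column blocks, computes each block's output separately, and joins the results. It is a plain-Python illustration under stated assumptions, not TensorRT-LLM code: the function names (matvec, shard_columns, parallel_linear) are hypothetical, and each loop iteration merely stands in for work that a real system would place on a separate GPU.

```python
# Illustrative only: column sharding of a single linear layer in plain Python.
# None of these names come from TensorRT-LLM; they are hypothetical helpers.

def matvec(x, weight):
    """Multiply row vector x (length n) by weight (n rows, m columns)."""
    cols = len(weight[0])
    return [sum(x[i] * weight[i][j] for i in range(len(x))) for j in range(cols)]

def shard_columns(weight, num_devices):
    """Split the weight matrix into contiguous column blocks, one per device."""
    cols = len(weight[0])
    step = (cols + num_devices - 1) // num_devices  # ceiling division
    return [[row[start:start + step] for row in weight]
            for start in range(0, cols, step)]

def parallel_linear(x, weight, num_devices):
    """Compute each shard's partial output, then concatenate the pieces."""
    outputs = []
    for shard in shard_columns(weight, num_devices):  # one iteration per simulated device
        outputs.extend(matvec(x, shard))
    return outputs

x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

# The sharded computation matches the single-device result.
assert parallel_linear(x, w, 2) == matvec(x, w)
```

Because each column block depends only on the shared input vector, the blocks can be computed independently, which is the property that lets this kind of work be spread over several devices.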

Additionally, the integration of NVIDIA Triton Inference Server software allows Ray developers to improve efficiency when deploying AI models across different frameworks. Triton Inference Server supports inference on GPUs, CPUs, and other processors in various environments, including cloud, data centers, edge devices, and embedded systems.

The collaboration also brings the NVIDIA NeMo framework to Ray users, enabling easy fine-tuning and customization of LLMs with business data. NeMo is a cloud-native framework that offers end-to-end solutions for building, customizing, and deploying generative AI models anywhere. It includes training and inferencing frameworks, guardrailing toolkits, data curation tools, and pretrained models, providing enterprises with a cost-effective and efficient way to adopt generative AI.

Developers using Ray open source or the Anyscale Platform can seamlessly transition from open-source development to deploying production AI at scale in the cloud. The Anyscale Platform offers fully managed, enterprise-ready unified computing, simplifying the building, deployment, and management of scalable AI and Python applications using Ray. This helps customers bring AI products to market faster and at a lower cost.

Regardless of the chosen platform, developers can easily orchestrate LLM workloads with Anyscale’s core functionality. The NVIDIA AI integration further enhances developers’ ability to build, train, tune, and scale AI models with increased efficiency.

Ray and the Anyscale Platform are compatible with accelerated computing from leading cloud providers, allowing developers to scale up their computing resources as needed for successful LLM deployments. Moreover, developers can start building models on their workstations using NVIDIA AI Workbench and seamlessly scale them across hybrid or multi-cloud accelerated computing when transitioning to production.

The NVIDIA AI integrations with Anyscale are currently in development and expected to be available by the end of the year. Developers can stay updated on this integration and also take advantage of a free 90-day evaluation of NVIDIA AI Enterprise by signing up for the latest news.

For those interested in learning more about this collaboration, attending the Ray Summit in San Francisco this week or watching the demo video below will provide valuable insights.


Background Information

About NVIDIA: NVIDIA has firmly established itself as a leader in the realm of client computing, continuously pushing the boundaries of innovation in graphics and AI technologies. With a deep commitment to enhancing user experiences, NVIDIA's client computing business focuses on delivering solutions that power everything from gaming and creative workloads to enterprise applications. Known for its GeForce graphics cards, the company has redefined high-performance gaming, setting industry standards for realistic visuals, fluid frame rates, and immersive experiences. Complementing its gaming expertise, NVIDIA's Quadro and NVIDIA RTX graphics cards cater to professionals in design, content creation, and scientific fields, enabling real-time ray tracing and AI-driven workflows that raise productivity and creativity. By integrating graphics, AI, and software, NVIDIA continues to shape the landscape of client computing.

Technology Explained

GPU: GPU stands for Graphics Processing Unit and is a specialized type of processor designed to handle graphics-intensive tasks. It is used in the computer industry to render images, videos, and 3D graphics. GPUs are used in gaming consoles, PCs, and mobile devices to provide a smooth and immersive gaming experience. They are also used in the medical field to create 3D models of organs and tissues, and in the automotive industry to create virtual prototypes of cars. GPUs are also used in the field of artificial intelligence to process large amounts of data and create complex models. GPUs are becoming increasingly important in the computer industry as they are able to process large amounts of data quickly and efficiently.
