Intel Gaudi 2: The Prime Competitor to NV H100 for Powerful Generative AI


March 29, 2024 by our News Team

Intel showcased impressive MLCommons MLPerf v4.0 benchmark results with its Gaudi 2 AI accelerators and 5th Gen Intel Xeon Scalable processors, offering affordable and accessible AI solutions that rival Nvidia H100 in generative AI performance.

  • MLCommons is a leading organization in the field of machine learning, providing industry-standard benchmarks for evaluating AI performance.
  • Intel's MLPerf results demonstrate their commitment to innovation and continuous efforts to enhance AI performance across their portfolio of accelerators and CPUs.
  • The Intel Gaudi 2 AI accelerator stands as a strong alternative to Nvidia H100 for generative AI performance, providing customers with a viable choice in the market.


MLCommons, a leading organization in the field of machine learning, has released the results of the MLPerf v4.0 benchmark for inference. Intel, a key player in the AI industry, has showcased its impressive performance with Intel Gaudi 2 accelerators and 5th Gen Intel Xeon Scalable processors featuring Intel Advanced Matrix Extensions (Intel AMX). This reaffirms Intel’s commitment to making AI accessible everywhere by offering a wide range of competitive solutions.

One notable achievement is that the Intel Gaudi 2 AI accelerator remains the only benchmarked alternative to Nvidia H100 for generative AI (GenAI) performance. Not only does it deliver strong performance-per-dollar, but it also gives customers a viable choice in the market. Additionally, Intel is the only server CPU vendor to have submitted MLPerf results, further highlighting its dedication to innovation.

The MLPerf results for Intel’s 5th Gen Xeon processors show a geomean improvement of 1.42x over the previous generation’s (4th Gen Intel Xeon) results in MLPerf Inference v3.1. This demonstrates Intel’s continuous efforts to enhance AI performance across its portfolio of accelerators and CPUs. Zane Ball, Intel corporate vice president and general manager of DCAI Product Management, emphasized the significance of these results, stating, “Today’s results demonstrate that we are delivering AI solutions that cater to our customers’ dynamic and wide-ranging AI requirements. Both Intel Gaudi and Xeon products offer strong price-to-performance advantages and are ready for deployment.”

Intel’s MLPerf results serve as industry-standard benchmarks for evaluating AI performance. By providing customers with this standardized evaluation metric, Intel empowers them to make informed decisions about their AI infrastructure.

Let’s delve into the specifics of Intel’s accomplishments. The Intel Gaudi software suite has expanded its model coverage of popular large language models (LLMs) and multimodal models. For MLPerf Inference v4.0, Intel submitted Gaudi 2 accelerator results for state-of-the-art models such as Stable Diffusion XL and Llama v2-70B.

Due to high demand from customers, the Gaudi 2 accelerator’s Llama results were achieved using the Hugging Face Text Generation Inference (TGI) toolkit. This toolkit supports continuous batching and tensor parallelism, optimizing the efficiency of real-world LLM scaling. Impressively, Gaudi 2 achieved 8035.0 and 6287.5 offline and server tokens-per-second, respectively, for Llama v2-70B. On Stable Diffusion XL, Gaudi 2 delivered 6.26 and 6.25 offline samples-per-second and server queries-per-second, respectively. These results highlight the compelling price/performance ratio of Intel Gaudi 2, a crucial factor to consider when assessing the total cost of ownership (TCO).
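For readers curious what serving a model through TGI looks like in practice, the minimal sketch below queries an already running TGI endpoint from Python. The local endpoint URL, prompt, and generation parameters are illustrative assumptions, not part of Intel's MLPerf submission.

    # Minimal sketch: querying a Text Generation Inference (TGI) server from Python.
    # Assumes a TGI instance is already serving a Llama-family model at the URL below;
    # the endpoint address, prompt, and generation parameters are hypothetical.
    from huggingface_hub import InferenceClient

    client = InferenceClient(model="http://localhost:8080")  # hypothetical local TGI endpoint

    output = client.text_generation(
        "Summarize the benefits of continuous batching for LLM serving.",
        max_new_tokens=128,
        temperature=0.7,
    )
    print(output)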

Moving on to the Intel 5th Gen Xeon processors, significant improvements have been made in both hardware and software, resulting in a geomean improvement of 1.42x compared to the previous generation’s MLPerf Inference v3.1 results. For instance, the 5th Gen Xeon submission showed roughly 1.8x performance gains for GPT-J, a language model, thanks to software optimizations such as continuous batching. Similarly, DLRMv2 delivered about 1.8x performance gains while meeting the 99.9% accuracy target, owing to various optimizations that utilize Intel AMX, including MergedEmbeddingBag.
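As a rough illustration of how such CPU-side optimizations are typically enabled, the sketch below runs GPT-J in bfloat16 with Intel Extension for PyTorch, which dispatches matrix operations to Intel AMX on 4th and 5th Gen Xeon processors. The checkpoint and prompt are illustrative; this is not the actual MLPerf submission code.

    # Minimal sketch: bf16 inference on Xeon with Intel Extension for PyTorch (IPEX),
    # which routes matrix math to Intel AMX on 4th/5th Gen Xeon. The checkpoint and
    # prompt are illustrative; MLPerf submissions use far more elaborate harnesses.
    import torch
    import intel_extension_for_pytorch as ipex
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "EleutherAI/gpt-j-6b"  # GPT-J, one of the benchmarked models
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval()
    model = ipex.optimize(model, dtype=torch.bfloat16)  # apply AMX-friendly bf16 kernels

    inputs = tokenizer("Intel AMX accelerates", return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))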

Intel takes pride in its collaboration with OEM partners such as Cisco, Dell, Quanta, Supermicro, and Wiwynn, who have also submitted their MLPerf results. Furthermore, Intel has submitted MLPerf results for four generations of Xeon products since 2020, solidifying Xeon’s position as the host CPU for numerous accelerator submissions.

For those interested in exploring Intel’s AI solutions, the Intel Developer Cloud offers the opportunity to evaluate 5th Gen Xeon processors and Intel Gaudi 2 accelerators. This platform allows users to run small- and large-scale training and inference production workloads, manage AI compute resources, and more, providing a comprehensive environment for AI development and testing.


About Our Team

Our team comprises industry insiders with extensive experience in computers, semiconductors, games, and consumer electronics. With decades of collective experience, we’re committed to delivering timely, accurate, and engaging news content to our readers.

Background Information


About Dell: Dell is a global technology leader providing comprehensive hardware, software, and services solutions. Known for its customizable computers and enterprise offerings, Dell provides a diverse range of laptops, desktops, servers, and networking equipment. With a commitment to innovation and customer satisfaction, Dell caters to a wide range of consumer and business needs, making it an important player in the tech industry.

Dell website  Dell LinkedIn

About Intel: Intel Corporation, a global technology leader, is known for the semiconductor innovations that power computing and communication devices worldwide. As a pioneer in microprocessor technology, Intel has left an indelible mark on the evolution of computing with processors that drive everything from PCs to data centers and beyond. With a history of advancements, Intel's relentless pursuit of innovation continues to shape the digital landscape, offering solutions that empower businesses and individuals to achieve new levels of productivity and connectivity.

Intel website  Intel LinkedIn

About nVidia: NVIDIA has firmly established itself as a leader in client computing, continuously pushing the boundaries of innovation in graphics and AI technologies. With a deep commitment to enhancing user experiences, NVIDIA's client computing business focuses on delivering solutions that power everything from gaming and creative workloads to enterprise applications. Known for its GeForce graphics cards, the company has redefined high-performance gaming, setting industry standards for realistic visuals, fluid frame rates, and immersive experiences. Complementing its gaming expertise, NVIDIA's Quadro and NVIDIA RTX graphics cards cater to professionals in design, content creation, and scientific fields, enabling real-time ray tracing and AI-driven workflows that elevate productivity and creativity. By integrating graphics, AI, and software, NVIDIA continues to shape the landscape of client computing, fostering innovation and immersive interactions in a rapidly evolving digital world.

nVidia website  nVidia LinkedIn

About Supermicro: Supermicro is a reputable American technology company founded in 1993 and headquartered in San Jose, California. Specializing in high-performance server and storage solutions, Supermicro has become a trusted name in the data center industry. The company offers a wide range of innovative and customizable server hardware, including motherboards, servers, storage systems, and networking equipment, catering to the needs of enterprise clients, cloud service providers, and businesses seeking reliable infrastructure solutions.

Supermicro website  Supermicro LinkedIn

Technology Explained


CPU: The Central Processing Unit (CPU) is the brain of a computer, responsible for executing instructions, performing calculations, and coordinating all other components. CPUs are found in everything from desktop computers and mobile devices to gaming consoles and supercomputers. Beyond running programs, the CPU controls the flow of information within a system, manages input and output, and stores and retrieves data from memory, making it essential to the functioning of any computer system.


LLM: A Large Language Model (LLM) is a highly advanced artificial intelligence system, typically built on transformer architectures such as those behind GPT-3.5, designed to comprehend and produce human-like text at scale. LLMs excel at a wide range of natural language understanding and generation tasks, including answering questions, generating creative content, and delivering context-aware responses to textual inputs. These models are trained on vast datasets to grasp the nuances of language, making them invaluable for applications like chatbots, content generation, and language translation.


Stable Diffusion: Stable Diffusion is a family of latent diffusion models that generate images from text prompts. Starting from random noise, the model iteratively denoises a compressed (latent) representation of an image, guided by a text encoder, until a coherent picture emerges. Stable Diffusion XL (SDXL), the variant benchmarked in MLPerf Inference v4.0, is a larger version that produces higher-resolution, more detailed images. These models are widely used for content creation and design prototyping, and they serve as standard benchmarks for measuring generative AI inference performance.
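To make that workflow concrete, the sketch below generates an image with the publicly available SDXL base checkpoint via the Hugging Face diffusers library; the checkpoint, prompt, and GPU backend are illustrative assumptions, not Intel's benchmark configuration.

    # Minimal sketch: text-to-image generation with Stable Diffusion XL via diffusers.
    # The checkpoint, prompt, and device are illustrative, not a benchmark setup.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")  # assumes a CUDA-capable GPU is available

    image = pipe(prompt="a photo of a server rack glowing in a dark data center").images[0]
    image.save("sdxl_sample.png")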


Xeon: The Intel Xeon processor is a powerful, reliable multi-core processor designed to handle many tasks simultaneously. It powers servers, workstations, and high-end desktops, as well as embedded systems such as routers and switches. Known for its high performance and scalability, Xeon is a popular choice for cloud computing, where it handles large amounts of data, and for scientific and engineering applications that demand complex calculations and simulations.




