Intel's latest MLPerf Training v3.1 benchmark results showcase its AI capabilities with Intel Gaudi2 accelerators and 4th Gen Intel Xeon Scalable processors, headlined by a remarkable 2x performance improvement delivered by Gaudi2 with the FP8 data type.
- Intel Gaudi2 accelerators deliver a 2x performance improvement with the FP8 data type, alongside strong results from 4th Gen Intel Xeon Scalable processors featuring Intel Advanced Matrix Extensions (Intel AMX)
- Intel Gaudi2 offers significant price-performance advantages compared to NVIDIA's H100 for AI compute requirements
- Intel 4th Gen Xeon processors offer efficient and cost-effective training of small to mid-sized deep learning models using existing IT infrastructure
In the latest MLPerf Training v3.1 benchmark results released by MLCommons, Intel showcased its AI capabilities with Intel Gaudi2 accelerators and 4th Gen Intel Xeon Scalable processors featuring Intel Advanced Matrix Extensions (Intel AMX). The highlight of the submissions was the remarkable 2x performance improvement delivered by Intel Gaudi2 when employing the FP8 data type on the v3.1 GPT-3 training benchmark.
Intel’s commitment to democratizing AI is evident in its continuous innovation across its AI portfolio and its consistently strong showing in successive MLCommons AI benchmarks. Sandra Rivera, Intel’s executive vice president and general manager of the Data Center and AI Group, emphasized the significant price-performance advantage offered by Intel Gaudi and 4th Gen Xeon processors, which makes them highly deployable solutions for customers. Intel’s comprehensive range of AI hardware and software configurations caters to diverse AI workloads, giving customers tailored options to meet their specific requirements.
These latest MLCommons MLPerf results further solidify Intel’s reputation for delivering strong AI performance, building on its achievements in the June MLPerf training benchmarks. Notably, Intel Xeon processors remain the only CPUs with reported MLPerf results, while Intel Gaudi2 is one of only three accelerator solutions on which MLPerf results are based, and one of only two that are commercially available.
The versatility of Intel Gaudi2 and 4th Gen Xeon processors is evident in their compelling AI training performance across a variety of hardware configurations, addressing an increasingly diverse array of customer AI compute needs. Gaudi2 in particular emerges as a strong alternative to NVIDIA’s H100 for AI compute, offering significant price-performance advantages. The MLPerf results for Gaudi2 demonstrate its growing training performance, with a 2x leap achieved through the implementation of the FP8 data type on the v3.1 GPT-3 training benchmark. This substantial reduction in time-to-train compared to the June MLPerf benchmark allowed training to complete in just 153.58 minutes on 384 Intel Gaudi2 accelerators. The Gaudi2 accelerator supports FP8 in both E5M2 and E4M3 formats, with the option of delayed scaling when necessary.
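To make those FP8 formats concrete, the sketch below (an illustration under stated assumptions, not Intel's Gaudi software) computes the largest finite value representable in E4M3 and E5M2, and shows a toy version of delayed scaling in which the scale applied before casting to FP8 is chosen from a history of previously observed absolute maxima rather than from the current tensor; the exact scaling recipe used on Gaudi2 is not described here and is assumed for illustration only.

```python
# Illustrative sketch (assumption for illustration, not Intel's Gaudi software):
# numeric range of the two FP8 formats Gaudi2 supports, plus a toy version of
# delayed scaling where the scale comes from an amax history of earlier steps.

def fp8_max_normal(exp_bits: int, man_bits: int, bias: int, ieee_special: bool) -> float:
    """Largest finite value of a simple FP8 layout.

    ieee_special=True models E5M2, which reserves its top exponent code for
    inf/NaN (IEEE-style); E4M3 reserves only the all-ones exponent/mantissa
    pattern for NaN, so the rest of that exponent range stays usable.
    """
    if ieee_special:                              # E5M2: top usable exponent field is 0b11110
        max_exp = (1 << exp_bits) - 2 - bias
        max_man = (1 << man_bits) - 1             # all mantissa bits set
    else:                                         # E4M3: exponent 0b1111 is usable,
        max_exp = (1 << exp_bits) - 1 - bias      # but mantissa 0b111 there is NaN
        max_man = (1 << man_bits) - 2
    return (1 + max_man / (1 << man_bits)) * 2.0 ** max_exp

print("E4M3 max:", fp8_max_normal(4, 3, bias=7, ieee_special=False))   # 448.0
print("E5M2 max:", fp8_max_normal(5, 2, bias=15, ieee_special=True))   # 57344.0


def delayed_scale(amax_history, fp8_max, margin=1.0):
    """Toy delayed scaling: pick the cast scale from past amax observations
    so the next step's tensor is expected to fit within the FP8 range."""
    amax = max(amax_history) if amax_history else 1.0
    return fp8_max / (amax * margin)

# Example: scale for a tensor whose recent absolute maxima were around 3.0-5.0
print("scale:", delayed_scale([3.2, 4.8, 4.1], fp8_max=448.0))
```

The trade-off the two formats illustrate is range versus precision: E5M2 spans a much wider dynamic range, while E4M3 spends its bits on mantissa precision, which is why a per-tensor scale factor matters when casting to FP8.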
Furthermore, Intel Gaudi2 demonstrated its capabilities by training the Stable Diffusion multi-modal model in just 20.2 minutes using BF16 and 64 accelerators. Stable Diffusion performance will be submitted on the FP8 data type in future MLPerf training benchmarks. The benchmark results for BERT and ResNet-50 on eight Intel Gaudi2 accelerators were equally impressive, with completion times of 13.27 and 15.92 minutes, respectively, using BF16.
Intel continues to stand out as the sole CPU vendor submitting MLPerf results, further highlighting the strength of its products. The MLPerf results for 4th Gen Xeon processors demonstrated strong performance across multiple models, including ResNet-50, RetinaNet, BERT, and DLRM-DCNv2. The performance achieved by 4th Gen Intel Xeon Scalable processors on ResNet-50, RetinaNet, and BERT was on par with the out-of-the-box results submitted in the June 2023 MLPerf benchmark. Notably, for the new DLRM-DCNv2 model, the CPU submission achieved a time-to-train of 227 minutes using only four nodes.
The performance exhibited by 4th Gen Xeon processors proves that many enterprise organizations can efficiently and cost-effectively train small to mid-sized deep learning models using their existing IT infrastructure equipped with general-purpose CPUs. This is particularly beneficial for use cases where training is an intermittent workload.
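As a hedged sketch of what CPU-based training can look like in practice, the snippet below trains a small PyTorch model in BF16 on a Xeon CPU using autocast; on 4th Gen Xeon processors, the underlying oneDNN kernels can execute these BF16 matrix multiplications on Intel AMX. The model, data, and hyperparameters here are placeholders, not the MLPerf submission code.

```python
# Minimal sketch (not the MLPerf submission code): BF16 training of a small
# model on a Xeon CPU with PyTorch autocast. On 4th Gen Xeon, eligible BF16
# matmuls run through oneDNN, which can dispatch them to Intel AMX.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 512)                 # placeholder batch of features
y = torch.randint(0, 10, (64,))          # placeholder class labels

for step in range(10):
    optimizer.zero_grad()
    # Run eligible forward ops (the linear layers) in bfloat16 on the CPU;
    # master weights and the optimizer state remain in FP32.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss = {loss.item():.4f}")
```

For intermittent training jobs of this size, the appeal is that no dedicated accelerator is required; the same general-purpose servers that run other enterprise workloads can absorb the training runs.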
Looking ahead, Intel anticipates further advancements in AI performance results in upcoming MLPerf benchmarks through software updates and optimizations. Intel’s wide range of AI products offers customers an extensive selection of solutions to meet their dynamic requirements, encompassing performance, efficiency, and usability.
Background Information
About Intel:
Intel Corporation, a global technology leader, is known for its semiconductor innovations that power computing and communication devices worldwide. As a pioneer in microprocessor technology, Intel has left an indelible mark on the evolution of computing with processors that drive everything from PCs to data centers and beyond. With a long history of advancements, Intel's relentless pursuit of innovation continues to shape the digital landscape, offering solutions that empower businesses and individuals to achieve new levels of productivity and connectivity.
About NVIDIA:
NVIDIA has firmly established itself as a leader in the realm of client computing, continuously pushing the boundaries of innovation in graphics and AI technologies. With a deep commitment to enhancing user experiences, NVIDIA's client computing business focuses on delivering solutions that power everything from gaming and creative workloads to enterprise applications. Known for its GeForce graphics cards, the company has redefined high-performance gaming, setting industry standards for realistic visuals, fluid frame rates, and immersive experiences. Complementing its gaming expertise, NVIDIA's Quadro and NVIDIA RTX graphics cards cater to professionals in design, content creation, and scientific fields, enabling real-time ray tracing and AI-driven workflows that elevate productivity and creativity. By integrating graphics, AI, and software, NVIDIA continues to shape the landscape of client computing, fostering innovation and immersive interactions in a rapidly evolving digital world.
Technology Explained
CPU: The Central Processing Unit (CPU) is the brain of a computer, responsible for executing instructions, performing calculations, and controlling the other components of the system. CPUs are found in everything from desktop computers and mobile devices to gaming consoles and supercomputers, where they process data, control the flow of information, manage input and output, and store and retrieve data from memory. They are essential to the functioning of any computer system, and their applications across the computer industry are vast.
Stable Diffusion: Stable Diffusion is a generative AI model that creates images from text prompts. It is a latent diffusion model: it learns to iteratively remove noise from a compressed (latent) representation of an image, guided by a text encoder, and then decodes the result into a full-resolution picture. Released publicly in 2022, it has become one of the most widely used open text-to-image models and is now part of the MLPerf training benchmark suite, where its training performance is measured across different hardware platforms.
Xeon: The Intel Xeon processor is a multi-core processor designed to handle many tasks simultaneously. It is used in servers, workstations, high-end desktops, and embedded systems such as routers and switches. Known for its performance and scalability, it is a popular choice for cloud computing as well as scientific and engineering workloads that involve large amounts of data and complex calculations and simulations.