Intel Unveils Gaudi 3 AI Accelerator: Scaling Up and Setting Sights on AI Market Domination

Intel introduces the Gaudi 3 AI accelerator at its Vision 2024 conference, claiming 1835 TFLOPS of FP8 compute throughput and the potential to outperform NVIDIA's Hx00 Hopper accelerators in certain large language models.

Peak FP8 compute throughput of 1835 TFLOPS
Projected by Intel to outperform NVIDIA's flagship Hx00 Hopper architecture accelerators in certain large language models
Scalable design for large AI clusters, with faster networking than its predecessor

Intel is making waves at its Vision 2024 conference with the introduction of its latest AI accelerator, the Gaudi 3. While Intel typically saves major silicon announcements for its Innovation event in the fall, this year’s Vision conference is not without its surprises. With the whole industry focused on AI, Intel is seizing the opportunity to unveil the next-generation Gaudi accelerator from its subsidiary, Habana Labs.

The Gaudi 3 is set to launch in the third quarter of 2024, and Intel is already sending samples to customers. The hardware delivers 1835 TFLOPS of FP8 compute throughput, positioning it as a formidable competitor in the AI market. Intel’s internal benchmarks indicate that the Gaudi 3 has the potential to outperform NVIDIA’s flagship Hx00 Hopper architecture accelerators in certain large language models. This presents a significant opportunity for Intel to gain a larger share of the AI accelerator market, especially at a time when NVIDIA hardware is in high demand.

What’s interesting about the Gaudi 3 launch is that it marks a shift in how Intel is positioning its AI accelerator products. Previously overshadowed by Intel’s Data Center GPU Max products, Habana Labs and the Gaudi lineup have taken center stage following the cancellation of Rialto Bridge. With no other new AI accelerator silicon on the horizon, Intel is going all-in on the Gaudi 3.

In terms of hardware, the Gaudi 3 is an evolution of its predecessor, the Gaudi 2. While there aren’t many new features to discuss, the Gaudi 3 benefits from being built on TSMC’s newer 5nm process and carries more computational hardware. Counted per die, the number of Matrix Math Engines has doubled from 2 to 4, and the number of Tensor Cores has increased from 24 to 32; since the full chip pairs two dies, that works out to 8 Matrix Math Engines and 64 Tensor Cores in total. These increases are expected to improve performance, although specific clock speeds have not been disclosed.

One notable aspect of the Gaudi 3 is its use of HBM2e memory, which may seem outdated compared to newer memory types. However, Intel has managed to pack 128GB of memory into the accelerator, offering a total memory bandwidth of 3.7TB/second. Additionally, each Gaudi 3 die includes 48MB of SRAM, resulting in a combined 96MB of SRAM for the full chip.
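To put the memory figures in context, the short sketch below derives a few numbers from the specifications quoted above. It is a back-of-the-envelope illustration rather than anything Intel has published: the function names are made up for this example, decimal units are assumed (1 TB = 1000 GB), and real workloads will not reach peak throughput or peak bandwidth.

```python
# Figures quoted in the article. Decimal units are assumed (1 TB = 1000 GB).
PEAK_FP8_TFLOPS = 1835       # peak FP8 compute throughput
HBM_BANDWIDTH_TBPS = 3.7     # total HBM2e bandwidth in TB/s
HBM_CAPACITY_GB = 128        # total HBM2e capacity
SRAM_PER_DIE_MB = 48
DIES_PER_CHIP = 2


def ridge_point_flop_per_byte(peak_tflops: float, bandwidth_tbps: float) -> float:
    """Roofline ridge point: FLOPs per byte of memory traffic above which
    a kernel is limited by compute rather than by memory bandwidth."""
    return peak_tflops / bandwidth_tbps


def full_memory_sweep_ms(capacity_gb: float, bandwidth_tbps: float) -> float:
    """Milliseconds needed to read the whole memory pool once at peak bandwidth."""
    seconds = capacity_gb / (bandwidth_tbps * 1000.0)  # TB/s -> GB/s
    return seconds * 1000.0


total_sram_mb = SRAM_PER_DIE_MB * DIES_PER_CHIP
ridge = ridge_point_flop_per_byte(PEAK_FP8_TFLOPS, HBM_BANDWIDTH_TBPS)
sweep_ms = full_memory_sweep_ms(HBM_CAPACITY_GB, HBM_BANDWIDTH_TBPS)

print(f"Total SRAM: {total_sram_mb} MB")
print(f"Ridge point: {ridge:.0f} FLOP/byte")
print(f"Full HBM read: {sweep_ms:.1f} ms")
```

On these assumptions, a kernel would need to perform roughly 496 FP8 operations per byte read from HBM before compute, rather than memory bandwidth, becomes the limit, and reading all 128GB once would take about 35 milliseconds.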

In terms of power consumption, the Gaudi 3 accelerator has a TDP of 900W, a 50% increase over its predecessor. Intel is also developing liquid-cooled versions of the Gaudi 3, which will offer higher performance at the cost of even higher TDPs. All versions of the Gaudi 3 will utilize PCIe backhaul for connectivity.
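The power figures can be checked with simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and dividing peak FP8 throughput by TDP gives a theoretical ceiling, not a measured efficiency.

```python
# Figures quoted in the article.
GAUDI3_TDP_W = 900
TDP_INCREASE = 0.50          # "a 50% increase over its predecessor"
PEAK_FP8_TFLOPS = 1835


def predecessor_tdp_w(new_tdp_w: float, fractional_increase: float) -> float:
    """Back out the older TDP from the new TDP and the stated increase."""
    return new_tdp_w / (1.0 + fractional_increase)


def peak_tflops_per_watt(peak_tflops: float, tdp_w: float) -> float:
    """Upper bound on efficiency: peak throughput divided by TDP."""
    return peak_tflops / tdp_w


implied_gaudi2_tdp = predecessor_tdp_w(GAUDI3_TDP_W, TDP_INCREASE)
ceiling = peak_tflops_per_watt(PEAK_FP8_TFLOPS, GAUDI3_TDP_W)

print(f"Implied predecessor TDP: {implied_gaudi2_tdp:.0f} W")
print(f"Peak FP8 per watt: {ceiling:.2f} TFLOPS/W")
```

A 900W TDP that is 50% higher than before implies a 600W predecessor, and the peak figure works out to roughly 2 TFLOPS of FP8 per watt.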

One area where Habana Labs has made significant improvements with the Gaudi 3 is in networking. The accelerator now features 200Gb Ethernet links, doubling the bandwidth of its predecessor. This upgrade allows for more efficient on-node and node-to-node connectivity, making the Gaudi 3 a scalable solution for large AI clusters.
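The article does not say how many Ethernet links each accelerator carries, so the sketch below takes the link count as a parameter. The value of 24 used here is an assumption for illustration, not a figure from the text, and the function names are hypothetical. The results are raw line rates per direction and ignore protocol overhead.

```python
BITS_PER_BYTE = 8


def link_gigabytes_per_second(link_gbps: float) -> float:
    """Convert a link's line rate from gigabits to gigabytes per second."""
    return link_gbps / BITS_PER_BYTE


def aggregate_gigabytes_per_second(link_gbps: float, num_links: int) -> float:
    """Raw aggregate bandwidth across all links, in one direction."""
    return link_gigabytes_per_second(link_gbps) * num_links


ASSUMED_LINKS = 24  # assumption for illustration; not stated in the article

per_link = link_gigabytes_per_second(200)
total = aggregate_gigabytes_per_second(200, ASSUMED_LINKS)
# "Doubling the bandwidth of its predecessor" implies 100Gb links before.
previous_total = aggregate_gigabytes_per_second(100, ASSUMED_LINKS)

print(f"Per link: {per_link} GB/s")
print(f"Aggregate: {total} GB/s (was {previous_total} GB/s)")
```

Whatever the real link count turns out to be, the ratio between the two generations stays at 2x as long as the number of links is unchanged.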

Intel has provided benchmark-based projections to support its performance claims. The company projects that the Gaudi 3 will outperform the H100 accelerator by up to 1.7x when training Llama2-13B models at FP8 precision. For inference, Intel expects a 1.3x to 1.5x improvement over the H200/H100, with up to 2.3x better power efficiency.
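Speedup multipliers are easier to judge once converted into time and energy. The sketch below does that conversion for Intel's best-case figures. The 100-hour baseline is a hypothetical number chosen for illustration, the function names are made up, and treating "2.3x better power efficiency" as 1/2.3 of the energy per unit of work is an interpretation, not something Intel states.

```python
def relative_time(speedup: float) -> float:
    """Fraction of the baseline run time needed at the given speedup."""
    return 1.0 / speedup


def relative_energy(efficiency_gain: float) -> float:
    """Fraction of the baseline energy per unit of work at the given gain."""
    return 1.0 / efficiency_gain


BASELINE_HOURS = 100.0  # hypothetical H100 training time, for illustration

projected_hours = BASELINE_HOURS * relative_time(1.7)
time_saved_pct = (1.0 - relative_time(1.7)) * 100.0
energy_pct = relative_energy(2.3) * 100.0

print(f"Projected training time: {projected_hours:.1f} h")
print(f"Time saved: {time_saved_pct:.1f}%")
print(f"Energy per unit of inference work: {energy_pct:.1f}% of baseline")
```

In other words, a best-case 1.7x training speedup cuts run time by about 41%, not 70%, and the figures are upper bounds ("up to") from Intel's own projections.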

Intel plans to release the first Gaudi 3 products in the coming months. Air-cooled versions of the OAM accelerator are already being sampled, while liquid-cooled versions will be available for sampling soon. Intel will also offer a PCIe version of the Gaudi 3, giving customers a more traditional form factor. The PCIe version is set to launch in the fourth quarter of 2024.

Overall, the Gaudi 3 represents Intel’s push to establish a stronger presence in the AI accelerator market. By pairing higher performance with better scalability, Intel aims to capture a larger share of that market as demand for AI accelerators continues to rise.

Background Information

About Intel: Intel Corporation, a global technology leader, is known for its semiconductor innovations that power computing and communication devices worldwide. As a pioneer in microprocessor technology, Intel has left an indelible mark on the evolution of computing with its processors that drive everything from PCs to data centers and beyond. With a history of advancements, Intel's relentless pursuit of innovation continues to shape the digital landscape, offering solutions that empower businesses and individuals to achieve new levels of productivity and connectivity.

About NVIDIA: NVIDIA has firmly established itself as a leader in the realm of client computing, continuously pushing the boundaries of innovation in graphics and AI technologies. With a deep commitment to enhancing user experiences, NVIDIA's client computing business focuses on delivering solutions that power everything from gaming and creative workloads to enterprise applications. Known for its GeForce graphics cards, the company has redefined high-performance gaming, setting industry standards for realistic visuals, fluid frame rates, and immersive experiences. Complementing its gaming expertise, NVIDIA's Quadro and NVIDIA RTX graphics cards cater to professionals in design, content creation, and scientific fields, enabling real-time ray tracing and AI-driven workflows that elevate productivity and creativity to unprecedented heights. By seamlessly integrating graphics, AI, and software, NVIDIA continues to shape the landscape of client computing, fostering innovation and immersive interactions in a rapidly evolving digital world.

About TSMC: TSMC, or Taiwan Semiconductor Manufacturing Company, is a semiconductor foundry based in Taiwan. Established in 1987, TSMC is an important player in the global semiconductor industry, specializing in the manufacturing of semiconductor wafers for a wide range of clients, including technology companies and chip designers. The company is known for its semiconductor fabrication processes and plays a critical role in advancing semiconductor technology worldwide.

Technology Explained

GPU: GPU stands for Graphics Processing Unit, a specialized type of processor designed to handle graphics-intensive tasks such as rendering images, video, and 3D graphics. GPUs are used in gaming consoles, PCs, and mobile devices to provide a smooth and immersive gaming experience. They are also used in the medical field to create 3D models of organs and tissues, in the automotive industry to create virtual prototypes of cars, and in artificial intelligence to process large amounts of data and build complex models. Their ability to process large amounts of data quickly and in parallel is making them increasingly important across the computer industry.

PCIe: PCIe (Peripheral Component Interconnect Express) is a high-speed serial computer expansion bus standard for connecting components such as graphics cards, sound cards, and network cards to a motherboard. It is the most widely used expansion interface in the computer industry today, and is found in both desktop and laptop computers. PCIe links are built from lanes, with a slot combining up to 16 of them, which gives it far more bandwidth than the older PCI standard and allows for faster data transfers. It is also used in a variety of other applications, such as storage, networking, and communications. PCIe is an essential component of modern computing, and its applications are expected to keep growing.

Tensor Cores: Tensor Cores are a type of specialized hardware designed to accelerate deep learning and AI applications. They are used in the computer industry to speed up the training of deep learning models and to enable faster inference. Tensor Cores are capable of performing matrix operations at a much faster rate than traditional CPUs, allowing for faster training and inference of deep learning models. This technology is used in a variety of applications, including image recognition, natural language processing, and autonomous driving. Tensor Cores are also used in the gaming industry to improve the performance of games and to enable more realistic graphics.