Intel Introduces Impressive AI Inference Capabilities, Showcasing Powerful Performance


September 11, 2023

Intel's submission of MLPerf Inference v3.1 results highlights its competitive performance in AI inference, offering customers flexibility and choice when selecting an optimal AI solution based on their specific performance, efficiency, and cost targets.

  • Intel's submission builds upon previous updates from MLCommons and Hugging Face, which demonstrated that the Gaudi 2 accelerators can outperform Nvidia's H100 on a state-of-the-art vision language model.
  • Intel's AI products offer customers flexibility and choice when selecting an optimal AI solution based on their specific performance, efficiency, and cost targets.
  • Intel remains the only vendor to provide public CPU results using industry-standard deep learning ecosystem software.


MLCommons, a leading organization in machine learning, has released the results of its MLPerf Inference v3.1 performance benchmark, which includes evaluations of GPT-J, a large language model with 6 billion parameters, as well as computer vision and natural language processing models. Intel, a major player in the AI industry, submitted its results for various hardware components, including the Habana Gaudi 2 accelerators, 4th Gen Intel Xeon Scalable processors, and Intel Xeon CPU Max Series.

The results highlight Intel’s competitive performance in AI inference and reaffirm the company’s dedication to making artificial intelligence more accessible across a wide range of AI workloads. Sandra Rivera, Intel’s executive vice president and general manager of the Data Center and AI Group, emphasized the company’s commitment to meeting customers’ needs for high-performance and efficient deep learning inference and training.

Intel’s submission builds upon previous updates from MLCommons and Hugging Face, which demonstrated that the Gaudi 2 accelerators can outperform Nvidia’s H100 on a state-of-the-art vision language model. These latest results further solidify Intel as a viable alternative to Nvidia’s H100 and A100 for AI compute requirements. Intel’s AI products offer customers flexibility and choice when selecting an optimal AI solution based on their specific performance, efficiency, and cost targets, while also providing an escape from closed ecosystems.

The Habana Gaudi 2 inference performance results for GPT-J showcase its competitive standing. The Gaudi 2 achieved impressive performance metrics: 78.58 queries per second in the server scenario and 84.08 samples per second in the offline scenario. While Nvidia’s H100 holds a slight advantage over Gaudi 2 (1.09x for server queries and 1.28x for offline samples), Gaudi 2 outperforms Nvidia’s A100 by a significant margin (2.4x for server queries and 2x for offline samples). The Gaudi 2 submission utilized FP8 and achieved 99.9% accuracy on this new data type. Intel plans to continue delivering performance improvements and expanded model coverage through software updates released every six to eight weeks.
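The relative figures quoted above can be sanity-checked with a little arithmetic. The snippet below derives the implied absolute H100 and A100 throughput from the Gaudi 2 numbers and the stated ratios; these derived values are illustrative, not independently measured results:

```python
# Throughput quoted for Gaudi 2 on the MLPerf GPT-J benchmark.
gaudi2 = {"server": 78.58, "offline": 84.08}  # queries/samples per second

# Relative factors quoted in the article (assumed exact for illustration).
h100_advantage = {"server": 1.09, "offline": 1.28}  # H100 vs. Gaudi 2
a100_deficit = {"server": 2.4, "offline": 2.0}      # Gaudi 2 vs. A100

# Implied absolute throughput for the competing accelerators.
h100 = {k: round(gaudi2[k] * h100_advantage[k], 2) for k in gaudi2}
a100 = {k: round(gaudi2[k] / a100_deficit[k], 2) for k in gaudi2}

print("implied H100:", h100)
print("implied A100:", a100)
```

Working backward like this, the H100's lead amounts to roughly 86 vs. 79 queries per second in the server scenario, while the A100 sits near half the Gaudi 2's throughput.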

Intel also submitted results for all seven inference benchmarks, including GPT-J, on its 4th Gen Intel Xeon Scalable processors. These results demonstrate excellent performance across various general-purpose AI workloads, such as vision, language processing, and speech and audio translation models, as well as larger models like the DLRM v2 recommender and GPT-J. Additionally, Intel remains the only vendor to provide public CPU results using industry-standard deep learning ecosystem software.

The 4th Gen Intel Xeon Scalable processor is an ideal choice for building and deploying general-purpose AI workloads with popular AI frameworks and libraries. In the GPT-J 100-word summarization task, which involves summarizing a news article of approximately 1,000 to 1,500 words, the 4th Gen Intel Xeon processors achieved impressive results, summarizing two paragraphs per second in offline mode and one paragraph per second in real-time server mode.

For the first time, Intel submitted MLPerf results for its Intel Xeon CPU Max Series, which offers up to 64 gigabytes of high-bandwidth memory. The Intel Xeon CPU Max Series was the only CPU capable of achieving 99.9% accuracy in the GPT-J benchmark, making it crucial for applications that require the highest level of accuracy.

Intel collaborated with its original equipment manufacturer (OEM) customers to showcase the scalability and wide availability of general-purpose servers powered by Intel Xeon processors. These collaborations further demonstrate Intel’s commitment to delivering AI performance that meets customer service level agreements (SLAs).

MLPerf is widely recognized as the most reputable benchmark for AI performance, providing a fair and repeatable platform for performance comparisons. Intel plans to submit new AI training performance results for the upcoming MLPerf benchmark, reaffirming its commitment to supporting customers and addressing every aspect of the AI continuum, from low-cost AI processors to high-performing hardware accelerators and GPUs for network, cloud, and enterprise customers.


(Source)

Background Information


About Intel: Intel Corporation, a global technology leader, is known for its semiconductor innovations that power computing and communication devices worldwide. As a pioneer in microprocessor technology, Intel has left an indelible mark on the evolution of computing with its processors that drive everything from PCs to data centers and beyond. With a long history of advancements, Intel's relentless pursuit of innovation continues to shape the digital landscape, offering solutions that empower businesses and individuals to achieve new levels of productivity and connectivity.

Intel website  Intel LinkedIn

About NVIDIA: NVIDIA has firmly established itself as a leader in the realm of client computing, continuously pushing the boundaries of innovation in graphics and AI technologies. With a deep commitment to enhancing user experiences, NVIDIA's client computing business focuses on delivering solutions that power everything from gaming and creative workloads to enterprise applications. Known for its GeForce graphics cards, the company has redefined high-performance gaming, setting industry standards for realistic visuals, fluid frame rates, and immersive experiences. Complementing its gaming expertise, NVIDIA's Quadro and NVIDIA RTX graphics cards cater to professionals in design, content creation, and scientific fields, enabling real-time ray tracing and AI-driven workflows that elevate productivity and creativity to unprecedented heights. By seamlessly integrating graphics, AI, and software, NVIDIA continues to shape the landscape of client computing, fostering innovation and immersive interactions in a rapidly evolving digital world.

NVIDIA website  NVIDIA LinkedIn

Technology Explained


CPU: The Central Processing Unit (CPU) is the brain of a computer, responsible for executing instructions and performing calculations. As the component that controls all others, it processes data, directs the flow of information, manages input and output, and stores and retrieves data from memory. CPUs power a wide range of applications, from desktop computers and mobile devices to gaming consoles and supercomputers, making them essential to the functioning of any computer system.


Xeon: The Intel Xeon processor is a powerful, reliable multi-core processor designed to handle multiple tasks simultaneously. It is used in servers, workstations, and high-end desktop computers, as well as in many embedded systems such as routers and switches. Known for its high performance and scalability, the Xeon is a popular choice for cloud computing, where it handles large amounts of data, and for scientific and engineering applications that involve complex calculations and simulations.




