NVIDIA Blackwell sets new records in MLPerf Inference V5.0, showcasing the power and innovation of its AI infrastructure and its collaboration with partners.
- NVIDIA Blackwell sets impressive records in MLPerf Inference V5.0 benchmarks
- Blackwell platform delivers 30x higher throughput compared to previous generation GPUs
- NVIDIA Hopper architecture continues to improve performance and remains relevant for evolving models and usage scenarios
NVIDIA Blackwell Breaks Records in MLPerf Inference V5.0
In the latest round of MLPerf Inference V5.0 benchmarks, NVIDIA Blackwell has made waves by setting impressive records. This marks NVIDIA's first official submission using the GB200 NVL72 system, a powerful rack-scale solution crafted for AI reasoning. But what does this really mean? Essentially, it's a game-changer in the world of AI, where the infrastructure isn't just about storing and processing data; it's about creating intelligence at scale. We're talking about AI factories that transform raw data into real-time insights, all while aiming to deliver accurate answers quickly and affordably to countless users.
But let’s be real, pulling this off is no small feat. As AI models balloon to billions and trillions of parameters, the computational demands skyrocket. This means that generating each token becomes more resource-intensive, which can drive up costs and limit throughput. So, how do we keep the wheels turning smoothly? It requires relentless innovation across every layer of the tech stack—from silicon to network systems to software.
What’s New in MLPerf Inference?
The latest updates to MLPerf Inference include the introduction of Llama 3.1 405B, one of the largest and most complex open-weight models to date. Additionally, the new Llama 2 70B Interactive benchmark introduces stricter latency requirements compared to its predecessor, making it a better reflection of the real-world constraints that come with delivering top-notch user experiences.
Alongside the Blackwell platform, the NVIDIA Hopper platform also showcased remarkable performance improvements, particularly on the Llama 2 70B benchmark. Thanks to full-stack optimizations, we’re seeing significant gains over the past year.
NVIDIA Blackwell’s Impressive Performance
The GB200 NVL72 system, which connects 72 NVIDIA Blackwell GPUs to function as a single colossal GPU, delivered a staggering 30x higher throughput on the Llama 3.1 405B benchmark compared to the NVIDIA H200 NVL8 submission. How did they achieve this? By tripling the performance per GPU and expanding the NVIDIA NVLink interconnect domain by a whopping 9x.
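The two scaling factors quoted above roughly multiply out to the headline number. As a back-of-envelope sanity check using only the figures in this article (illustrative arithmetic, not benchmark data):

```python
# Rough check of the scaling factors reported for GB200 NVL72 vs. the
# H200 NVL8 submission on Llama 3.1 405B. Figures come from the article;
# this is illustrative arithmetic, not measured throughput.

h200_gpus = 8    # NVLink domain of the H200 NVL8 submission
gb200_gpus = 72  # NVLink domain of the GB200 NVL72 system

per_gpu_speedup = 3.0                   # "tripling the performance per GPU"
domain_growth = gb200_gpus / h200_gpus  # 72 / 8 = 9x larger NVLink domain

implied_gain = per_gpu_speedup * domain_growth
print(f"NVLink domain growth: {domain_growth:.0f}x")
print(f"Implied system-level gain: ~{implied_gain:.0f}x")  # ~27x, in line with the reported 30x
```

The product of the two factors (about 27x) lands close to the reported 30x; the remainder comes from system-level efficiencies not captured by this simple multiplication.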
While many companies test their hardware against MLPerf benchmarks, only NVIDIA and its partners have submitted results for the Llama 3.1 405B benchmark, setting a high standard in the industry.
When it comes to production inference deployments, latency is a critical factor. This involves two key metrics: Time to First Token (TTFT), the time a user waits before seeing the first response from a large language model, and Time Per Output Token (TPOT), the average time between subsequent tokens once generation is underway. The new Llama 2 70B Interactive benchmark requires a 5x shorter TPOT and a 4.4x lower TTFT, creating a more responsive user experience. NVIDIA's submission using an NVIDIA DGX B200 system with eight Blackwell GPUs tripled performance over the previous-generation H200 GPUs, setting a high bar for this more demanding benchmark.
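Both metrics fall straight out of a token-arrival trace. A minimal sketch of how they are typically computed (the `TokenTrace` class and the timing values below are hypothetical illustrations, not part of any MLPerf harness):

```python
from dataclasses import dataclass

@dataclass
class TokenTrace:
    """Timestamps (in seconds) for one streamed LLM response:
    request_time is when the request was issued; token_times are
    the arrival times of each generated token."""
    request_time: float
    token_times: list

def ttft(trace: TokenTrace) -> float:
    """Time to First Token: how long before the user sees anything."""
    return trace.token_times[0] - trace.request_time

def tpot(trace: TokenTrace) -> float:
    """Time Per Output Token: average gap between subsequent tokens."""
    gaps = [b - a for a, b in zip(trace.token_times, trace.token_times[1:])]
    return sum(gaps) / len(gaps)

# Hypothetical trace: first token after 200 ms, then one token every 50 ms.
trace = TokenTrace(0.0, [0.20, 0.25, 0.30, 0.35, 0.40])
print(f"TTFT = {ttft(trace) * 1000:.0f} ms, TPOT = {tpot(trace) * 1000:.0f} ms")
```

The distinction matters because the two numbers stress different parts of the stack: TTFT is dominated by prompt processing, while TPOT reflects sustained generation speed.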
The Rising Value of NVIDIA Hopper AI Factories
NVIDIA’s Hopper architecture, introduced in 2022, is powering a significant number of today’s AI inference factories and continues to be a backbone for model training. Through ongoing software optimizations, NVIDIA is boosting the throughput of Hopper-based AI factories, which translates into greater value for users.
On the Llama 2 70B benchmark, which first appeared in MLPerf Inference v4.0, throughput for the H100 GPU has increased by 1.5x. The newer H200 GPU, which boasts larger and faster memory, takes this up another notch with a 1.6x increase. Notably, Hopper has successfully tackled every benchmark, including the newly added Llama 3.1 405B, Llama 2 70B Interactive, and graph neural network tests. This versatility ensures that Hopper remains relevant as models and usage scenarios evolve and become more challenging.
A Collaborative Ecosystem
This round of MLPerf saw 15 partners delivering stellar results on the NVIDIA platform, including big names like ASUS, Cisco, Google Cloud, and VMware. The variety of submissions highlights the extensive reach of NVIDIA’s platform, which is available across all cloud service providers and server makers globally.
MLCommons is doing crucial work to continually evolve the MLPerf Inference benchmark suite, ensuring it keeps pace with the latest AI advancements. This effort provides IT decision-makers with rigorous, peer-reviewed performance data, helping them choose the best AI infrastructure for their needs.
In a world where AI is rapidly evolving, having the right tools and benchmarks is essential. And with NVIDIA leading the charge, we’re only just beginning to scratch the surface of what’s possible.

About Our Team
Our team comprises industry insiders with extensive experience in computers, semiconductors, games, and consumer electronics. With decades of collective experience, we’re committed to delivering timely, accurate, and engaging news content to our readers.
Background Information
About ASUS:
ASUS, founded in 1989 by Ted Hsu, M.T. Liao, Wayne Hsieh, and T.H. Tung, has become a multinational tech giant known for its diverse hardware products. Spanning laptops, motherboards, graphics cards, and more, ASUS has gained recognition for its innovation and commitment to high-performance computing solutions. The company has a significant presence in gaming technology, producing popular products that cater to enthusiasts and professionals alike. With a focus on delivering innovative and reliable technology, ASUS maintains its position as an important player in the industry.
About Google:
Google, founded by Larry Page and Sergey Brin in 1998, is a multinational technology company known for its internet-related services and products. Initially known for its search engine, Google has since expanded into various domains including online advertising, cloud computing, software development, and hardware devices. With its innovative approach, Google has introduced influential products such as Google Search, Android OS, Google Maps, and Google Drive. The company's commitment to research and development has led to advancements in artificial intelligence and machine learning.
About NVIDIA:
NVIDIA has firmly established itself as a leader in the realm of client computing, continuously pushing the boundaries of innovation in graphics and AI technologies. With a deep commitment to enhancing user experiences, NVIDIA's client computing business focuses on delivering solutions that power everything from gaming and creative workloads to enterprise applications. Known for its GeForce graphics cards, the company has redefined high-performance gaming, setting industry standards for realistic visuals, fluid frame rates, and immersive experiences. Complementing its gaming expertise, NVIDIA's Quadro and NVIDIA RTX graphics cards cater to professionals in design, content creation, and scientific fields, enabling real-time ray tracing and AI-driven workflows that elevate productivity and creativity to unprecedented heights. By seamlessly integrating graphics, AI, and software, NVIDIA continues to shape the landscape of client computing, fostering innovation and immersive interactions in a rapidly evolving digital world.
Technology Explained
Blackwell: Blackwell is an AI computing architecture designed to supercharge tasks like training large language models. These powerful GPUs boast features like a next-gen Transformer Engine and support for lower-precision calculations, enabling them to handle complex AI workloads significantly faster and more efficiently than before. While aimed at data centers, the innovations within Blackwell are expected to influence consumer graphics cards as well.
GPU: GPU stands for Graphics Processing Unit, a specialized processor designed to handle graphics-intensive tasks. In the computer industry it is used to render images, video, and 3D graphics, powering gaming consoles, PCs, and mobile devices. Beyond graphics, GPUs create 3D models of organs and tissues in the medical field, drive virtual car prototypes in the automotive industry, and process the large datasets behind modern artificial intelligence models. Their ability to process large amounts of data quickly and efficiently makes them increasingly important across the industry.
Latency: Latency is the time it takes a computer system to respond to a request, and it is a major factor in the performance of computer networks, storage systems, and other computer systems. Low latency is essential for applications that require fast response times, such as online gaming, streaming media, and real-time data processing, while high latency causes delays in data processing and poor performance. To reduce latency, computer systems use techniques such as caching, load balancing, and parallel processing.
VMware: VMware is an industry leader in virtualization technology, allowing for the effective virtualization of computer hardware, networks, and operating systems. Any organization needing to quickly and efficiently deploy complex network environments can utilize VMware, allowing for greater flexibility than physical solutions. VMware products are especially popular in the enterprise segment, as they offer cost savings through increased worker productivity and improved resource utilization. Virtualized services hosted on VMware offer better scalability and reliability, as well as improved fault tolerance and cost savings associated with server consolidation. VMware also provides support for the latest hardware and software components, ensuring compatibility with a wide range of applications and services. Furthermore, VMware products provide a secure computing environment, as virtual machines are isolated from each other, preventing the spread of viruses and other threats from one virtual machine to another.