Supermicro Launches NVIDIA HGX B200 Systems, Showcasing AI Performance in the Market

Supermicro's latest announcement of leading MLPerf Inference v5.0 benchmarks with their innovative 8-GPU systems has solidified their position as a leader in the AI industry, showcasing impressive performance gains and a comprehensive portfolio of GPU-optimized systems.

Supermicro posted leading results in the MLPerf Inference v5.0 benchmarks, generating over three times the tokens per second of H200 8-GPU systems on the Llama2-70B and Llama3.1-405B benchmarks.
Their unique building block architecture and collaboration with NVIDIA have allowed them to optimize their systems for peak performance.
Supermicro offers a comprehensive AI portfolio with over 100 GPU-optimized systems, including both air-cooled and liquid-cooled options.

Supermicro Sets the Bar High with MLPerf Inference v5.0

Super Micro Computer, Inc. (SMCI) is stirring up the tech world with its latest announcement: its NVIDIA HGX B200 systems posted leading results in the MLPerf Inference v5.0 benchmarks. Both of its 8-GPU systems, the 4U liquid-cooled and the 10U air-cooled, generated over three times the tokens per second of H200 8-GPU systems on the Llama2-70B and Llama3.1-405B benchmarks. That’s not just a small victory; it’s a game-changer.

Charles Liang, the president and CEO of Supermicro, couldn’t be more thrilled. He stated, “Supermicro remains a leader in the AI industry, as evidenced by the first new benchmarks released by MLCommons in 2025.” With their unique building block architecture, Supermicro is not just first to market; they’re also delivering a diverse range of systems optimized for various workloads. Their collaboration with NVIDIA seems to be paying off, allowing them to fine-tune their systems for peak performance.

Performance You Can Count On

What’s particularly impressive is that Supermicro is the only vendor publishing record MLPerf inference performance across both air-cooled and liquid-cooled NVIDIA HGX B200 8-GPU systems. And here’s the kicker: both types of systems were operational even before the MLCommons benchmark start date. Supermicro engineers have been hard at work optimizing both the systems and the software, ensuring that they showcase their impressive capabilities.

In fact, the air-cooled B200 system performed on par with its liquid-cooled counterpart, demonstrating that you don’t always need fancy cooling solutions to achieve top-notch results. Supermicro has been shipping these systems to customers while simultaneously conducting benchmarks, which speaks volumes about their commitment to quality and reliability.

MLCommons has been clear about the importance of reproducibility in these results. They ensure that the products are readily available and that the benchmarks can be audited by their members. Supermicro has adhered to these guidelines, optimizing their systems within the rules set by MLCommons.

Impressive Benchmark Results

The SYS-421GE-NBRT-LCC and SYS-A21GE-NBRT models have shown remarkable performance across various benchmarks. For instance, when running the Mixtral 8x7B Mixture of Experts inference benchmark, these systems achieved roughly 129,000 tokens per second. That’s a level of performance that leaves previous generations in the dust.

For larger models, the Supermicro air-cooled and liquid-cooled NVIDIA B200 systems delivered over 1,000 tokens per second for the Llama3.1-405B model. In contrast, older GPU systems simply can’t keep up. And when it comes to smaller inference tasks, the Supermicro systems shine again, with the Llama2-70B benchmark showing the highest performance among Tier 1 suppliers.

Let’s break down some of those jaw-dropping numbers:

- Stable Diffusion XL (Server): SYS-A21GE-NBRT (8x B200-SXM-180GB), #1 queries/s, 28.92
- Llama2-70b-interactive-99 (Server): SYS-A21GE-NBRT (8x B200-SXM-180GB), #1 tokens/s, 62,265.70
- Llama3.1-405b (Offline): SYS-421GE-NBRT-LCC (8x B200-SXM-180GB), #1 tokens/s, 1,521.74
- Llama3.1-405b (Server): SYS-A21GE-NBRT (8x B200-SXM-180GB), #1 tokens/s, 1,080.31 (for an 8-GPU node)
- Mixtral-8x7b (Server): SYS-421GE-NBRT-LCC (8x B200-SXM-180GB), #1 tokens/s, 129,047.00
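To put the node-level figures listed above in perspective, here is a rough sketch that divides each result by the eight GPUs in the node. The per-GPU average is only an illustration and not an official MLPerf metric, and the names `RESULTS` and `per_gpu` are made up for this example; the sketch also assumes the work is split evenly across the GPUs.

```python
# Node-level MLPerf Inference v5.0 results as listed in this article.
# Every system listed is an 8-GPU node.
GPUS_PER_NODE = 8

# Hypothetical container for the published figures: (value, unit).
RESULTS = {
    "Stable Diffusion XL (Server)": (28.92, "queries/s"),
    "Llama2-70b-interactive-99 (Server)": (62265.70, "tokens/s"),
    "Llama3.1-405b (Offline)": (1521.74, "tokens/s"),
    "Llama3.1-405b (Server)": (1080.31, "tokens/s"),
    "Mixtral-8x7b (Server)": (129047.00, "tokens/s"),
}


def per_gpu(node_throughput: float, gpus: int = GPUS_PER_NODE) -> float:
    """Average throughput per GPU, assuming an even split across the node."""
    return node_throughput / gpus


if __name__ == "__main__":
    for benchmark, (value, unit) in RESULTS.items():
        print(f"{benchmark}: {per_gpu(value):,.2f} {unit} per GPU")
```

Read this way, the Mixtral-8x7b result works out to a little over 16,000 tokens per second per GPU, while the much larger Llama3.1-405b model in the Server scenario comes to about 135 tokens per second per GPU.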

David Kanter, Head of MLPerf at MLCommons, congratulated Supermicro on their impressive submission, stating, “We are pleased to see their results showcasing significant performance gains compared to earlier generations of systems.” It’s clear that customers are going to be thrilled with these performance improvements, validated by the rigorous MLPerf results.

A Comprehensive AI Portfolio

Supermicro isn’t just about a few standout systems; they offer a comprehensive AI portfolio that includes over 100 GPU-optimized systems. Whether you prefer air-cooled or liquid-cooled options, there’s something for everyone. From single-socket optimized systems to robust 8-way multiprocessor configurations, Supermicro has it all.

Their rack-scale systems integrate computing, storage, and networking components, which means less hassle during installation once they arrive at your site.

What’s more, Supermicro’s liquid-cooled NVIDIA HGX B200 8-GPU systems are equipped with next-gen cooling technology. With newly developed cold plates and a 250 kW coolant distribution unit (CDU), they’ve more than doubled cooling capacity while maintaining the same 4U form factor. This design allows for a dense setup: eight systems with 64 NVIDIA Blackwell GPUs in a single 42U rack.

And if you’re wondering about the air-cooled options, the new 10U NVIDIA HGX B200 system has been redesigned to accommodate eight 1000 W TDP Blackwell GPUs. You can fit up to four of these systems in a rack, achieving the same density as previous generations while delivering up to 15 times the inference and three times the training performance.
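The rack density figures quoted for the two designs can be checked with some quick arithmetic. The sketch below is only illustrative: the `RackConfig` class and its field names are invented for this example, and it assumes each system occupies exactly its nominal height (4U or 10U) and holds eight GPUs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackConfig:
    """Hypothetical description of one rack layout from the article."""
    name: str
    system_height_u: int      # nominal height of one system, in rack units
    systems_per_rack: int     # number of systems quoted per rack
    gpus_per_system: int = 8  # both designs are 8-GPU systems

    @property
    def gpus_per_rack(self) -> int:
        return self.systems_per_rack * self.gpus_per_system

    @property
    def occupied_u(self) -> int:
        return self.systems_per_rack * self.system_height_u


# Figures taken from the text: eight 4U systems, or four 10U systems, per rack.
liquid = RackConfig("4U liquid-cooled", system_height_u=4, systems_per_rack=8)
air = RackConfig("10U air-cooled", system_height_u=10, systems_per_rack=4)

if __name__ == "__main__":
    for cfg in (liquid, air):
        print(f"{cfg.name}: {cfg.gpus_per_rack} GPUs in {cfg.occupied_u}U of systems")
```

Under these assumptions the liquid-cooled layout puts 64 GPUs into 32U of systems, which fits the 42U rack mentioned above, while the air-cooled layout puts 32 GPUs into 40U of systems.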

In the fast-paced world of AI and machine learning, Supermicro is clearly a player to watch. With their technology and commitment to performance, they are setting new standards and redefining what’s possible. So, what’s next for Supermicro? Only time will tell, but one thing’s for sure: they’re not slowing down anytime soon.


Background Information

About NVIDIA:

NVIDIA has firmly established itself as a leader in the realm of client computing, continuously pushing the boundaries of innovation in graphics and AI technologies. With a deep commitment to enhancing user experiences, NVIDIA's client computing business focuses on delivering solutions that power everything from gaming and creative workloads to enterprise applications. Known for its GeForce graphics cards, the company has redefined high-performance gaming, setting industry standards for realistic visuals, fluid frame rates, and immersive experiences. Complementing its gaming expertise, NVIDIA's Quadro and NVIDIA RTX graphics cards cater to professionals in design, content creation, and scientific fields, enabling real-time ray tracing and AI-driven workflows that elevate productivity and creativity. By integrating graphics, AI, and software, NVIDIA continues to shape the landscape of client computing in a rapidly evolving digital world.


About Supermicro:

Supermicro is a reputable American technology company founded in 1993 and headquartered in San Jose, California. Specializing in high-performance server and storage solutions, Supermicro has become a trusted name in the data center industry. The company offers a wide range of innovative and customizable server hardware, including motherboards, servers, storage systems, and networking equipment, catering to the needs of enterprise clients, cloud service providers, and businesses seeking reliable infrastructure solutions.


Technology Explained

Blackwell: Blackwell is NVIDIA's GPU architecture for AI computing, designed to supercharge tasks like training large language models. GPUs built on it feature a next-gen Transformer Engine and support for lower-precision calculations, enabling them to handle complex AI workloads significantly faster and more efficiently than before. While aimed at data centers, the innovations within Blackwell are expected to influence consumer graphics cards as well.


GPU: GPU stands for Graphics Processing Unit and is a specialized type of processor designed to handle graphics-intensive tasks. It is used in the computer industry to render images, videos, and 3D graphics. GPUs are used in gaming consoles, PCs, and mobile devices to provide a smooth and immersive gaming experience. They are also used in the medical field to create 3D models of organs and tissues, and in the automotive industry to create virtual prototypes of cars. GPUs are also used in the field of artificial intelligence to process large amounts of data and create complex models. GPUs are becoming increasingly important in the computer industry as they are able to process large amounts of data quickly and efficiently.


Stable Diffusion: Stable Diffusion is a family of text-to-image generative AI models based on latent diffusion. Given a text prompt, the model starts from random noise and removes it step by step until an image matching the prompt emerges. Stable Diffusion XL (SDXL) is a larger version of the model and serves as the text-to-image workload in the MLPerf Inference benchmarks, which is why its results above are reported in queries per second rather than tokens per second.
