NVIDIA’s Chief Scientist Reinforces Huang’s Law: A Paradigm Shift in Computing

NVIDIA Chief Scientist Bill Dally said that a 1,000x improvement in single GPU performance on AI inference over the past decade took ingenuity and effort in inventing and validating fresh ingredients for each new processor, in contrast to the past reliance on the physics of smaller, faster chips.

NVIDIA Research achieved a 1,000x improvement in single GPU performance on AI inference over the past decade.
The NVIDIA Hopper architecture uses a dynamic mix of 8- and 16-bit floating point and integer math.
Advanced instructions developed by NVIDIA Research optimize GPU work organization, resulting in a 12.5x performance increase.

In a recent keynote address at Hot Chips, NVIDIA Chief Scientist Bill Dally discussed a significant shift in how computer performance is delivered in the post-Moore’s law era. He highlighted the need for ingenuity and effort in inventing and validating fresh ingredients for each new processor, contrasting it with the past reliance on the physics of smaller, faster chips.

Dally’s team at NVIDIA Research, consisting of over 300 members, achieved a 1,000x improvement in single GPU performance on AI inference over the past decade. IEEE Spectrum dubbed this pace of progress “Huang’s Law,” after NVIDIA founder and CEO Jensen Huang, and the Wall Street Journal later popularized the label. The advance was a response to the exponential growth of large language models used in generative AI.

During his talk, Dally delved into the factors that contributed to this 1,000x gain. The most significant advancement came from finding simpler ways to represent the numbers used in computer calculations, resulting in a 16x increase in performance. The latest NVIDIA Hopper architecture uses a dynamic mix of 8- and 16-bit floating point and integer math, specifically designed for today’s generative AI models. Dally also highlighted the energy savings achieved through this new math.
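To give a rough sense of what a simpler number representation means, the sketch below maps a handful of floating point weights onto 8-bit integers with a single shared scale. This is a generic textbook scheme chosen for illustration, not a description of how Hopper works; the function names and sample weights are hypothetical.

```python
# Illustrative only: symmetric 8-bit integer quantization of a few weights.
# A generic textbook scheme, not NVIDIA's Hopper implementation.

def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the 8-bit integers."""
    return [q * scale for q in quantized]

weights = [0.50, -1.27, 0.03, 0.90]  # hypothetical example weights
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
max_error = max(abs(a - w) for a, w in zip(approx, weights))

print(q)          # small integers that each fit in one byte
print(max_error)  # rounding error, bounded by half a quantization step
```

Each stored value shrinks from several bytes to one, at the cost of a small rounding error. Narrower numbers of this kind are the sort of trade that lets hardware do more math per unit of energy.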

Additionally, Dally’s team made a significant leap by developing advanced instructions that optimize GPU work organization, resulting in a 12.5x performance increase. These complex commands enable computers to be as efficient as dedicated accelerators while retaining the programmability of GPUs.

The NVIDIA Ampere architecture introduced structural sparsity, a technique that simplifies AI model weights without compromising accuracy. This innovation brought another 2x performance increase and holds promise for future advancements. Dally emphasized the role of NVLink interconnects between GPUs and NVIDIA networking among systems in compounding the 1,000x gains in single GPU performance.
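As a rough illustration of the structural sparsity described above, the sketch below zeroes two of every four weights, keeping the two with the largest magnitude. The article does not specify the pattern; the 2-of-4 grouping, the function name, and the sample weights are assumptions made for this example.

```python
# Illustrative only: 2:4 structured pruning, i.e. keep the two
# largest-magnitude weights in every group of four and zero the rest.
# The 2:4 pattern and the sample weights are assumptions for this sketch.

def prune_2_of_4(weights):
    """Return a copy of `weights` with 2 of every 4 values zeroed."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned

row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]  # hypothetical weights
sparse_row = prune_2_of_4(row)
nonzero = sum(1 for w in sparse_row if w != 0.0)

print(sparse_row)
print(nonzero / len(row))  # fraction of weights that remain
```

Because the zeros fall in a regular pattern, hardware can skip them predictably, which is where a 2x speedup could plausibly come from. In practice a pruned model is typically fine-tuned afterward to recover accuracy; that step is not shown here.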

Although GPUs migrated from 28 nm to 5 nm semiconductor nodes over the past decade, Dally noted that the process shrink accounted for only a 2.5x improvement. This marks a significant departure from the previous era of computer design under Moore’s law, when performance was expected to double roughly every two years as chips became smaller and faster. The limitations imposed by the physics of shrinking, such as heat dissipation, have curtailed these gains.
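The four figures quoted from the talk can be checked with simple arithmetic. The sketch below assumes the factors are independent and multiply together, and, for comparison, works out what a doubling every two years would give over ten years; the dictionary labels are shorthand for this example.

```python
# Multiply the per-technique gains quoted in the article.
# Assumption: the four factors are independent and compound multiplicatively.
gains = {
    "number representation": 16.0,
    "complex instructions": 12.5,
    "structural sparsity": 2.0,
    "process (28 nm to 5 nm)": 2.5,
}

total = 1.0
for factor in gains.values():
    total *= factor
print(total)  # 1000.0

# For comparison: doubling every two years for ten years.
doubling = 2 ** (10 // 2)
print(doubling)  # 32
```

Under these assumptions the quoted factors account for the full 1,000x, with the process shrink contributing the smallest share.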

Nevertheless, Dally expressed confidence that Huang’s Law will continue, highlighting several opportunities for future advancements. These include further simplification of number representation, increased sparsity in AI models, and the design of better memory and communication circuits. Dally concluded by stating that the new dynamic in computer design presents exciting opportunities for computer engineers at NVIDIA, allowing them to be part of a winning team, collaborate with intelligent individuals, and work on impactful designs.

Background Information

About NVIDIA: NVIDIA has firmly established itself as a leader in the realm of client computing, continuously pushing the boundaries of innovation in graphics and AI technologies. With a deep commitment to enhancing user experiences, NVIDIA's client computing business focuses on delivering cutting-edge solutions that power everything from gaming and creative workloads to enterprise applications. Renowned for its GeForce graphics cards, the company has redefined high-performance gaming, setting industry standards for realistic visuals, fluid frame rates, and immersive experiences. Complementing its gaming prowess, NVIDIA's Quadro and NVIDIA RTX graphics cards cater to professionals in design, content creation, and scientific fields, enabling real-time ray tracing and AI-driven workflows that elevate productivity and creativity. By integrating graphics, AI, and software, NVIDIA continues to shape the landscape of client computing in a rapidly evolving digital world.

Technology Explained

GPU: A GPU (Graphics Processing Unit) is a specialized processor designed for graphics-intensive tasks such as rendering images, video, and 3D graphics. GPUs are used in gaming consoles, PCs, and mobile devices to provide a smooth and immersive gaming experience, in the medical field to create 3D models of organs and tissues, and in the automotive industry to create virtual prototypes of cars. They are also increasingly important in artificial intelligence, where they process large amounts of data quickly and efficiently to build complex models.