Cerebras Systems Reveals Lightning-Fast AI Chip Boasting 4 Trillion Transistors and 900,000 Cores


March 14, 2024 by our News Team

Cerebras Systems introduces the Wafer Scale Engine 3 (WSE-3), a chip that doubles the performance of its predecessor at the same power draw and price. Designed for training large-scale AI models, the WSE-3 powers the Cerebras CS-3 AI supercomputer, which delivers 125 petaflops of peak AI performance.

  • The WSE-3 doubles the performance of its predecessor while maintaining the same power draw and price, making it a cost-effective option for training large-scale AI models.
  • The CS-3 packs 4 trillion transistors, 900,000 AI cores, and 125 petaflops of peak AI performance, placing it among the most powerful AI supercomputers on the market.
  • The CS-3's external memory system scales up to 1.2 petabytes, enough to train frontier models ten times larger than GPT-4 and Gemini, a major step for generative AI.


Cerebras Systems, known for its advancements in generative AI acceleration, has once again raised the bar with the introduction of the Wafer Scale Engine 3 (WSE-3). The new chip doubles the performance of its predecessor, the Cerebras WSE-2, while maintaining the same power draw and price. Designed specifically for training large-scale AI models, the WSE-3 is built on a 5 nm process with a staggering 4 trillion transistors. It powers the Cerebras CS-3 AI supercomputer, which delivers 125 petaflops of peak AI performance through its 900,000 AI-optimized compute cores.

The CS-3’s key specifications:

  • 4 trillion transistors, fabricated on TSMC's 5 nm process
  • 900,000 AI cores
  • 125 petaflops of peak AI performance
  • 44 GB of on-chip SRAM
  • External memory configurations of 1.5 TB, 12 TB, or 1.2 PB
  • Support for training AI models with up to 24 trillion parameters
  • Clustering of up to 2048 systems

One of the most impressive aspects of the CS-3 is its vast external memory system, which can store up to 1.2 petabytes. This allows for the training of frontier models ten times larger than GPT-4 and Gemini. Because the memory is presented as a single logical space, neither partitioning nor refactoring is required, which simplifies the training workflow and boosts developer productivity. According to Cerebras, training a one-trillion-parameter model on the CS-3 is as straightforward as training a one-billion-parameter model on GPUs.
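
That 1.2-petabyte figure passes a quick plausibility check. The sketch below (plain Python, with assumed byte counts that are not Cerebras-published numbers) estimates the training state for a 24-trillion-parameter model under a common mixed-precision Adam recipe:

    # Back-of-the-envelope: memory needed to train a 24-trillion-parameter model.
    # Assumption (not from the article): mixed-precision Adam training keeps
    # roughly 16 bytes per parameter (fp16 weights/grads plus fp32 master
    # weights and two optimizer moments).
    params = 24e12                # the article's stated maximum model size
    bytes_per_param = 16          # assumed per-parameter training state
    total_pb = params * bytes_per_param / 1e15
    print(f"~{total_pb:.2f} PB of training state")   # ~0.38 PB, within 1.2 PB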

The CS-3 caters to both enterprise and hyperscale needs. Even in a compact four-system configuration, it can fine-tune 70B-parameter models in a day. Scaled up to 2048 systems, the CS-3 can train Llama 70B from scratch in a single day, an unprecedented pace for generative AI.
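
The one-day figure is at least plausible on paper. The sketch below applies the widely used ~6·N·D approximation for transformer training FLOPs; the token count (the roughly 2 trillion reported for Llama 2 70B) and the sustained-utilization fraction are assumptions, not Cerebras numbers:

    # Rough plausibility check: Llama 70B from scratch on 2048 CS-3 systems.
    # Uses the common ~6 * params * tokens estimate of training FLOPs.
    params = 70e9                    # Llama 70B
    tokens = 2e12                    # ~2T tokens, per Llama 2 (assumption)
    train_flops = 6 * params * tokens            # ~8.4e23 FLOPs
    cluster_peak = 2048 * 125e15                 # 2048 systems x 125 PF each
    utilization = 0.4                            # assumed sustained fraction
    hours = train_flops / (cluster_peak * utilization) / 3600
    print(f"~{hours:.1f} hours")                 # ~2.3 hours; a day is ample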

Cerebras also offers the latest Cerebras Software Framework, which provides native support for PyTorch 2.0 and for current AI models and techniques such as multimodal models, vision transformers, mixture of experts, and diffusion models. The CS-3 remains the only platform that offers native hardware acceleration for dynamic and unstructured sparsity, which Cerebras says can speed up training by as much as 8x.
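
Cerebras documents its own PyTorch integration, which isn't reproduced here; as a neutral illustration, the sketch below is a plain PyTorch 2.0 training step (placeholder model and data) of the kind a PyTorch-native backend would consume:

    # A minimal PyTorch 2.0 training step; plain upstream PyTorch only.
    # Cerebras' actual integration and device targeting will differ.
    import torch

    model = torch.nn.Sequential(     # placeholder model
        torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
    )
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    @torch.compile                   # PyTorch 2.0 graph capture
    def train_step(x, y):
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
        opt.zero_grad()
        return loss

    x, y = torch.randn(8, 1024), torch.randn(8, 1024)   # placeholder batch
    print(train_step(x, y).item())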

Andrew Feldman, CEO and co-founder of Cerebras, expressed his pride in introducing the third-generation wafer-scale AI chip, emphasizing that the WSE-3 is purpose-built for AI work, from mixture-of-experts models to 24-trillion-parameter networks. Feldman is excited to bring the WSE-3 and CS-3 to market, as they address some of the most significant challenges in AI today.

The CS-3 stands out not only for its performance but also for its power efficiency and software simplicity. It delivers more compute in a smaller space and with less power than competing systems. While GPU power consumption has roughly doubled generation over generation, the CS-3 doubles performance within the same power envelope. The CS-3 is also exceptionally easy to program: Cerebras says it requires 97% less code than GPUs for large language models (LLMs) and supports training across the full range from 1 billion to 24 trillion parameters in purely data-parallel mode. A GPT-3-sized model can be implemented on Cerebras in just 565 lines of code, which the company calls an industry record.
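
For context on the code-volume comparison: even plain data-parallel training on GPUs carries distributed-setup boilerplate before any model partitioning begins. A minimal sketch in stock PyTorch (placeholder model and data, launched with torchrun):

    # Minimal data-parallel training in stock PyTorch (one process per GPU).
    # Illustrates the GPU-side setup the article's code-size claim contrasts with.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    model = DDP(torch.nn.Linear(4096, 4096).cuda(rank))   # placeholder model
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(8, 4096, device=f"cuda:{rank}")   # placeholder batch
        loss = model(x).pow(2).mean()                     # placeholder loss
        opt.zero_grad()
        loss.backward()            # DDP all-reduces gradients across ranks
        opt.step()

    dist.destroy_process_group()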

Cerebras has already garnered significant interest in the CS-3, with a substantial backlog of orders from various sectors including enterprise, government, and international clouds. Partnerships with organizations like Argonne National Laboratory and Mayo Clinic further highlight the potential of Cerebras’ wafer-scale engineering in pushing the boundaries of AI and science.

Additionally, Cerebras has formed a strategic partnership with G42, resulting in the deployment of the AI supercomputers Condor Galaxy 1 (CG-1) and Condor Galaxy 2 (CG-2), which together deliver 8 exaFLOPs of AI performance. Construction of Condor Galaxy 3 is already underway: it will comprise 64 CS-3 systems delivering 8 exaFLOPs of AI compute (64 × 125 petaflops). This third installation will push the Condor Galaxy network toward its planned tens of exaFLOPs. The network has already trained leading open-source models such as Jais-30B, Med42, Crystal-Coder-7B, and BTLM-3B-8K.

Overall, Cerebras’ latest advancements in wafer-scale engineering with the WSE-3 and CS-3 demonstrate their commitment to pushing the boundaries of AI performance and simplifying the training process. With industry partnerships and customer momentum backing their innovations, Cerebras is poised to make a significant impact on the AI landscape.

For more information about the CS-3, please visit Cerebras’ official website.



Background Information


About TSMC:

TSMC, or Taiwan Semiconductor Manufacturing Company, is a semiconductor foundry based in Taiwan. Established in 1987, TSMC is an important player in the global semiconductor industry, specializing in the manufacturing of semiconductor wafers for a wide range of clients, including technology companies and chip designers. The company is known for its leading-edge fabrication processes and plays a critical role in advancing semiconductor technology worldwide.


Technology Explained


Petaflops: A petaflop is a measure of computing speed equal to one quadrillion (10^15) floating-point operations per second. The unit is used to rate the performance of supercomputers, the extremely powerful machines built for complex calculations and simulations. Petaflop-scale computing has enabled much faster processing of very large datasets, driving advances in fields such as weather forecasting, climate modeling, and drug discovery, as well as in artificial intelligence and machine learning, where it supports more accurate and sophisticated models. In simpler terms, a petaflop-class machine is like a race car for computation, tackling problems that were previously impractical to solve.
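
In concrete terms, using the CS-3's peak figure from this article:

    # Unit check: 125 petaflops as raw operations per second and per day.
    PETA = 10**15
    cs3_peak = 125 * PETA                        # 1.25e17 FLOP/s (peak)
    print(f"{cs3_peak:.2e} FLOP/s = {cs3_peak * 86_400:.2e} FLOP/day")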




