Intel’s P-Core Architectures Compared: Skylake, Sunny Cove, Golden Cove, Redwood Cove, and Lion Cove


June 8, 2024 by our News Team

Intel's P-core architectures, including Skylake, Sunny Cove, Golden Cove, Redwood Cove, and the upcoming Lion Cove, have evolved over the years to deliver improved performance and efficiency, with each iteration bringing enhancements to various aspects of CPU design.

  • Significant improvements in branch prediction and instruction processing
  • Enhanced front-end and back-end capabilities for efficient execution
  • Ongoing commitment to innovation and future advancements


Intel has been at the forefront of processor technology for many years, consistently delivering powerful and efficient CPUs for a variety of devices. Over the past decade, Intel has introduced several performance (P) cores, each with its own unique architecture and capabilities. In this article, we will compare and analyze five of Intel’s P-core architectures: Skylake, Sunny Cove, Golden Cove, Redwood Cove, and Lion Cove.

Skylake: The Workhorse

Skylake is Intel’s most widely used core architecture, powering the 6th, 7th, 8th, 9th, and 10th generation processors. It features a range of improvements over its predecessors, including an enhanced branch prediction unit (BPU) and larger branch target buffers (BTBs). The BPU guesses, before a branch is resolved, whether it will be taken and where it will go, so the front-end can keep fetching along the predicted path. The larger BTBs let Skylake remember the targets of more branches, which improves prediction accuracy and overall performance.
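To make the idea of branch prediction concrete, here is a minimal Python sketch of a textbook 2-bit saturating-counter predictor paired with a simple target table. It is not Intel's design, which is far more sophisticated and largely undocumented; the class name, table size, and the sample trace are assumptions chosen only for illustration.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating-counter predictor (illustrative, not Intel's)."""

    def __init__(self, table_size=1024):
        self.table_size = table_size
        # Each counter is 0..3: 0-1 predict "not taken", 2-3 predict "taken".
        self.counters = [1] * table_size
        # Toy stand-in for a BTB: branch address -> last seen target.
        self.btb = {}

    def _index(self, pc):
        return pc % self.table_size

    def predict(self, pc):
        """Return (predicted_taken, predicted_target or None)."""
        taken = self.counters[self._index(pc)] >= 2
        target = self.btb.get(pc) if taken else None
        return taken, target

    def update(self, pc, taken, target=None):
        """Train the counter (and the target table) with the real outcome."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
            self.btb[pc] = target
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of branches whose direction was predicted correctly."""
    correct = 0
    for pc, taken, target in trace:
        guess, _ = predictor.predict(pc)
        correct += guess == taken
        predictor.update(pc, taken, target)
    return correct / len(trace)


# Hypothetical trace: a loop branch taken 9 times, then falling through,
# with the whole loop run 10 times.
loop_trace = ([(0x40, True, 0x10)] * 9 + [(0x40, False, None)]) * 10
print(accuracy(TwoBitPredictor(), loop_trace))
```

On this trace the predictor mispredicts the first iteration and each loop exit, so it gets 89 of 100 branches right. Larger tables and longer histories, which is where real designs spend their budget, mainly help when many different branches compete for the same entries.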

The front-end of Skylake includes a two-step branch prediction process and a 32KB L1 instruction cache. The micro-op cache stores already-decoded micro-ops so that frequently executed code can bypass the decoders, reducing power consumption and improving efficiency. Skylake also features a 4-way decoder and a 50-entry instruction queue, ensuring smooth instruction processing.
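As a rough illustration of why a micro-op cache saves work, the Python sketch below models it as a small least-recently-used cache sitting in front of a decoder. The class name, the capacity of four lines, and the `toy_decode` function are all assumptions made for the example; the real hardware structure is organized very differently.

```python
from collections import OrderedDict


class MicroOpCache:
    """Toy LRU cache of decoded micro-ops, keyed by instruction address."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> micro-ops, oldest first
        self.hits = 0
        self.decodes = 0

    def fetch(self, address, decode):
        """Return micro-ops for address, calling decode only on a miss."""
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
            return self.lines[address]
        self.decodes += 1
        uops = decode(address)
        self.lines[address] = uops
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict least recently used
        return uops


def toy_decode(address):
    """Hypothetical decoder: pretend each instruction becomes two micro-ops."""
    return [f"uop{address}a", f"uop{address}b"]


# A three-instruction loop body executed ten times.
cache = MicroOpCache(capacity=4)
for _ in range(10):
    for address in (0, 4, 8):
        cache.fetch(address, toy_decode)
print(cache.decodes, cache.hits)
```

Here the decoder runs only 3 times while the other 27 fetches are served from the cache, which is the effect the paragraph above describes: hot loops stop paying the decode cost.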

In the back-end, Skylake has a 224-entry reorder buffer (ROB), which tracks in-flight instructions so that they can execute out of order yet still retire in their original program order. It also includes four execution ports and four load/store ports, for eight ports in total. The vector execution units are 256 bits wide, and the L1 data cache is 32KB.
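The role of a reorder buffer can be sketched in a few lines of Python: instructions enter in program order, may finish in any order, and leave only from the head once they are done. This is a conceptual model only; the class and method names are assumptions, and the default capacity of 224 is simply the Skylake figure quoted above.

```python
from collections import deque


class ReorderBuffer:
    """Toy ROB: out-of-order completion, in-order retirement."""

    def __init__(self, capacity=224):
        self.capacity = capacity
        self.entries = deque()  # instruction tags in program order
        self.done = set()       # tags that have finished executing

    def allocate(self, tag):
        """Add an instruction; return False if the buffer is full (stall)."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(tag)
        return True

    def complete(self, tag):
        """Mark an instruction as executed; order of calls is arbitrary."""
        self.done.add(tag)

    def retire(self):
        """Retire finished instructions from the head, oldest first."""
        retired = []
        while self.entries and self.entries[0] in self.done:
            tag = self.entries.popleft()
            self.done.discard(tag)
            retired.append(tag)
        return retired


rob = ReorderBuffer(capacity=4)
for tag in ("a", "b", "c"):
    rob.allocate(tag)

rob.complete("c")      # youngest instruction finishes first
print(rob.retire())    # nothing retires: "a" is still pending
rob.complete("a")
print(rob.retire())    # only "a" retires; "b" blocks "c"
rob.complete("b")
print(rob.retire())    # "b" and "c" retire together, in order
```

A larger buffer lets the core keep more instructions in flight behind a slow one, such as a load that misses the cache, before `allocate` starts failing and the front-end has to stall.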

Sunny Cove: A Comprehensive Upgrade

Sunny Cove was Intel’s first 10nm core architecture and a broad upgrade over Skylake. It expanded the branch target buffer, micro-op cache, and instruction queue, allowing for better prediction and instruction processing. The reorder buffer grew from 224 to 352 entries, giving the core a larger window for out-of-order execution.
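To put the reorder buffer change in perspective, the short Python snippet below computes the relative growth from Skylake's 224 entries to Sunny Cove's 352, both figures taken from this article. The helper name `growth_percent` is just an illustrative choice.

```python
def growth_percent(old_entries, new_entries):
    """Relative increase from old_entries to new_entries, in percent."""
    return (new_entries - old_entries) / old_entries * 100


rob_growth = growth_percent(224, 352)
print(f"{rob_growth:.1f}% more ROB entries")
```

That works out to roughly 57% more entries. A bigger buffer does not translate into a proportional performance gain, but it does indicate how much deeper the out-of-order window became in a single generation.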

The front-end of Sunny Cove saw improvements in the BTBs, micro-op cache, and dispatch bandwidth. The back-end featured an expanded reorder buffer, increased branch order buffer, and more registers for both integer and floating-point operations. The ALU scheduler and load/store queues were also enhanced.

Golden Cove: Widening the Front-End

Golden Cove, introduced with Alder Lake-S, focused on widening the front-end and improving the branch predictor. It featured a larger instruction TLB, expanded BTBs, and increased fetch bandwidth. The decoder was widened to 6-way, and the micro-op cache saw a significant expansion.

In the back-end, Golden Cove introduced a fifth execution port and expanded the reorder buffer. The FP register file was increased, and the ALU scheduler received additional entries. The cache and memory subsystem also saw improvements with a revised data cache hierarchy.

Redwood Cove: A Tick with Minimal Changes

Redwood Cove is a slight modification of the Golden Cove architecture, leveraging the Intel 4 process node. It represents a “tick” in Intel’s product cycle, focusing on node shrink with minimal changes to the microarchitecture. Redwood Cove features an increased I-cache and micro-op queue, lower instruction execution latency, and improved branch prediction and prefetch capabilities.

Lion Cove: The Future of Intel’s P-Core Architecture

Lion Cove is Intel’s upcoming P-core architecture, set to power Arrow Lake and Lunar Lake processors. It will leverage TSMC’s 3nm process node and the Intel 20A process node. While full details are not yet available, Lion Cove promises several architectural upgrades.

The front-end of Lion Cove will feature a larger branch prediction unit, wider fetch and decode capabilities, and an expanded uop cache. The back-end will see improvements in the rename/dispatch buffer, retire throughput, and execution ports. The cache and memory subsystem will also be enhanced, with a deeper TLB and revised data cache hierarchy.

Conclusion

Intel’s P-core architectures have evolved significantly over the years, delivering improved performance and efficiency with each iteration. From Skylake to Sunny Cove, Golden Cove, Redwood Cove, and the upcoming Lion Cove, Intel continues to push the boundaries of processor technology. These architectures bring enhancements to various aspects of CPU design, including branch prediction, instruction processing, register renaming, execution ports, and cache hierarchy.

As Intel looks towards the future with Lion Cove, it is clear that the company is committed to delivering even more powerful and efficient processors. With advancements in process technology and architectural improvements, Intel’s P-core architectures will continue to drive innovation and shape the future of computing.

About Our Team

Our team comprises industry insiders with extensive experience in computers, semiconductors, games, and consumer electronics. With decades of collective experience, we’re committed to delivering timely, accurate, and engaging news content to our readers.

Background Information


About Intel: Intel Corporation, a global technology leader, is known for its semiconductor innovations that power computing and communication devices worldwide. As a pioneer in microprocessor technology, Intel has left an indelible mark on the evolution of computing with its processors that drive everything from PCs to data centers and beyond. With a history of advancements, Intel's relentless pursuit of innovation continues to shape the digital landscape, offering solutions that empower businesses and individuals to achieve new levels of productivity and connectivity.

About TSMC: TSMC, or Taiwan Semiconductor Manufacturing Company, is a semiconductor foundry based in Taiwan. Established in 1987, TSMC is an important player in the global semiconductor industry, specializing in the manufacturing of semiconductor wafers for a wide range of clients, including technology companies and chip designers. The company is known for its semiconductor fabrication processes and plays a critical role in advancing semiconductor technology worldwide.

Technology Explained


CPU: The Central Processing Unit (CPU) is the brain of a computer, responsible for executing instructions and performing calculations. It is the most important component of a computer system, as it is responsible for controlling all other components. CPUs are used in a wide range of applications, from desktop computers to mobile devices, gaming consoles, and even supercomputers. CPUs are used to process data, execute instructions, and control the flow of information within a computer system. They are also used to control the input and output of data, as well as to store and retrieve data from memory. CPUs are essential for the functioning of any computer system, and their applications in the computer industry are vast.


Latency: Latency is the time it takes for a computer system to respond to a request. It is an important factor in the performance of computer networks, storage systems, and processors, as it affects the speed and efficiency of data processing. Low latency is essential for applications that require fast response times, such as online gaming, streaming media, and real-time data processing. High latency can cause delays in data processing, resulting in slow response times and poor performance. To reduce latency, computer systems use various techniques such as caching, load balancing, and parallel processing.




