Codenamed Penryn, the latest revision of Intel’s Core 2 processor line is the first CPU to be built on the most advanced fabrication process yet developed: 45nm. That is a 30% shrink from the 65nm process Core 2 has used since its launch, bringing further reductions in power consumption and increases in transistor density. Intel has stuck with its dual, dual-core design, though, with two dies in the same physical package, rather than the ‘native’ quad-core approach that AMD has pioneered with its Barcelona designs.
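As a rough illustration of why a smaller process raises transistor density, the Python sketch below applies ideal scaling. It assumes every feature shrinks by exactly the node ratio, which real processes only approximate, and the function names are purely illustrative.

```python
# Ideal (first-order) scaling from a 65nm to a 45nm process.
# Assumption: every feature shrinks linearly by the node ratio.
OLD_NODE_NM = 65
NEW_NODE_NM = 45


def linear_shrink_pct(old_nm: float, new_nm: float) -> float:
    """Reduction in linear feature size, in percent."""
    return (old_nm - new_nm) / old_nm * 100


def ideal_area_ratio(old_nm: float, new_nm: float) -> float:
    """Area of the same circuit on the new node relative to the old one.

    Area scales with the square of the linear dimension.
    """
    return (new_nm / old_nm) ** 2


if __name__ == "__main__":
    area = ideal_area_ratio(OLD_NODE_NM, NEW_NODE_NM)
    print(f"Feature size cut by {linear_shrink_pct(OLD_NODE_NM, NEW_NODE_NM):.1f}%")
    print(f"Same circuit needs {area:.1%} of its former area")
    print(f"Ideal density gain: {1 / area:.2f}x")
```

Under these idealised assumptions, a 30% linear shrink roughly halves the area of a given circuit, which is why the same die budget can hold many more transistors.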
Penryn is the “tick” in Intel’s much-vaunted new “tick-tock” design strategy. The idea is that every year or so, Intel begins producing an existing architecture on a new fabrication process (the tick), and the following year introduces a brand-new micro-architecture built on that process (the tock). This allows Intel to squeeze more life from existing designs and to perfect the new fabrication process before adding the extra complication of a new architecture. Consequently, the basic architecture of the new parts remains the same as that of the current Core 2 generation, but with some design tweaks and enhancements.
Performance-wise, the main boost comes from a fairly prosaic addition: an increase in L2 cache from 8MB in current quad-core parts (4MB for each pair of cores) to a new total of 12MB. This is shared in the same way, with 6MB per pair of cores. The cheaper Core 2 Quad and dual-core Core 2 Duo chips based on the new 45nm design will get the same boost: 6MB of L2 cache for the dual-core parts and 12MB for the quad-core parts. The extra cache accounts for most of the QX9650’s increased transistor count: 820 million across its two dies, against 410 million for the new dual-core parts.
Our test CPU was the top-end quad-core Core 2 Extreme QX9650, running at 3GHz. The clock speed has caused surprise in some quarters, since it was widely believed that the new Extreme part would debut at 3.33GHz. Bus speed and preferred chipset are identical to those of the previous top-end part, the 3GHz Core 2 Extreme QX6850, with a 1333MHz front-side bus on either a P35 or X38 chipset-based board. That allowed us to make a direct, clock-for-clock performance comparison between the two.
Our clock-for-clock results make it clear that Intel is right to be moving to a new micro-architecture in the next year or so, as the basic performance increase offered by Penryn is marginal. Against the 3GHz QX6850, the improvement never comes close to double figures. The extra cache and some instruction-pipeline tweaks give the QX9650 a boost of only around 6% in our most CPU-bound test, the 3ds Max render, with a time-to-first-frame of 30 seconds against 32 seconds for the older part. Our overall application benchmark score rose from 2.19 to 2.28: definitely faster, but only by 4%.
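The percentages above can be checked with a little arithmetic. In this Python sketch (the helper names are illustrative), the render times are expressed both as a cut in time and as a gain in throughput, and the benchmark scores as a relative change. Depending on which way it is expressed, a 2-second saving on 32 seconds is a 6.25% cut in render time or roughly a 6.7% gain in throughput.

```python
def time_saved_pct(old_time_s: float, new_time_s: float) -> float:
    """Reduction in wall-clock time, in percent."""
    return (old_time_s - new_time_s) / old_time_s * 100


def throughput_gain_pct(old_time_s: float, new_time_s: float) -> float:
    """Extra work done per unit time, in percent, when a job gets faster."""
    return (old_time_s / new_time_s - 1) * 100


def score_gain_pct(old_score: float, new_score: float) -> float:
    """Relative change in a higher-is-better benchmark score, in percent."""
    return (new_score / old_score - 1) * 100


if __name__ == "__main__":
    # 3ds Max time-to-first-frame: QX6850 32s, QX9650 30s
    print(f"Render time cut: {time_saved_pct(32, 30):.2f}%")
    print(f"Render throughput gain: {throughput_gain_pct(32, 30):.1f}%")
    # Overall application benchmark score: 2.19 -> 2.28
    print(f"Overall score gain: {score_gain_pct(2.19, 2.28):.1f}%")
```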
Where the new part does score over the old is in power consumption. The less thirsty 45nm transistors contribute to a stonking 50W drop in power drain under full load, with our test rig consuming a maximum of 210W with the QX6850 against just 160W with the QX9650 fitted. There’s a similarly impressive reduction at idle: the rig drew 97W with the QX9650, compared with 116W with the old part.
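The power figures are for the whole test rig rather than the CPU alone. This Python sketch (the function name is illustrative) turns each pair of readings into an absolute and a percentage saving.

```python
def power_saving(old_w: float, new_w: float) -> tuple[float, float]:
    """Return (watts saved, percent saved) for two whole-system readings."""
    saved_w = old_w - new_w
    return saved_w, saved_w / old_w * 100


if __name__ == "__main__":
    readings = {
        "Full load": (210, 160),  # (rig with QX6850, rig with QX9650)
        "Idle": (116, 97),
    }
    for state, (old_w, new_w) in readings.items():
        saved_w, saved_pct = power_saving(old_w, new_w)
        print(f"{state}: {saved_w:.0f}W lower ({saved_pct:.1f}%)")
```

Because the baseline includes the rest of the system, these percentages describe the rig as a whole, not the processor on its own.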
As well as extra cache and refinements to the instruction pipeline, the new parts introduce the SSE4 instruction extensions. However, as the SSE family grows, each new batch of instructions becomes more specialised, and Intel itself has trouble coming up with concrete examples of where they’ll be useful beyond video motion-prediction (used in encoding) and specialised maths that could help 3D rendering applications (but not games, where 3D calculations are performed by the GPU).
So, it’s a tentative first outing for 45nm – but it can’t yet be said that Intel has slipped up. In theory, the new process should allow clock speeds to be pushed higher than ever, while keeping power consumption within acceptable limits. We overclocked our test processor to 3.33GHz simply by increasing the clock multiplier from 9x to 10x, and with no further tweaking it ran perfectly stably.
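The overclock works because the core clock is the bus base clock multiplied by the CPU multiplier. The “1333MHz” front-side bus is a quad-pumped base clock of roughly 333MHz, so the Python sketch below (names are illustrative) shows why stepping from 9x to 10x takes the chip from 3GHz to 3.33GHz.

```python
BASE_CLOCK_MHZ = 1000 / 3  # ~333.3MHz bus base clock
TRANSFERS_PER_CLOCK = 4    # quad-pumped FSB: four transfers per base clock


def fsb_rating(base_mhz: float = BASE_CLOCK_MHZ) -> float:
    """Effective front-side bus rating, marketed as '1333MHz'."""
    return base_mhz * TRANSFERS_PER_CLOCK


def core_clock_mhz(multiplier: int, base_mhz: float = BASE_CLOCK_MHZ) -> float:
    """Core clock = base clock x multiplier."""
    return base_mhz * multiplier


if __name__ == "__main__":
    print(f"FSB rating: {fsb_rating():.0f}MHz")
    for multiplier in (9, 10):  # stock setting, then our overclock
        print(f"{multiplier}x -> {core_clock_mhz(multiplier) / 1000:.2f}GHz")
```

Raising the multiplier leaves the bus, memory and chipset at their rated speeds, which is one reason a multiplier-only overclock tends to be the least disruptive.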
The message from these first results is that if you have a current-generation quad-core CPU you won’t gain much performance by upgrading, although you’ll almost certainly get a more overclockable part. The first generation of Penryn is by no means a disaster, but big performance gains will only come once the higher-clocked parts that 45nm will allow are released.