The latest iteration of the Pentium 4, code named Prescott, is an interesting development for a few reasons. First off, it represents Intel's entry into the world of 90nm (90 nanometre, or 0.09 micron) fabrication for processors. For the uninitiated, all this means is that the individual components on the processor core have been shrunk to even smaller sizes. Shrinking components is a natural part of processor evolution, as it allows more transistors to be packed into a smaller space and, in principle, reduces heat and resistance, all of which leads to faster CPUs.
The Prescott is also interesting because it's the first time Intel has launched a new advanced architecture and not made it the company's flagship product. While the Prescott incorporates several architectural advancements, it's still the Xeon-based Pentium 4 Extreme Edition that holds the Intel performance crown.
Finally, the Prescott is a bit of an anomaly in terms of processor performance ramping due to its new architecture, and in that respect it resembles the original Pentium 4 (Willamette).
While much of the press covering the Prescott focuses on the increase in cache size, it tends to ignore another fact. The biggest difference between the Prescott and its predecessor, the Northwood, is the increase in the length of the pipeline from 20 stages to 31 stages.
The pipeline is much like an assembly line where each stage represents a small amount of the overall work that needs to be done. In processors, a short pipeline needs only a few individual steps, but each one needs to do lots of work. As a pipeline is only as fast as its slowest step, there's a good reason to make every step as simple as possible. Intel's strategy, which was pioneered with the NetBurst architecture of the Pentium 4 in the Willamette, was to have a large number of small steps, each of which was simple enough that it could be ramped up in speed (i.e. frequency).
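To put rough numbers on the "slowest step" idea, here is a minimal sketch of a toy pipeline model. The total work per instruction and the per-stage overhead are made-up figures chosen only for illustration, not measurements of any Intel processor, and the function names are hypothetical.

```python
# Toy model: a pipeline's clock period is set by its slowest stage.
# All delay figures below are illustrative assumptions, not Intel data.

def clock_period_ns(stage_delays_ns, latch_overhead_ns=0.0):
    """Shortest usable clock period: slowest stage plus a fixed per-stage overhead."""
    return max(stage_delays_ns) + latch_overhead_ns


def max_frequency_ghz(stage_delays_ns, latch_overhead_ns=0.0):
    """Highest clock frequency in GHz (the reciprocal of the period in ns)."""
    return 1.0 / clock_period_ns(stage_delays_ns, latch_overhead_ns)


def split_evenly(total_work_ns, stages):
    """Divide a fixed amount of work into equally sized stages."""
    return [total_work_ns / stages] * stages


TOTAL_WORK_NS = 6.0       # hypothetical end-to-end work for one instruction
LATCH_OVERHEAD_NS = 0.05  # hypothetical fixed cost added to every stage

for stages in (20, 31):
    delays = split_evenly(TOTAL_WORK_NS, stages)
    ghz = max_frequency_ghz(delays, LATCH_OVERHEAD_NS)
    print(f"{stages} stages: up to {ghz:.2f} GHz in this toy model")

# One slow stage drags the whole pipeline down, however fast the others are.
unbalanced = [0.1, 0.1, 0.5]
print(f"unbalanced pipeline period: {clock_period_ns(unbalanced)} ns")
```

In this simplified picture, cutting the same work into more, smaller stages shortens the longest stage and so raises the achievable clock. Real designs are limited by many other factors this sketch ignores.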
One problem with a long pipeline is that an operation can be halfway through before the processor discovers it's no longer needed, or an operation can be left waiting on the outcome of another. These cases become more common, and more costly, as the pipeline gets longer, and they have to be balanced against other features that improve speed. One of the things that can reduce the inefficiencies of a long pipeline is better branch prediction. As a result, the Prescott incorporates several advancements to its branch prediction that improve it by a significant amount. Unfortunately, this alone is not enough.
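The trade-off between pipeline length and branch prediction can be sketched with a standard back-of-envelope model. It assumes, as a simplification, that a mispredicted branch costs roughly one cycle per pipeline stage, and the branch and misprediction rates are hypothetical values rather than measured Prescott or Northwood figures.

```python
# Back-of-envelope model of branch misprediction cost.
# Assumption: a mispredicted branch costs about one cycle per pipeline stage.
# The rates used here are hypothetical, not measured processor data.

def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once pipeline flushes are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


def breakeven_mispredict_rate(old_rate, old_penalty_cycles, new_penalty_cycles):
    """Misprediction rate a deeper pipeline needs to lose no more cycles than before."""
    return old_rate * old_penalty_cycles / new_penalty_cycles


BASE_CPI = 1.0         # hypothetical cycles per instruction with perfect prediction
BRANCH_FRACTION = 0.2  # hypothetical share of instructions that are branches
OLD_RATE = 0.05        # hypothetical misprediction rate on the 20-stage design

cpi_20 = effective_cpi(BASE_CPI, BRANCH_FRACTION, OLD_RATE, 20)
cpi_31_same_predictor = effective_cpi(BASE_CPI, BRANCH_FRACTION, OLD_RATE, 31)
needed_rate = breakeven_mispredict_rate(OLD_RATE, 20, 31)

print(f"20 stages: CPI {cpi_20:.3f}")
print(f"31 stages, same predictor: CPI {cpi_31_same_predictor:.3f}")
print(f"misprediction rate needed to break even at 31 stages: {needed_rate:.4f}")
```

Under these assumptions, the deeper pipeline has to mispredict noticeably less often just to stand still, which is why better branch prediction alone may not cover the gap.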
This is where the cache comes in. If the pipeline does need to be flushed and reloaded, it helps to be able to stuff it full of instructions and data as fast as possible. In processor terms your main memory is like a draught horse, while your L2 cache is like a Porsche, and your L1 cache is like Michael Schumacher when he's not just cruising. As such, larger and faster caches can help compensate for a longer pipeline.
This is why the Prescott has 1MB L2 cache, which is double that of the Northwood, and has also doubled the L1 data cache to 16KB.
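To see why bigger caches help, here is a minimal sketch of the textbook average memory access time calculation for a two-level cache. The latencies and miss rates are placeholder values picked for illustration. They are not Prescott or Northwood measurements, and the assumption that a larger cache simply has a lower miss rate is a simplification.

```python
# Average memory access time (AMAT) for a two-level cache hierarchy.
# All latencies and miss rates are hypothetical illustration values.

def average_access_cycles(l1_hit_cycles, l1_miss_rate,
                          l2_hit_cycles, l2_miss_rate,
                          memory_cycles):
    """Expected cycles per memory access with L1, L2 and main memory."""
    l2_and_beyond = l2_hit_cycles + l2_miss_rate * memory_cycles
    return l1_hit_cycles + l1_miss_rate * l2_and_beyond


# Hypothetical "smaller caches" configuration.
small = average_access_cycles(l1_hit_cycles=2, l1_miss_rate=0.10,
                              l2_hit_cycles=20, l2_miss_rate=0.30,
                              memory_cycles=300)

# Hypothetical "larger caches" configuration: same latencies, fewer misses.
large = average_access_cycles(l1_hit_cycles=2, l1_miss_rate=0.07,
                              l2_hit_cycles=20, l2_miss_rate=0.20,
                              memory_cycles=300)

print(f"smaller caches: {small:.2f} cycles per access")
print(f"larger caches:  {large:.2f} cycles per access")
```

Because main memory is so much slower than either cache, even a modest drop in miss rate cuts the average noticeably in this model.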
The only other significant change in the Prescott is the addition of 13 new instructions that fall under the moniker SSE3. Unlike the previous iterations of SSE, these are not all SIMD (single instruction, multiple data) instructions, but they add extra functionality for things like floating point, 3D gaming and other mathematically intensive operations. The advantages of SSE3 will not be seen for some time though, as software has to be compiled to use the instructions.
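As a flavour of what the new instructions do, here is a small model of one of them, HADDPS, the SSE3 "horizontal add" for packed single-precision values. This is plain Python imitating the instruction's result on four-element tuples, not real SSE3 code, and the dot-product helper `dot4` is a hypothetical example of how such an instruction might be used.

```python
# Pure-Python model of the SSE3 HADDPS (horizontal add) instruction.
# Four-element tuples stand in for 128-bit registers of four floats.
# This only imitates the result; it does not execute any SSE3 code.

def haddps(a, b):
    """Add adjacent pairs: (a0+a1, a2+a3, b0+b1, b2+b3)."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("both operands must hold exactly four values")
    return (a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3])


def dot4(x, y):
    """Hypothetical use: a 4-element dot product via two horizontal adds."""
    products = tuple(xi * yi for xi, yi in zip(x, y))
    pair_sums = haddps(products, products)   # (p0+p1, p2+p3, p0+p1, p2+p3)
    totals = haddps(pair_sums, pair_sums)    # the full sum in every lane
    return totals[0]


print(haddps((1.0, 2.0, 3.0, 4.0), (10.0, 20.0, 30.0, 40.0)))
print(dot4((1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0)))
```

Earlier SSE arithmetic works "vertically", combining matching elements of two registers, so summing the values inside a single register took extra shuffling. Horizontal operations like this one are aimed at that case.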
So what does all this mean for us? Well, we strapped a 3.2GHz Prescott to a testbench and ran it head to head against its predecessor, a 3.2GHz Northwood, to see what difference the architecture made when the frequencies were the same.
As you can see from the graphs, the 2D benchmarks and 3DMark2001 SE show the Prescott with a very marginal (around 1 percent) edge. Interestingly, though, Quake 3: Arena at low resolution (to remove the graphics card from the performance equation) showed the Northwood with a 4 percent lead. This is very likely a result of the longer pipeline, where even the larger cache and improved branch prediction can't quite keep up with the very floating point intensive operations in Quake.
As for pricing, the Prescott is pitched at the same level as the Northwood, so given the 2D performance is quite similar, you're not really gaining or losing much. 3D performance is not much worse either, but if you're a gamer you might still want to go for a Northwood.
So at the moment, there's not terribly much to recommend the current stock of Prescott CPUs. However, the changes to architecture should allow Intel to ramp the frequency up, and there are rumours it'll hit 4GHz this year, and perhaps even 5GHz next year. It's then that the Prescott will really shine.
One last issue, though, is heat. Even though the Prescott is on a 90nm process, it generates significantly more heat than the Northwood. This means it's not going to be a top option for overclockers, and Intel may yet have some hurdles to leap before it can increase speeds. We'll be sure to put the new models to the test as they're released though.