Intel Xeon 5500 series: 14 CPUs tested and compared
The Nehalem-based Xeons are a landmark release, with exceptional performance married to great power efficiency.
After much speculation and anticipation, Intel’s new 5500 Series Xeon processors have finally arrived - and judging by the number of new features and improvements on offer, the wait looks well worthwhile.
Codenamed Nehalem, Intel’s new micro-architecture represents the next step in its “tick-tock” design model, where the last phase was to introduce the Core architecture and then shrink it to 45nm.
Nehalem delivers the next “tock” phase, but it’s far more significant than that since it represents a sea change in processor design. Instead of presenting a one-size-fits-all solution, Nehalem effectively offers a set of building blocks.
This allows it to scale not only across different hardware, but also across different market sectors including mobile, desktop and server DP and MP applications.
We’ve already covered Intel’s Core i7 implementation (web ID: 131494), and now we look closer at the 5500 series “Gainestown” Xeon DP server processors.
The new processors still use the 45nm manufacturing process, but the four cores are on a single die and share one chunk of L3 cache, which can be up to 8MB.
They use the same L1 cache as the existing Core technology, but have a lower latency 256KB L2 cache per core. The processors use a larger LGA1366 socket, mainly because they now incorporate an integrated memory controller and a link controller for Intel’s new QPI (QuickPath Interconnect).
You can wave goodbye to the trusty old FSB, which was used to create a backbone between the processors and the chipset’s memory controllers and I/O bus. It worked well enough, but it created a bottleneck since all system memory was in a single, shared location.
The QPI facilitates high-speed connections between processor and I/O controller. Each processor has its own dedicated memory, which it accesses via its integrated controller. It uses the QPI for fast access to the I/O controller, and the 5500 series Xeons also have a QPI link between dual-processor sockets.
The QPI operates at speeds of up to 6.4GT/sec (gigatransfers, or data transfers, per second), which equates to 25.6GB/sec.
It effectively allows all system memory to be no more than one hop away from any processor, and as QPI is point-to-point there’s none of the single-bus contention of the FSB. If one processor needs to access the memory attached to the other processor, it can do so via the QPI link between them.
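The GT/sec and GB/sec figures quoted for QPI are two views of the same link speed. As a rough sketch of the arithmetic, assuming each link moves 2 bytes of data per transfer in each of two directions (an assumption not stated in the text above), the conversion looks like this. The function name is purely illustrative.

```python
def qpi_bandwidth_gb_per_sec(gigatransfers_per_sec,
                             bytes_per_transfer=2,   # assumed: 16 data bits per transfer
                             directions=2):          # assumed: link is bidirectional
    """Return the aggregate bandwidth of one QPI link in GB/sec."""
    return gigatransfers_per_sec * bytes_per_transfer * directions


# The three QPI speeds mentioned for the 5500 family
for speed in (4.8, 5.86, 6.4):
    print(f"{speed}GT/sec -> {qpi_bandwidth_gb_per_sec(speed):.1f}GB/sec")
```

Under these assumptions the top 6.4GT/sec speed works out to the 25.6GB/sec quoted, split evenly between the two directions.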
Intel’s Turbo Boost technology uses the fact that most apps don’t scale to an arbitrary number of CPUs. A controller on the processor monitors the load and can power down a core if it isn’t being used. This creates thermal/power headroom, which can be passed on to the other cores and used to boost their speed.
The more cores that are powered down, the more boost is available to the active cores and frequencies are raised in bins, or 133MHz increments. For example, a 2.8GHz X5560 Xeon could be boosted to 3GHz with three or four active cores and up to 3.2GHz with one or two cores active.
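The bin arithmetic can be sketched in a few lines. The per-core-count bin allowances below are hypothetical values inferred from the X5560 example above, not figures from Intel’s specifications, and the function name is our own.

```python
BIN_GHZ = 0.133  # one Turbo Boost bin, as described in the text

# Hypothetical allowances inferred from the X5560 example:
# active cores -> number of bins available
X5560_BINS = {4: 2, 3: 2, 2: 3, 1: 3}


def boosted_ghz(base_ghz, bins):
    """Return the clock speed after raising the base frequency by `bins` bins."""
    return base_ghz + bins * BIN_GHZ


for active_cores in sorted(X5560_BINS, reverse=True):
    speed = boosted_ghz(2.8, X5560_BINS[active_cores])
    print(f"{active_cores} active cores: {speed:.2f}GHz")
```

With two bins the 2.8GHz part reaches roughly 3.07GHz, which the example above rounds to 3GHz, and with three bins it reaches roughly 3.2GHz.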
As the FSB leaves the building, Intel reacquaints us with Hyper-Threading, which allowed older Pentium and Xeon processors to simultaneously run two threads per core.
For Nehalem, this hasn’t changed in principle, but the larger cache and higher bandwidth mean that – dependent on the application – it can realise a significant performance improvement for fully multithreaded apps.
Virtual and memory enhancements
Support for server virtualisation is end-to-end, as you have Intel’s VT-x and VPID (virtual processor ID) in the processors, and these have been augmented with improvements to the EPTs (extended page tables).
Combined, these are capable of increasing VM performance and reducing latency for VM transitions.
Implemented in the new Tylersburg chipset, VT-d (directed I/O) supports DMA and device-generated interrupt protection and remapping. The VT-c (connectivity) suite is aimed at end devices such as compliant 10GbE network adapters and includes VMDq (Virtual Machine Device Queues), which optimises networking performance for VMs.
More changes are evident in the memory department as the 5500 series supports only DDR3 memory, with speeds currently topping out at 1333MHz. Performance improvements can be had by installing DIMMs in sets of three for triple-channel access, but you don’t have to.
You can choose between RDIMMs (registered) or UDIMMs (unregistered), but FBDIMMs aren’t supported. RDIMMs support three DIMMs per channel, whereas UDIMMs support only two per channel but are more cost-effective where you’re not planning on using more than 4GB in a server.
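To put those memory options into numbers, here is a minimal sketch. It assumes three memory channels per processor (implied by the triple-channel access mentioned above), a dual-socket server, and 8 bytes moved per transfer on each DDR3 channel. It also treats the quoted MHz ratings as millions of transfers per second. These are theoretical peaks, not measured results.

```python
CHANNELS_PER_CPU = 3                          # assumed from "triple-channel"
DIMMS_PER_CHANNEL = {"RDIMM": 3, "UDIMM": 2}  # limits given in the text
BYTES_PER_TRANSFER = 8                        # assumed: 64-bit DDR3 channel


def max_dimms(dimm_type, sockets=2):
    """Return how many DIMMs of the given type a server could hold."""
    return sockets * CHANNELS_PER_CPU * DIMMS_PER_CHANNEL[dimm_type]


def peak_bandwidth_gb_per_sec(memory_mhz):
    """Return the theoretical peak memory bandwidth per processor in GB/sec."""
    return memory_mhz * BYTES_PER_TRANSFER * CHANNELS_PER_CPU / 1000


for dimm_type in DIMMS_PER_CHANNEL:
    print(f"{dimm_type}: up to {max_dimms(dimm_type)} DIMMs in a dual-socket server")

for speed in (800, 1066, 1333):
    print(f"{speed}MHz: {peak_bandwidth_gb_per_sec(speed):.1f}GB/sec per processor")
```

On these assumptions a dual-socket server takes up to 18 RDIMMs or 12 UDIMMs, and moving from 800MHz to 1333MHz memory raises the theoretical peak per processor from about 19GB/sec to about 32GB/sec.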
Choosing a 5500 Xeon
When picking your new processors choose carefully, as there are significant differences across the 5500 family.
The range starts with four entry-level models. The dual-core E5502 has a 4MB L3 cache, supports 800MHz memory speeds, has a QPI speed of 4.8GT/sec and implements neither Turbo Boost nor Hyper-Threading.
The other three are quad-core, but otherwise have the same specifications.
Starting with the 2.26GHz L5520 and E5520, the next group of four processors supports memory speeds up to 1066MHz, has a QPI speed of 5.86GT/sec, and implements Turbo Boost and Hyper-Threading.
The X5500 models offer all available features, along with a QPI speed of 6.4GT/sec and supported memory speeds up to 1333MHz.
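The three main groups described above can be captured in a small lookup table, which may help when matching a processor group to a server’s requirements. This is only an illustrative sketch: the tier names, the Tier class and the lowest_tier function are our own labels, and the figures are those quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str               # our own label for the group
    max_memory_mhz: int     # fastest supported DDR3 speed
    qpi_gt_per_sec: float   # QPI link speed
    turbo_boost: bool
    hyper_threading: bool


# Ordered from the lowest group to the highest
TIERS = [
    Tier("entry-level", 800, 4.8, False, False),
    Tier("mid-range", 1066, 5.86, True, True),
    Tier("X5500", 1333, 6.4, True, True),
]


def lowest_tier(min_memory_mhz=0, need_turbo=False, need_ht=False):
    """Return the lowest tier meeting the requirements, or None if none does."""
    for tier in TIERS:
        if (tier.max_memory_mhz >= min_memory_mhz
                and (tier.turbo_boost or not need_turbo)
                and (tier.hyper_threading or not need_ht)):
            return tier
    return None


print(lowest_tier(min_memory_mhz=1066, need_ht=True).name)
```

For example, a server that needs Hyper-Threading and 1066MHz memory lands in the middle group, while one that needs 1333MHz memory has to move up to an X5500 model.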
At the very top of the range sits the W5580, which has a 3.2GHz frequency, 130W power needs and a price to match. The other models to consider are the low-power versions, beginning with “L”.
The embedded L5508 consumes just 38W at most, but includes a mere two cores. The more mainstream choice is the L5520, which we tested in the Fujitsu Primergy.
See the full table of processors for an at-a-glance guide to pricing and specification.
To give a guide to each version of the new Xeons, we benchmarked them in FlamMap, POV-Ray and CineBench. Note that although the graphs here name the processors, they were in different rigs and thus used different amounts of memory. This information is shown at the foot of each graph.
FlamMap is a fire behaviour mapping and analysis program that computes potential fire behaviour.
It demonstrates the benefits of multicore technology and is very compute intensive, with a heavy number of floating-point operations.
The 200MHz advantage of the X5560 over the X5550 results in a speed increase of almost 10%, and it’s a good 80% quicker than the E5506 (although this was in a test rig with 8GB of 800MHz DDR3 RAM to the 24GB of 1066MHz DDR3 RAM available to the X5550 and X5560).
CineBench is a dedicated benchmark based on the 3D software Cinema 4D. It performs CPU-intensive rendering operations using either a single CPU or multiple CPUs - the results here are based on multiple CPUs.
The scores are all relative, with the basic message being that bigger is better. As with FlamMap, the test involves a huge amount of computation, including many floating-point operations.
Once again, the X5560 came out on top, but it should be noted that even the E5506’s score is highly creditable. Likewise, the low-power L5520 performs admirably here.
POV-Ray is a free software tool for creating 3D graphics. We tested using version 3.7. It highlights the benefits of multicore technology and again uses lots of floating-point operations and compute-intensive tasks.
Here, the X5560’s clock-speed advantage over the X5550 isn’t so great, and both the 2.26GHz L5520 and 2.4GHz E5530 are within touching distance. Only the E5506, with its notably smaller 4MB L3 cache, is far behind.
Intel also quotes its own benchmark scores, in tandem with key partners. For example, it claims Fujitsu’s Primergy set records in SPECint_rate_base2006 with a score of 240.
On the TPC-C benchmark, it claims the HP ProLiant DL370 G6 “shattered” the previous record with a score of 631,766 tpmC using the Oracle 11g database.
And, using VMmark, Intel claims a number of Xeon 5500 series-based platforms “shattered the previous record by as much as 150%”, citing a Dell PowerEdge R710 scoring 23.55@16 tiles.
With the new 5500 series of Xeon processors, Intel doesn’t appear to have missed any opportunities to improve performance. Reduced power consumption is key and tests run in our Xeon 5500-equipped servers show significant reductions in consumption across the board.
This is the biggest architectural change we’ve seen for some time from Intel.
It will continue through the next few years, with developments such as support for 16GB DIMMs in the Tylersburg-EP platform expected in 2010. Nehalem undoubtedly represents the biggest challenge AMD has ever faced in the server processor market.