AMD Barcelona
David Fearon | Nov 20, 2007 10:14 AM
We test AMD’s first quad-core processor for both speed and power efficiency
After what seems like forever (actually a delay of just over six months), we’ve finally got our hands on production samples of AMD’s Barcelona quad-core CPUs. Barcelona – Quad-core Opteron, to give its production name – is the first AMD processor to use 65nm fabrication, down from the 90nm of its previous-generation processors. Variants come in both 95W and 120W power envelopes. The maximum clock speed at launch is 2GHz, but faster processors will appear. Being Opterons, the new parts are bound for servers and workstations.
Cache
Aside from its four “native” cores, Barcelona’s biggest architectural departure is in its cache architecture. It’s the first chip design in recent years to sport both the now-familiar level 2 cache – with 512KB per core – and an extra stage of 2MB level 3 cache.
AMD dubs this arrangement “balanced smart cache”. The balanced side of the equation comes from the fact that each core’s level 2 cache is dedicated to that core and can’t be shared, whereas the level 3 is distributed across the four cores. This is in contrast to Intel’s arrangement with its Core microarchitecture CPUs, which have just one monolithic slab of level 2 cache shared between all cores.
The dedicated level 2 complement of each core is a key area that AMD claims leads to enhanced performance over the competition when it comes to multithreaded applications. With Intel’s system each core competes for cache which, AMD claims, means threads can experience “cache starvation” if one core has managed to grab all or most of the available complement.
The L3 cache sits across all four cores and is a first in modern processors.
The flip side of the coin is that single-threaded performance can suffer when the maximum available complement of level 2 cache any core can use is 512KB, compared to Intel’s 4MB per pair of cores.
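Both sides of that trade-off can be illustrated with a toy model. The sketch below is purely hypothetical: it uses a tiny, fully associative cache with least-recently-used eviction, and the capacities, working-set sizes and access patterns are invented for illustration rather than taken from either vendor's hardware. It compares one shared cache of 16 lines against two private caches of eight lines each.

```python
from collections import OrderedDict
from itertools import count


class LRUCache:
    """A tiny fully associative cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, address):
        """Touch an address; return True on a hit, False on a miss."""
        if address in self.lines:
            self.lines.move_to_end(address)
            return True
        self.lines[address] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def contended_hits(victim_cache, streamer_cache, rounds=100, working_set=4, burst=8):
    """Count hits for a small-working-set thread running beside a streaming thread."""
    fresh = count(1_000)  # the streaming thread never reuses an address
    hits = 0
    for i in range(rounds):
        hits += victim_cache.access(i % working_set)
        for _ in range(burst):
            streamer_cache.access(next(fresh))
    return hits


def solo_hits(cache, rounds=120, working_set=12):
    """Count hits for a single thread cycling through one larger working set."""
    hits = 0
    for i in range(rounds):
        hits += cache.access(i % working_set)
    return hits


# Two threads: one shared 16-line cache versus two private 8-line caches.
shared = LRUCache(16)
shared_contended = contended_hits(shared, shared)
private_contended = contended_hits(LRUCache(8), LRUCache(8))

# One thread: it can use all 16 shared lines, but only 8 private ones.
shared_solo = solo_hits(LRUCache(16))
private_solo = solo_hits(LRUCache(8))

print("two threads, shared :", shared_contended, "hits out of 100")
print("two threads, private:", private_contended, "hits out of 100")
print("one thread, shared  :", shared_solo, "hits out of 120")
print("one thread, private :", private_solo, "hits out of 120")
```

In this contrived setup the streaming thread pushes the other thread's data out of the shared cache before it can be reused, while the private caches keep the small working set resident. With a single thread the picture reverses, because the 12-line working set fits in the shared cache but thrashes the smaller private one. Real caches are set-associative and far larger, so the model shows the direction of the effect and nothing about its size.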
Memory bandwidth
Under the general banner of Memory Optimizer Technology, AMD claims to have increased memory bandwidth by 40% using various techniques. Chief among these is the DRAM pre-fetcher, which speculatively loads instructions from main memory before they’re required.
Intel has a similar trick, but according to an AMD spokesman, “we pre-fetch into a buffer where they [Intel] pre-fetch into level 2 cache.” By keeping pre-fetched instructions in a separate buffer, AMD claims the impact of mis-prediction and “cache pollution” is minimised, reducing the need to flush and refill the cache.
Virtualisation
Virtualisation is very high on the server-side agenda, and as well as the advantage from the level 2 cache layout, AMD claims other enhancements to improve performance in this area. Its “nested paging” technique brings more of the memory management of virtual machines into hardware, and is said to reduce context-switching time – the time the CPU takes to switch from executing one virtual machine to the next – by 25%.
Memory
With Barcelona, AMD continues its practice of integrating the memory controller into the chip itself, eliminating the need for a separate MCH (memory controller hub). This route has been vindicated by Intel’s admission that it will be going for a similar approach with its Nehalem processor.
AMD, however, is sticking with DDR2 memory rather than going with the FBDIMMs that Intel’s server platforms now require. AMD claims there are “enormous power and heat penalties for memory capacity using FBDIMM”, quoting a typical power consumption figure of 83W at idle for 8 FBDIMMs, as against 14W for the same number of DDR2 DIMMs.
This feeds into its claim that its “total-platform” power consumption – the total power being sucked out of the mains socket – is lower for an AMD system than an equivalent Intel box. It gives typical figures of 186W for a Barcelona server, against 228W for a quad-core Xeon 5300 series system. Intel, however, tends to claim roughly the opposite.
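To put AMD's quoted figures into perspective, the arithmetic can be sketched as follows. The wattages are the ones AMD supplied, as reported above. The assumption that a server draws its typical figure around the clock for a full year is ours, made only to turn a difference in watts into an energy figure.

```python
# Figures quoted by AMD (watts); these are vendor claims, not our measurements.
FBDIMM_IDLE_W = 83           # eight FBDIMMs at idle
DDR2_IDLE_W = 14             # eight DDR2 DIMMs at idle
BARCELONA_PLATFORM_W = 186   # typical Barcelona server
XEON_5300_PLATFORM_W = 228   # typical quad-core Xeon 5300 system

HOURS_PER_YEAR = 24 * 365    # assumption: the server runs all year at its typical draw


def saving(baseline_w, alternative_w):
    """Return (saving in watts, saving as a fraction of the baseline)."""
    delta = baseline_w - alternative_w
    return delta, delta / baseline_w


memory_saving_w, memory_saving_frac = saving(FBDIMM_IDLE_W, DDR2_IDLE_W)
platform_saving_w, platform_saving_frac = saving(XEON_5300_PLATFORM_W, BARCELONA_PLATFORM_W)
annual_kwh = platform_saving_w * HOURS_PER_YEAR / 1000

print(f"Memory at idle: {memory_saving_w}W lower ({memory_saving_frac:.0%})")
print(f"Whole platform: {platform_saving_w}W lower ({platform_saving_frac:.0%})")
print(f"Per server per year: {annual_kwh:.0f}kWh")
```

On AMD's own numbers, the memory alone accounts for more than the whole-platform gap, which suggests the rest of the Intel platform draws less than the rest of the AMD one. That may be part of why Intel can quote figures pointing the other way.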
Other power management enhancements include “Enhanced AMD Power Now” giving independent processor frequency control (allowing idle cores to reduce clock speed and save power), plus independent power planes for the cores and memory controller, allowing each to enter power-saving states independently.
Putting AMD’s constant claims of investment protection to the test, we were keen to upgrade an Opteron NEC WA2510 workstation from its twin dual-core Opterons up to a pair of our test 1.8GHz 2346HE quad-core Barcelona parts, making for an eight-core workstation. AMD has made much of the fact that the new processors protect investment by being physically and thermally compatible with older second-generation Socket F CPUs, claiming on numerous occasions that existing systems would be upgradable to the new parts with nothing more than a BIOS update.
The test results below were achieved by directly changing combinations of old and new Opterons in the same system.
Compatible or not?
That may be true in the majority of cases, but certainly not all of them: the Tyan S2927 board in our three-month-old workstation doesn’t support the dual power planes of Barcelona. The message here is not to make any assumptions and only consider switching out the CPUs in existing systems if you have a specific compatibility assurance for the motherboard model you own. So, with the WA2510 ruled out, Boston supplied us with a Barcelona-compatible rack-server system for testing.
In terms of performance, Barcelona delivers no great surprises. To compare it core-for-core with the previous generation, we installed two dual-core Opteron 2214 CPUs in the Boston server, benchmarked and power-tested the system, and then replaced the two dual-core chips with just one of our test quad-core Barcelona parts. From the results, it seems that the new L3 cache isn’t doing Barcelona’s architecture many favours; the single quad-core Barcelona system came out slower in every test than the twin dual-core setup, including our multiple applications and multithreaded 3D tests. The clock-speed differential (1.8GHz against 2.2GHz) is a big factor in the old Opterons’ favour, of course, but we’d have hoped the new architecture would have closed the gap to a greater extent.
Power efficiency
When it comes to performance per Watt, the story continues to be interesting and far from clear cut. The 95W TDP (thermal design power) of the dual-core 2214 looks very hefty in comparison to the 55W of the 2346HE, and when you look at TDP per core it’s even clearer: 47.5W per core for the old part, compared to a mere 13.75W per core for the new. Monitoring power consumption during our 3ds Max test render – an extremely efficient user of all available cores – we were able to derive a performance-per-Watt factor from average power consumption in relation to the time taken to render the first frame.
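The per-core figures are simply each part's TDP divided by its core count, as this short sketch shows. Bear in mind that TDP is a design ceiling for cooling, not a measured power draw, so the ratio says more about the thermal budget than about real consumption.

```python
def tdp_per_core(tdp_watts, cores):
    """Thermal design power divided evenly across a processor's cores."""
    return tdp_watts / cores


opteron_2214 = tdp_per_core(95, 2)     # dual-core, 95W TDP
opteron_2346he = tdp_per_core(55, 4)   # quad-core Barcelona, 55W TDP
ratio = opteron_2214 / opteron_2346he  # how many times higher the old part's budget is

print(f"Opteron 2214:   {opteron_2214}W per core")
print(f"Opteron 2346HE: {opteron_2346he}W per core")
print(f"Old part's per-core TDP is {ratio:.2f} times the new part's")
```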
Although the power consumption of the system fitted with a single Barcelona is lower than the twin dual-core setup, the raw performance deficit means that the old setup is more efficient, with a performance-per-Watt factor of 4.38 to the Barcelona’s 2.54. In other words, even though the system with a single Barcelona fitted uses less power, the longer time taken to complete the test means it uses more energy overall.
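The reasoning rests on energy being average power multiplied by time. The sketch below assumes, and it is only an assumption, that the performance-per-Watt factor is inversely proportional to the energy used to render the frame; the exact scaling behind the published factors isn't reproduced here. The 100W and 80W systems in the example are hypothetical and are not our measurements.

```python
def energy_joules(avg_power_w, seconds):
    """Energy used by a job: average power multiplied by elapsed time."""
    return avg_power_w * seconds


def perf_per_watt(avg_power_w, seconds, scale=1.0):
    """Assumed definition: one unit of work divided by the energy it took.

    `scale` is a hypothetical constant; it cancels out in any comparison.
    """
    return scale / energy_joules(avg_power_w, seconds)


# Hypothetical example: the second system draws less power but takes longer.
fast_hungry = energy_joules(100, 10)   # 100W for 10 seconds
slow_frugal = energy_joules(80, 20)    # 80W for 20 seconds
print(f"{fast_hungry}J against {slow_frugal}J")

# Under the assumed definition, the ratio of the two published factors is the
# ratio of energy used, whatever scaling constant was applied.
OLD_FACTOR = 4.38        # twin dual-core Opteron 2214
BARCELONA_FACTOR = 2.54  # single quad-core Barcelona
energy_ratio = OLD_FACTOR / BARCELONA_FACTOR
print(f"Single Barcelona used about {energy_ratio:.2f} times the energy per frame")
```

If the factor was derived differently, the 1.72 figure won't hold, but the general point stands: a lower power reading is no guarantee of lower energy use once run time is taken into account.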
The situation does change when it comes to a more realistic scenario, though. We directly replaced the two dual-core Opterons with two quad-core Barcelonas, taking the Boston system from its previous maximum of four cores up to a new complement of eight. In this configuration, the Barcelona Opterons thrash the old-generation parts in highly threaded code, for instance giving a 33-second time to first frame in our render test in comparison to 48 seconds with the old processors. And, true to AMD’s promise, the total platform power consumption didn’t rise, with our Supermicro test rig consuming around 210W during testing in both configurations. The good news on the power front continues, since the idle power of the new parts is lower than the old: the Boston system consumed 157W at idle populated with a pair of 2214s, but only around 138W with a pair of 2346HEs. That’s effectively four extra cores for free as far as power is concerned.
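Those eight-core figures can be turned into a speed-up and an energy-per-frame comparison with a few lines of arithmetic. This is a rough sketch: it treats the "around 210W" load reading as exactly 210W in both configurations and assumes the draw is constant for the whole render.

```python
OLD_TIME_S = 48     # time to first frame, pair of dual-core 2214s
NEW_TIME_S = 33     # time to first frame, pair of quad-core 2346HEs
LOAD_POWER_W = 210  # approximate platform draw under load, both configurations
OLD_IDLE_W = 157    # idle draw with a pair of 2214s
NEW_IDLE_W = 138    # idle draw with a pair of 2346HEs

speedup = OLD_TIME_S / NEW_TIME_S

# Energy per frame, assuming constant draw throughout the render.
old_energy_j = LOAD_POWER_W * OLD_TIME_S
new_energy_j = LOAD_POWER_W * NEW_TIME_S
energy_saving_frac = 1 - new_energy_j / old_energy_j

idle_saving_w = OLD_IDLE_W - NEW_IDLE_W
idle_saving_frac = idle_saving_w / OLD_IDLE_W

print(f"Speed-up: {speedup:.2f}x")
print(f"Energy to first frame: {old_energy_j}J down to {new_energy_j}J ({energy_saving_frac:.0%} less)")
print(f"Idle power: {idle_saving_w}W lower ({idle_saving_frac:.0%})")
```

With power draw equal in both configurations, the energy saving per frame is just the time saving, so any error in the 210W approximation leaves the percentage unchanged.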
So the initial message for Barcelona is mixed: as far as power consumption goes, processor-for-processor replacement will certainly increase efficiency and performance-per-Watt. We’re a little underwhelmed by its absolute performance levels, though, and that’s before we’ve pitted it against comparably priced Intel Xeon parts. In our next round of benchmarks, we’ll pit the two against each other and also test their virtualisation credentials: an area that both sides claim is the ideal scenario for their server parts.