After what seems like forever (actually a delay of just over six months), we’ve finally got our hands on production samples of AMD’s Barcelona quad-core CPUs. Barcelona – Quad-core Opteron, to give its production name – is the first AMD processor to use 65nm fabrication, down from the 90nm of its previous-generation processors. Variants come in both 95W and 120W power envelopes. The maximum clock speed at launch is 2GHz, but faster processors will appear. Being Opterons, the new parts are bound for servers and workstations.
Cache
Aside from its four “native” cores, Barcelona’s biggest departure is in its cache architecture. It’s the first chip design in recent years to sport both the now-familiar level 2 cache – with 512KB per core – and an extra stage of 2MB level 3 cache.
AMD dubs this arrangement “balanced smart cache”. The “balanced” side of the equation comes from the fact that each core’s level 2 cache is dedicated to that core and can’t be shared, whereas the level 3 is shared across the four cores. This is in contrast to Intel’s arrangement with its Core microarchitecture CPUs, which have one monolithic slab of level 2 cache shared between each pair of cores.
AMD claims that giving each core its own dedicated level 2 cache is key to outperforming the competition in multithreaded applications. With Intel’s system, cores compete for cache which, AMD claims, means threads can experience “cache starvation” if one core has managed to grab all or most of the available complement.
[Figure: The L3 cache sits across all four cores and is a first in modern processors.]
The flip side of the coin is that single-threaded performance can suffer when the maximum available complement of level 2 cache any core can use is 512KB, compared to Intel’s 4MB per pair of cores.
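To make the trade-off concrete, here is a minimal sketch of the two effects using a toy least-recently-used cache. Everything in it is an assumption for illustration: the eight-line cache, the working-set sizes, the LRU policy and the function name `hit_rate` are invented, and it models neither AMD’s nor Intel’s real hardware.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache that evicts the least recently used line when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, tag):
        """Return True on a hit; on a miss, load the line and evict if needed."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        self.lines[tag] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)
        return False


def hit_rate(shared, working_set, stream_per_access, rounds=1000, total_lines=8):
    """Hit rate of thread A, which loops over `working_set` lines.

    Thread B touches `stream_per_access` brand-new lines after each access
    by A. With shared=True both threads use one cache of `total_lines`;
    otherwise each thread gets a private cache of half that size.
    """
    if shared:
        cache_a = cache_b = LRUCache(total_lines)
    else:
        cache_a = LRUCache(total_lines // 2)
        cache_b = LRUCache(total_lines // 2)
    hits = 0
    stream_pos = 0
    for i in range(rounds):
        if cache_a.access(("A", i % working_set)):
            hits += 1
        for _ in range(stream_per_access):
            cache_b.access(("B", stream_pos))
            stream_pos += 1
    return hits / rounds


# "Cache starvation": a streaming neighbour pushes A's four lines out of a shared cache.
print("starved, shared:   ", hit_rate(True, working_set=4, stream_per_access=3))
print("starved, dedicated:", hit_rate(False, working_set=4, stream_per_access=3))
# Flip side: a lone thread with six lines fits the shared pool but not its private slice.
print("lone, shared:      ", hit_rate(True, working_set=6, stream_per_access=0))
print("lone, dedicated:   ", hit_rate(False, working_set=6, stream_per_access=0))
```

In this toy model the dedicated layout wins when a noisy neighbour is present and loses when one thread could have used the whole pool, which is the shape of the argument both vendors make; real caches have far more lines, set associativity and smarter replacement, so the extremes will be much less stark.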
Memory bandwidth
Under the general banner of Memory Optimizer Technology, AMD claims to have increased memory bandwidth by 40% using various techniques. Chief among these is the DRAM pre-fetcher, which speculatively loads instructions from main memory before they’re required.
Intel has a similar trick, but according to an AMD spokesman, “we pre-fetch into a buffer where they [Intel] pre-fetch into level 2 cache.” By keeping pre-fetched instructions in a separate buffer, AMD claims the impact of mis-prediction and “cache pollution” is minimised, reducing the need to flush and refill the cache.
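The “cache pollution” argument can be sketched in a few lines. This is a deliberately worst-case toy, not a model of either vendor’s pre-fetcher: it assumes a four-line LRU cache, a hot loop that exactly fits it, and a pre-fetcher whose every guess is wrong. The function name `hot_loop_hit_rate` and all the sizes are made up for the example.

```python
from collections import OrderedDict


def _insert_lru(cache, line, capacity):
    """Add a line to an LRU cache, evicting the oldest entry if it overflows."""
    cache[line] = None
    if len(cache) > capacity:
        cache.popitem(last=False)


def hot_loop_hit_rate(prefetch_into_cache, rounds=1000, cache_lines=4):
    """Hit rate of a loop over `cache_lines` hot lines while a pre-fetcher
    brings in one useless (mispredicted) line after every access."""
    cache = OrderedDict()   # oldest entry first
    prefetch_buffer = []    # separate staging area, used only in buffer mode
    hits = 0
    for i in range(rounds):
        line = ("hot", i % cache_lines)
        if line in cache:
            hits += 1
            cache.move_to_end(line)
        else:
            _insert_lru(cache, line, cache_lines)
        wrong_guess = ("junk", i)   # assumed never to be used by the program
        if prefetch_into_cache:
            _insert_lru(cache, wrong_guess, cache_lines)   # competes with hot lines
        else:
            prefetch_buffer = [wrong_guess]   # one-entry buffer, simply overwritten
    return hits / rounds


print("pre-fetch into cache: ", hot_loop_hit_rate(True))
print("pre-fetch into buffer:", hot_loop_hit_rate(False))
```

With every guess wrong, the bad pre-fetches evict the hot lines in the first case and leave them untouched in the second. A real pre-fetcher guesses right most of the time, so the practical gap depends on the mis-prediction rate, which neither company quantified here.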
Virtualisation
Virtualisation is very high on the server-side agenda and, as well as the advantage from the level 2 cache layout, AMD claims other enhancements to improve performance in this area. Its “nested paging” technique brings more of the memory management of virtual machines into hardware, and is said to reduce context-switching time – the time the CPU takes to switch from execution of one virtual machine to the next – by 25%.
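As a rough illustration of what nested paging has to do, the sketch below chains two address translations: the guest’s own page table turns a guest-virtual address into a guest-physical one, and a second, hypervisor-owned table turns that into a host-physical address. The flat dictionaries, the 4KB page size and the function names are simplifying assumptions; real hardware walks multi-level tables and caches the results.

```python
PAGE_SIZE = 4096  # assumed 4KB pages


def translate(address, page_table):
    """Map an address through a flat page table of page number -> frame number."""
    page, offset = divmod(address, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


def nested_translate(guest_virtual, guest_table, nested_table):
    """Guest-virtual -> guest-physical -> host-physical, as two chained lookups."""
    guest_physical = translate(guest_virtual, guest_table)
    return translate(guest_physical, nested_table)


# Hypothetical mappings: guest page 0 lives in guest frame 5,
# and the hypervisor has placed guest frame 5 in host frame 40.
guest_table = {0: 5, 1: 2}
nested_table = {5: 40, 2: 17}

host_address = nested_translate(0x123, guest_table, nested_table)
print(hex(host_address))
```

Without hardware support the hypervisor typically has to maintain a combined table in software and intervene whenever the guest changes its mappings; doing both lookups in hardware is what removes that work.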
Memory
With Barcelona, AMD continues its practice of integrating the memory controller into the chip itself, eliminating the need for a separate MCH (memory controller hub). This route has been vindicated by Intel’s admission that it will be taking a similar approach with its Nehalem processor.
AMD, however, is sticking with DDR2 memory rather than going with the FBDIMMs that Intel’s server platforms now require. AMD claims there are “enormous power and heat penalties for memory capacity using FBDIMM”, quoting a typical power consumption figure of 83W at idle for 8 FBDIMMs, as against 14W for the same number of DDR2 DIMMs.
This feeds into its claim that its “total-platform” power consumption – the total power being sucked out of the mains socket – is lower for an AMD system than an equivalent Intel box. It gives typical figures of 186W for a Barcelona server, against 228W for a quad-core Xeon 5300 series system. Intel, however, tends to claim roughly the opposite.
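For a sense of scale, the quick calculation below works through AMD’s quoted figures. The wattages are AMD’s claims as reported above; the function name `power_gaps` is invented, and the yearly figure assumes a server drawing those typical numbers around the clock, which is an assumption of this sketch rather than anything AMD stated.

```python
def power_gaps(fbdimm_w=83, ddr2_w=14, dimms=8, barcelona_w=186, xeon_w=228):
    """Derive per-DIMM and whole-platform differences from AMD's quoted wattages."""
    platform_gap_w = xeon_w - barcelona_w
    return {
        "fbdimm_w_per_dimm": fbdimm_w / dimms,
        "ddr2_w_per_dimm": ddr2_w / dimms,
        "memory_gap_w": fbdimm_w - ddr2_w,
        "platform_gap_w": platform_gap_w,
        "platform_saving_pct": 100 * platform_gap_w / xeon_w,
        # Assumes 24 hours a day, 365 days a year at the quoted typical draw.
        "kwh_per_year": platform_gap_w * 24 * 365 / 1000,
    }


for name, value in power_gaps().items():
    print(f"{name}: {value:.2f}")
```

On AMD’s numbers that is 69W less at idle for eight DIMMs and a platform gap of 42W, or a little over 18%. The two sets of figures are quoted separately, so the memory difference shouldn’t be read as a component of the platform difference.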
Other power management enhancements include “Enhanced AMD PowerNow!”, which gives independent processor frequency control (allowing idle cores to reduce clock speed and save power), plus independent power planes for the cores and memory controller, allowing each to enter power-saving states independently.