AMD’s new Llano APUs are, at first glance, very un-atomic products. They don’t really overclock, they only compete with the low end of Intel’s CPU lineup, and there’s currently only one model of processor on the market. But that doesn’t mean they aren’t one of the most important pieces of technology to hit this year, or that you should discount them.
Llano is part of AMD’s new strategy of launching to the mainstream first and then moving on to the high end. We saw it late last year with the way that the 6000 series RADEON cards were launched, and earlier this year with the Atom-smashing E and C lineup of APUs. Llano, now branded the A series, launched first as a notebook processor, and has now appeared in desktop form. With a price tag of around $150 for the top end A8-3850 processor (this was the only model available in retail at the time of writing) it is placed firmly in Core i3 territory, but it brings some unique features to the table, which in turn make it a phenomenal mid-range gaming solution.
To understand the strengths of the A series we need to spend some time looking at how it has been constructed at the silicon level. AMD has taken a 600MHz, 400-core 6000 series RADEON GPU, complete with UVD3 video decoding, and paired it with four 2.9GHz CPU cores (based on tweaked versions of the Phenom core) and an integrated Northbridge and memory controller. All of this sits on a single 32nm High-K Metal Gate (HKMG) silicon die, made by AMD’s spun-off semiconductor manufacturing company, Globalfoundries.
This isn’t just a case of slapping two products together and calling it an APU. AMD has been putting a load of work into getting everything working in harmony, which largely revolves around solving memory bandwidth and access issues.
As we know, modern GPUs are memory-hungry beasts. They are used to feeding on high-bandwidth, high-capacity, ultrafast GDDR. This alone creates a massive problem when you move to the relatively narrow bus and slow memory of DDR3. Add to this four CPU cores, which traditionally don’t like sharing RAM, and you have a tricky design challenge.
The simple act of incorporating the GPU onto the same silicon as the CPU and memory controller is a major boost to memory bandwidth, and it reduces both latency and power drain. What AMD has done with the A series is ensure that the bandwidth between the memory controller and the GPU is the same 29.8GB/s as the bandwidth between the memory controller and the RAM. It has also added paths for the GPU to directly interact with memory. This enables the smoothest possible operation. Combined with new OpenCL commands designed to minimise copying to and from memory when using both CPU and GPU, it puts AMD well and truly on top of this crucial issue (although we would love to see what quad channel memory could do to APU performance).
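As a rough sanity check on that 29.8GB/s figure, peak memory bandwidth can be estimated from the number of channels, the bus width and the transfer rate. The sketch below assumes a dual channel, 64-bit-per-channel DDR3-1866 configuration, which is our assumption rather than something stated above, and the function name is purely illustrative.

```python
def peak_bandwidth_gb_s(channels: int, bus_width_bits: int,
                        transfers_per_sec: float) -> float:
    """Theoretical peak bandwidth in GB/s, taking 1 GB as 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return channels * bytes_per_transfer * transfers_per_sec / 1e9


# Assumption: DDR3-1866, nominally 1866.67 million transfers per second.
DDR3_1866 = 5600e6 / 3

dual = peak_bandwidth_gb_s(2, 64, DDR3_1866)  # the assumed A series setup
quad = peak_bandwidth_gb_s(4, 64, DDR3_1866)  # the hypothetical we'd love to see

print(f"dual channel: {dual:.2f} GB/s")
print(f"quad channel: {quad:.2f} GB/s")
```

Under those assumptions the dual channel result lands just under 30GB/s, in line with the figure quoted, and a quad channel controller would simply double it on paper.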
This isn’t to say that combining CPU and GPU does nothing but create problems; it has also allowed AMD to completely redefine its Turbo technology (first seen with the Phenom II X6). The first iteration of this was fairly crude, cranking up the speed of three cores when all six weren’t in use. It didn’t have a huge level of granularity and, while useful, wasn’t as slick as it could have been. AMD has built the entire A series with power management in mind (one of the reasons that it beats Intel’s CPUs in mobile battery life tests), which has enabled it to implement this new kind of Turbo.
Rather than measure temperature directly, the APU monitors power draw across the components, and uses those results to keep itself within TDP. This means that it can dynamically control both the CPU and GPU cores. If the GPU is idle or has a low load, for example, the APU can crank up CPU core frequency beyond the norm. Or, if you are running a game that stresses the entire GPU but only needs one or two CPU cores, the APU can use the spare headroom to crank up GPU speed. Interestingly, in the latter case AMD gives the GPU priority over the CPU to ensure consistent graphical performance. It can also push the chip beyond TDP for some loads, at which point temperature monitoring kicks in to ensure that the chip throttles before getting too hot.
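To make the idea concrete, here is a minimal sketch of a power-budget Turbo of the kind described above. It is not AMD’s actual algorithm: every name, wattage, clock speed and the linear "MHz per watt" model are hypothetical, chosen only to illustrate the GPU-first handout of spare headroom and the thermal fallback.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    power_w: float       # estimated power draw at the current clock
    busy: bool           # only loaded domains are worth boosting
    base_mhz: int
    max_mhz: int
    mhz_per_watt: float  # hypothetical: clock gained per watt of headroom


def allocate_boost(tdp_w: float, gpu: Domain, cpu: Domain,
                   temp_c: float, temp_limit_c: float) -> dict:
    """Return target clocks in MHz, handing spare power to the GPU first."""
    # Thermal safety net: if the chip is too hot, fall back to base clocks.
    if temp_c >= temp_limit_c:
        return {gpu.name: gpu.base_mhz, cpu.name: cpu.base_mhz}

    # Headroom is whatever the power estimates leave under TDP.
    headroom = max(0.0, tdp_w - gpu.power_w - cpu.power_w)
    clocks = {}
    for d in (gpu, cpu):  # GPU comes first, so it has priority
        if not d.busy:
            clocks[d.name] = d.base_mhz
            continue
        wanted_w = (d.max_mhz - d.base_mhz) / d.mhz_per_watt
        granted_w = min(headroom, wanted_w)
        clocks[d.name] = d.base_mhz + int(granted_w * d.mhz_per_watt)
        headroom -= granted_w
    return clocks


# Illustrative numbers only: a CPU-bound load with an idle GPU.
gpu = Domain("gpu", power_w=10.0, busy=False, base_mhz=600, max_mhz=700, mhz_per_watt=10.0)
cpu = Domain("cpu", power_w=60.0, busy=True, base_mhz=2900, max_mhz=3300, mhz_per_watt=20.0)
print(allocate_boost(100.0, gpu, cpu, temp_c=60.0, temp_limit_c=90.0))
```

With the GPU idle, all of the spare budget in this toy model goes to the CPU cores. Mark both domains as busy with little headroom left and the GPU is served first, leaving the CPU at its base clock, which mirrors the priority described above.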