The future of CPUs
What happens when multicore processors are no longer sufficient for our demands? Stuart Andrews looks at the future of CPUs.
‘I think we’re actually on the verge of a revolution,’ says Phil Emma of IBM’s Thomas J Watson Research Center. ‘The bad news is that classical scaling isn’t going to work in quite the same way anymore. But the good news is that when we had 20-odd years of this scaling theory working for us, it allowed us as an industry to become complacent – now we have to look to extend things in different ways, and that’s going to call for a lot more ingenuity.’
Phil Emma has a point. There’s life in Moore’s Law yet; processor complexity still doubles every 18 to 24 months, performance continues to improve, and the shift from 90nm to 65nm technology has improved speed while reducing heat and power consumption. However, there’s a growing sense that the party is over – that progress can’t continue without real innovations in materials and manufacturing. CPU architects understand that getting high performance in tomorrow’s applications will require more than a few additional instructions and a die-shrink or dollop of cache; it will require a rethink of the way CPUs work.
You can see this process at work in the processors emerging today. Take a look at the Intel Core 2 Duo. It’s a native dual-core chip, not two processors bolted together on a single die, and its shared Level 2 cache can be allocated dynamically, so that if one core is doing all the work it gets the lion’s share of the resources. It’s a design built for the way real applications work in practice, not how Intel might like them to work in theory. And in the next 12 months, we can expect quad-core workstation/server and desktop variants (codenamed Clovertown and Kentsfield) to push the architecture even further.
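To make the dynamic-allocation idea concrete, here’s a toy sketch in C – purely illustrative, since Intel’s real allocation policy isn’t public – that deals out the ways of a 16-way shared cache in proportion to each core’s recent miss traffic:

```c
/* Toy model of demand-based sharing of a 16-way L2 cache between two
   cores. Illustrative only: Intel's actual policy is not public, and
   this captures just the proportional idea. */
#include <stdio.h>

#define WAYS 16

/* Grant each core cache ways in proportion to its recent miss count,
   guaranteeing at least one way apiece. */
static void allocate_ways(unsigned miss0, unsigned miss1,
                          unsigned *ways0, unsigned *ways1)
{
    unsigned total = miss0 + miss1;
    if (total == 0) {              /* both cores idle: split evenly */
        *ways0 = *ways1 = WAYS / 2;
        return;
    }
    *ways0 = (WAYS * miss0) / total;
    if (*ways0 == 0)    *ways0 = 1;         /* never starve a core */
    if (*ways0 == WAYS) *ways0 = WAYS - 1;
    *ways1 = WAYS - *ways0;
}

int main(void)
{
    unsigned w0, w1;
    allocate_ways(900, 100, &w0, &w1);  /* core 0 doing most of the work */
    printf("core0: %u ways, core1: %u ways\n", w0, w1);  /* 14 vs 2 */
    return 0;
}
```

Run it with a lopsided workload, as above, and the busy core ends up with the lion’s share of the ways – exactly the behaviour the architecture is after.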
AMD’s next generation may arrive later, but it’s the biggest departure for the company since the Athlon 64. Expected to ship in mid-2007, the K8L (as it’s widely, if unofficially, known) is a native quad-core architecture, designed to work more efficiently with larger memory spaces, with new SIMD instructions and double the SSE and floating-point resources of the current Athlon 64 line. Like Core 2 Duo, it can handle 128-bit vector operations in a single cycle.
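To see what a single-cycle 128-bit operation means to a programmer, here’s a minimal C sketch using the standard SSE intrinsics (an illustration of the instruction set both vendors implement, not either company’s code). One intrinsic maps to one addps instruction that adds four 32-bit floats at once; a core with full-width 128-bit SSE units can execute it in a single pass, where earlier designs split it into two 64-bit halves:

```c
/* Minimal SSE example: one 128-bit instruction adds four packed
   single-precision floats. Compile with: gcc -msse -O2 simd.c */
#include <stdio.h>
#include <xmmintrin.h>            /* SSE intrinsics */

int main(void)
{
    /* Two 128-bit vectors, each holding four 32-bit floats */
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);

    /* A single addps instruction performs all four additions */
    __m128 sum = _mm_add_ps(a, b);

    float out[4];
    _mm_storeu_ps(out, sum);       /* write the result back to memory */
    printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```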
So where are we headed from here? Well, take these trends and stretch them out much further: think multiple processors, wider execution pipelines, larger, smarter caches and beefy, out-of-order prefetch and scheduling hardware. In its own future-looking documents, Intel describes processors ‘that will have dozens and even hundreds of cores in some cases’. AMD is thinking along similar lines. K8L and its successors will be modular designs, making it easier to integrate more cores and transports as the market demands.
Expect the execution hardware to get bigger and learn to handle more specialised tasks. AMD senior fellow Chuck Moore explains: ‘If you look at a processor today – on a floor plan diagram, for example – and say, “Okay, where are the instructions actually executed?”, what you see is that the execution units aren’t that big.’ This, Moore says, will change, with future AMD processors doing more to widen out the SSE data paths so that there’s more execution density for media-processing. ‘We’re going to see more media processing capability being built right into these general-purpose processors.’
Specialised hardware will also emerge in the form of co-processors. AMD’s Torrenza initiative will open up the HyperTransport bus to allow first- or third-party co-processors to access the same transport ring as the CPU and memory. While this is initially envisioned as a technology for the workstation and server market, there’s no reason why we couldn’t see physics or 3D rendering co-processors working hand-in-hand with a next-generation Athlon. And if this sort of functionality proves its worth, it could easily make the transition onto the CPU itself.
‘The cases where we’ll see that, of course, will be the ones where there’s real demonstrable value,’ says Moore, ‘particularly something that would help the power efficiency of the chip itself. If, rather than running tons of code to accomplish some function, we can put a special piece of hardware on board to do it much more efficiently, that helps the whole power efficiency.’
Intel is thinking along similar lines and plans to incorporate dedicated hardware for a variety of tasks, including 3D graphics rendering, digital signal processing and natural language processing.
Of course, a multicore architecture provides other options. Phil Emma believes that, because people expect processors to do diverse things, it might make sense to design some cores for specific applications. ‘In addition to multicore, those things you call cores might be different sorts of things.’ In short, cores don’t have to be general purpose. Intel will design processors that allow dynamic reconfiguration of the cores, interconnects and caches to meet diverse and changing requirements. Such changes could be made at the fab or by an OEM, but most excitingly Intel talks of reconfiguration at runtime, the CPU adapting to support applications on-the-fly.
This sort of flexibility is a hallmark of next-generation CPUs. Both Chuck Moore and Phil Emma talk of virtualisation as a key technology, enabling one physical machine to appear in various configurations – as multiple discrete systems or as a single out-of-order powerhouse machine – to match the needs of the app in hand. AMD has already experimented with technologies that make multiple cores appear as a super-powered single core at OS level.
The magic combination of multicore, specialised hardware and virtualisation will also bring new challenges. Phil Emma says, ‘We’re going to have much more pressure put on the on-chip storage and the off-chip bandwidth, which is required to bring the content into that storage.’ Future high-capacity caches might initially use 1T-SRAM, a high-speed memory technology, then evolve towards 3D memory structures with multiple planes of circuitry. Alternatively, AMD has licensed Z-RAM, a form of embedded memory that can achieve five times the density of the embedded SRAM used in today’s on-chip caches.
Also, the transport links will have to get faster. Technologies like AMD’s HyperTransport 3 will help, lifting transfer rates from 2.8 to 5.2 gigatransfers per second, but the real future is probably optical, using light rather than electrons to carry data across the system bus (see boxout, overleaf).
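As a back-of-the-envelope check on what those transfer rates mean (assuming a 16-bit, or two-byte, link – HyperTransport widths vary in practice), peak bandwidth is simply rate times width:

```latex
B = R \times W
\quad\Rightarrow\quad
\begin{aligned}
B_{2.8} &= 2.8\,\text{GT/s} \times 2\,\text{bytes} = 5.6\,\text{GB/s per direction} \\
B_{5.2} &= 5.2\,\text{GT/s} \times 2\,\text{bytes} = 10.4\,\text{GB/s per direction}
\end{aligned}
```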
Of course, the central issues governing CPUs in the future will be the same as now: heat and power. As processors scale down from 65nm to 45nm to 32nm, power density rises to levels where it can adversely affect performance. And while a die-shrink means an identical processor will consume less power and produce less heat, clock it up and you eventually reach a point where it becomes uneconomical to power and cool. As a result, CPUs will have to get even smarter about power management. Both Core 2 Duo and K8L take a partitioned approach, in which parts of the chip can be shut off when not in use.
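The textbook first-order model of CMOS switching power shows why partitioned power management pays off (a standard approximation, not a vendor figure):

```latex
% Activity factor \alpha, switched capacitance C, supply voltage V,
% clock frequency f.
P_{\text{dynamic}} \approx \alpha\, C\, V^{2} f
```

Power rises linearly with frequency but with the square of voltage – and higher clocks usually demand higher voltages, compounding both terms – while gating idle blocks drives α, and with it their power draw, towards zero.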
There’s no doubt that heat and power will eventually stretch silicon beyond its limits, and not every part of the processor can get smaller. As Ghavam Shahidi, director of Silicon Technology at IBM’s Thomas J Watson Research Center, notes, ‘As processes change from 90nm to 65nm to 45nm and beyond, everything shrinks, but some things can’t shrink any more. One of these is the gate oxide [the insulating material used to form a transistor gate], which at just 11 angstroms can’t be made any smaller than it is right now. Moving from 90nm to 65nm to 45nm, for the first time we’re not making these gate oxides smaller. And as they move forward, we may not be able to make some of the other material thicknesses as small as we’d like to.’
Material changes
This makes the key to improving future performance not just about changing the architecture, but changing the very substance of the CPU. We’ve already seen this at work – AMD, Intel and IBM use variants of strained silicon technology, where the silicon layer interacts with a layer of another material (such as silicon germanium) with a wider atomic spacing. The atoms in the silicon stretch out in an attempt to align with the atoms in the secondary material, so allowing current to move through at greater speeds. IBM and AMD have also transitioned to SOI (Silicon on Insulator) processes, where the silicon layer sits on an insulating substrate, reducing the amount of charge a transistor has to move in a switching operation.
As we move on to 45nm, other materials come into play. As noted before, one limiting factor is the size of the gate oxide: how do you shrink it, while keeping it thick enough to prevent current leakage, without reducing its capacitance – the property that allows it to function as a gate? The answer is to replace the current silicon dioxide with a ‘high-k’ material – one with a higher dielectric constant, which can deliver the same capacitance from a physically thicker, less leaky layer. Intel has announced plans to do this with its 45nm line in 2007. In addition, there’s a search for low-k materials – insulators that reduce the capacitance between interconnects, letting signals travel faster – for use between transistors. Despite strong progress, it’s widely believed that, in both cases, 2007 is optimistic.
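The parallel-plate approximation for gate capacitance makes the high-k trade-off explicit (a textbook idealisation of a real gate stack):

```latex
% Dielectric constant k, vacuum permittivity \varepsilon_0,
% gate area A, dielectric thickness t.
C = \frac{k\,\varepsilon_{0}\,A}{t}
```

Double k and the thickness t can double too while the capacitance – and hence the gate’s control over the channel – stays the same; and because tunnelling leakage falls off exponentially with thickness, the thicker layer leaks far less.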
These technologies are designed to extend the life of silicon, but there is another way: replace the silicon substrate, totally or partially, with a similar material – say, germanium or gallium arsenide (GaAs) – that offers higher electron mobility and faster switching. Intel and the UK firm QinetiQ have also demonstrated quantum well transistors based on indium antimonide (InSb), which could theoretically produce chips that boost performance by 50 percent over silicon while cutting power consumption by a factor of ten. The attractive thing about these technologies is that they maintain the basic characteristics of silicon processes.
The alternative is more radical and involves future processors based on molecular electronics – using individual molecules to carry charge and act as switches – or spintronics, where transistors exploit not just the charge of electrons but their spin state. Future processors might even swap electrons for light, using optical transistors.
While Moore’s Law still applies – potentially for the next 15 years – it’s unlikely silicon will be cast aside. Rice University’s Professor James Tour, a respected proponent of molecular electronics, admits that the technology isn’t ready to compete. Instead, the way forward is ‘the modification of silicon through molecules’, where ‘you still drive the current through silicon, but modify the silicon’s properties through the attachment of molecules to the surface’. The benefit, as Tour says, is that ‘you’re not coming in and saying “silicon, step aside”. Everybody resists that. Instead, you’re saying “let us enhance the life of your silicon” and they go “oh, we really need that”.’ It’s a similar story with optics: Intel might not want light to replace silicon, but it likes the idea of using it as a data transport. ‘Today, optics is a niche technology,’ said Intel’s Pat Gelsinger in 2003. ‘Tomorrow, it’s the mainstream of every chip we build.’
Beyond 2020, however, the limitations of silicon may blow the field wide open. Current optics- or moletronics-based prototypes have issues, but this may not always be the case. IBM’s Ghavam Shahidi points out, ‘In any new technology, there are always many problems, but you overcome them one at a time, and when you overcome them all, you have a product.’ It’s this sort of thinking that will make whatever replaces silicon a reality.
And so Intel – for the past few years the poor relation to AMD where desktop performance is concerned – suddenly comes along and says, ‘Hello, here’s our new desktop processor.’ I put it in a test rig and it knocks my socks off. But that’s not the end of the story.
Computing power is a wonderful thing, but it's a means not an end. It inspires the creation of new applications and makes previously impractical ones viable; it means scientists can gain an insight into complex systems and fold proteins to find cancer cures. But to leagues of pasty-faced teenagers, all it does is allow them to make endless forum posts reporting their latest benchmark figures and how they think that maybe they can get another 0.2 percent and they can’t wait until [insert processor codename of your choice here] comes out and they’ll definitely be getting one of those and it might even get 10,000 SillyMarks. What fools!
And then it came to pass that I spent almost an entire weekend benchmarking like an idiot. Tweaking settings. Defragging the disk once more just to make sure. Setting the PC Authority benchmark suite running. Watching as the first few tests streaked across the screen. Trying to work out if it looked faster than it did last time. Drifting distractedly away and constantly checking my watch until the final figure popped onto the screen about four hours later (PC Authority’s benchmarks aren’t kind to a machine and take that long to arrive at an official publishable result). Then immediately feeling dissatisfied – despite the fact that the figures I was getting were higher than anything I’d ever seen – because I was sure that if I just tweaked that one setting and set it going again, then another four hours later I might see an abstract number on the screen that was 0.1 higher.
I’d flung myself into the bottomless abyss of the benchmark obsessed, a self-defeating universe of exponentially diminishing returns and ever-higher dissatisfaction, which is also populated by hi-fi nuts and people who like to customise their cars. I have to admit it was fun for a while.
Now that I’ve emerged, blinking, into the light of the normal, I’ve remembered I don’t have a need for that kind of CPU power. My main requirement of a PC is that it be fast but quiet.
The easy route to a quiet computer isn’t to spend a million dollars on outlandish cooling kit – that’s treating the symptom, not the disease.
The disease is power consumption, and if you reduce that you reduce heat output, and if you reduce heat output you can slow all your fans down, and if you can slow down all your fans your PC won’t make as much noise.
That’s the real blessing of Core 2. Intel gets my thanks for waking up to the fact that raw speed is only half the problem. Increasing computing performance for the mass market is now only a valid endeavour if you also reduce power consumption.
To be fair, I’m sure Intel’s engineers have been more than aware of this for at least five years, but the marketing machine had to keep pumping out nonsense about Pentium 4 with almost as much gusto as Pentium 4s themselves pumped out heat.
Not only has the company produced a processor that consumes less power than my old Pentium 4 while being far faster, it comes in the same physical package, so my Zalman high-efficiency CPU cooler can be pressed into service on the new chip. It’s designed for CPUs that chuck out 100W; with a 2.13GHz E6400 in place consuming well under half that, its fan doesn’t even need to spin. The power supply is under less strain, so it kicks out less heat, reducing the internal case temperature. And if the case temperature is low, its cooling fans don’t need to spin as fast, if at all. Suddenly, in one satisfying reverse avalanche, my desktop PC is as quiet as a sleeping mouse.
This isn’t all down to Intel, though. Credit also needs to go to those on the periphery of the industry who’ve been banging the power-consumption drum for ages. The chip company VIA is one of the major champions of this, and I feel a bit of a Judas for planning to replace the C3 processor-equipped guts of my living-room media PC with a Core-based AOpen Flex ATX board. The performance of the notebook Core Duo T2300 I used while testing this board outstrips the 3GHz Pentium 4 in my old desktop machine, and the only cooling it needs is a teeny-tiny, inaudible 40mm fan.
So I know Intel didn’t start this ball rolling. It hardly even noticed the ball was moving at all until it rolled over its foot and made it yelp. But, well, sorry VIA, I’m abandoning you for now. I know it’s wrong, but it feels so right.
At the beginning of the decade, molecular electronics, or moletronics, was seen as a technology that could destroy silicon. As far back as 1974, IBM’s Ari Aviram and the chemist Mark Ratner proposed that individual molecules could mimic the behaviour of simple electronic devices. By 2000, nanotechnology specialists like Rice University’s James Tour and Yale University’s Mark Reed – along with researchers at HP and UCLA – were talking of revolutionary future processors based on this theory. Molecular devices wouldn’t be fabricated like their silicon equivalents but grown, enabling processors containing vast numbers of transistors to be produced at practically zero cost. And with millions of molecular transistors squeezed into the space occupied by a single silicon transistor, they could store massive quantities of data and process it at ludicrous speeds.
In practice, progress has been relatively slow. ‘The projections that were made six or seven years ago never crystallised as fast as people would like,’ says Tour. ‘Driving current through a molecule turns out to be difficult to do reliably and consistently. There are a few noted exceptions, where people have been able to succeed in doing this reliably, but by and large the number of switching cycles is still limited.’
However, there’s new hope for moletronics. Tour points to the carbon nanotube: a cylindrical carbon molecule 50,000 times thinner than a human hair, with ideal physical, thermal and electrical characteristics for use as a semiconductor. In March, IBM researchers built a complete electronic circuit around a single carbon nanotube, and by May they’d devised a technique to pin nanotubes in place – a first step towards practical manufacture. Molecular chemists like carbon nanotubes for two reasons: they conduct charge with minimal resistance, and they prevent electrons leaking from the channel. Add the two together and you potentially have smaller transistors, a faster flow of electrons and therefore increased performance.
It’s early days. Carbon nanotubes are difficult to grow in a single, consistent form with a uniform diameter, and equally difficult to manipulate. However, while nanotube transistors are a way off, nanotube-based interconnects might not be; they could certainly transfer information at the high bandwidths required by future processors.
Another way forward is to swap electric current as a data transport for something faster – and what could be faster than light? Fibre-optic communications already use modulators to encode data onto a laser wavelength, multiplex several such wavelengths onto a single glass fibre, then demultiplex and decode at the other end. There’s no reason why this process couldn’t be replicated on a microscopic scale: first as a high-speed interconnect between conventional silicon devices and later, with the aid of optical transistors, as a technology for computation itself. Intel is working on all-silicon optical interconnects, and its silicon laser is a major step.
Intel’s Silicon Photonics team is following the first aim, and over the past five years has started putting building blocks in place. The first was a high-speed silicon modulator. Before 2004, nobody had built one that could run at speeds higher than 20MHz, but in February that year Intel announced a 1GHz silicon modulator, demonstrating in 2005 that this was capable of transmitting data at 10Gb/s – faster than the current copper interconnects used in a Pentium 4-based system. In the future, it’s believed that a combination of Intel’s recently developed low-cost silicon lasers and inlaid optical pathways could transmit data at 40Gb/s across the system.
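The arithmetic behind such figures is straightforward wavelength multiplexing; the four-channel split below is an assumed illustration rather than a stated Intel design:

```latex
% N_\lambda independent wavelengths, each modulated at rate R,
% share one waveguide.
B_{\text{total}} = N_{\lambda} \times R
\quad\Rightarrow\quad
4 \times 10\,\text{Gb/s} = 40\,\text{Gb/s}
```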
Using optics as a computing technology poses more difficulties. Researchers at Stanford University and MIT built prototype optical transistors in 2003, while Israeli startup Lenslet developed an optical digital signal processor (DSP) capable of eight tera-operations per second – enough to compress 15 channels of H.264 HDTV simultaneously. However, there’s a huge gap between a specialised, limited-function DSP and a general-purpose optical CPU, so optics is likely to complement silicon for a long time before it can hope to replace it.
Replacing the transistor
The transistor is the building block of modern electronics, but it’s a building block that will prove ever harder to produce as it shrinks below a few nanometres. The solution proposed by researchers at HP’s Quantum Research Lab in Palo Alto is a radical one: replace the transistor with a new kind of switch known as a ‘crossbar latch’.
A crossbar latch consists of a single microscopic nanowire crossed by two control lines, with an electrically switchable junction where they intersect. By applying a series of electrical impulses to the control lines, with the two switches arranged in opposite polarities, the latch can mimic the behaviour of a NOT gate – one of the three basic gates from which a circuit’s logic is built. The latch can also restore a degraded signal to its ideal voltage. As crossbar latches can be formed in complex arrays at minuscule sizes, with data stored or a logic function performed at each junction of the wires, they’re an ideal building block for a new era of molecular computing.