Earlier, when discussing how CPUs and GPUs might be integrated at the system level, at least in high-end systems, before the chip-level "sharing the die 'til we die" CPU-GPU on-die marriage happens, we looked into the most efficient ways of doing it.
Obviously, PCI Express, even in its v2 and upcoming v3 versions, carries too much latency and protocol baggage to link CPUs and GPUs efficiently for work on common tasks. Even seamless, coherent memory access on its own is usually a big problem.
Now, AMD as a company has one huge - albeit very temporary - advantage here right now: all the pieces of the puzzle have been in place for quite a while. Namely, there is a reasonably fast (but could be faster, please) CPU core in those 6-core Istanbul Opterons and their coming desktop equivalents; a very fast, low latency HyperTransport 3 interconnect between CPUs and I/O, good for up to 25.6GBps per link in version 3.1; and performance-leading GPUs in the ATI Radeon R800 family. Both the CPU core and HyperTransport are stable, proven old-timers, with the likes of FPGA accelerators and ultrafast supercomputing cluster network links having been hosted on the Opteron-HyperTransport combo for years. There's even the HTX slot spec for I/O cards that reach directly into system memory.
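For the curious, the 25.6GBps figure quoted above falls out of simple arithmetic. The sketch below assumes the commonly cited HyperTransport 3.1 parameters of a 3.2GHz link clock, double data rate signalling and a 32-bit wide link, and counts one direction only; the function name and its parameters are our own illustration, not anything from the HyperTransport spec.

```python
def ht_link_bandwidth_gbytes(clock_ghz: float, width_bits: int) -> float:
    """Peak one-way bandwidth of a HyperTransport-style link, in GB/s.

    Assumes double data rate signalling (two transfers per clock cycle)
    and ignores protocol overhead, so this is a ceiling, not a measurement.
    """
    transfers_per_second_g = clock_ghz * 2      # DDR: two transfers per clock
    bytes_per_transfer = width_bits / 8         # link width in bytes
    return transfers_per_second_g * bytes_per_transfer


if __name__ == "__main__":
    # Assumed HT 3.1 maximum: 3.2GHz clock, 32-bit link.
    print(f"32-bit link: {ht_link_bandwidth_gbytes(3.2, 32):.1f} GB/s each way")
    # A narrower 16-bit link at the same clock gives half of that.
    print(f"16-bit link: {ht_link_bandwidth_gbytes(3.2, 16):.1f} GB/s each way")
```

Real-world throughput will be lower once packet headers and flow control take their share, so treat the result as an upper bound.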
As mentioned before, Nvidia lacks both the CPU and interconnect portions here. An Alpha license would have solved both issues and, as a bonus, propelled it to the top of the CPU performance pack fairly quickly, assuming it had a reliable CPU fab partner. A hint - IBM Microelectronics and Global Foundries did share technology that IBM Micro once used to fab the last batch of Alphas some 8 years ago.
Now, suppose AMD released a HyperTransport flavour of the R800 family chip, one that adds an MMU or at least a very fast DMA (Direct Memory Access) mechanism to reach main memory and CPU resources directly via HTX, and maybe another bunch of HyperTransport links to attach other GPUs the same way. On top of its existing fast local GDDR5 memory, such a GPU would have far faster, nearly seamless access to the rest of the system resources, not just system memory. Even wrapping it as a kind of co-processor to the CPU could work, in the same fashion as the old 80287 was to the 80286 some 25 years ago.
With HTX scalability and a sufficient number of links, each CPU could drive a bunch of GPUs, limited only by the total system bandwidth required. Remember, each new GPU here adds to the total interconnect bandwidth and total memory bandwidth, as you can see on the graph. With less contention between the GPUs, not only would CrossFire gaming framerates scale better, but so would OpenGL engineering apps and, of course, computational codes.
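To put rough numbers on that scaling argument, here is a back-of-the-envelope sketch. It assumes every GPU gets its own dedicated HyperTransport link at the 25.6GBps quoted earlier and brings its own local GDDR5 pool; the 150GBps local memory figure is purely a placeholder of ours, as are the function and parameter names. No such HyperTransport GPU exists, so this is an idealised model and nothing more.

```python
HT31_LINK_GBPS = 25.6          # per-link figure quoted in the article
ASSUMED_GDDR5_GBPS = 150.0     # ASSUMPTION: placeholder local memory bandwidth


def aggregate_bandwidth(num_gpus: int,
                        link_gbps: float = HT31_LINK_GBPS,
                        local_mem_gbps: float = ASSUMED_GDDR5_GBPS) -> dict:
    """Idealised totals when each GPU has a dedicated link and local memory.

    Both totals grow linearly with the GPU count because nothing is shared;
    contention at the CPU or in system memory is deliberately ignored.
    """
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    return {
        "interconnect_gbps": num_gpus * link_gbps,
        "local_memory_gbps": num_gpus * local_mem_gbps,
    }


if __name__ == "__main__":
    for gpus in (1, 2, 4):
        totals = aggregate_bandwidth(gpus)
        print(f"{gpus} GPU(s): "
              f"{totals['interconnect_gbps']:.1f} GB/s interconnect, "
              f"{totals['local_memory_gbps']:.1f} GB/s local memory")
```

The contrast is with a shared bus, where adding a GPU divides a fixed pool of bandwidth instead of adding to it. In practice the number of HyperTransport links a CPU exposes would cap how far this goes.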
All this looks beautiful, but you may ask, why temporary? Well, Intel worked hard to build its QuickPath Interconnect (QPI) on the base of the former Alpha EV7 / EV8 interconnect. While QPI is still far from HyperTransport's level of maturity, and has no slot spec for card expansion, it has Intel and its market share behind it.
Then, Intel will - one day soon, hopefully - have Larrabee out there, up and running. If the original early Larrabee focus on workstation and computational graphics is kept, Intel has all the more reason to let even the initial Larrabees talk to their CPU brethren directly via QPI, not just over the slower PCI Express. On top of that, since Larrabee has an x86 ISA front-end anyway, it'll be that much easier to treat it as a co-processor right away and offload to it the code portions that it can run on its own.
I think that Intel will have all these running, one way or another, in about a year's time. Imagine the same graph as above, but with Intel Xeons and Larrabees instead. Whether to put a QPI link on Larrabee from the start or later is Intel's decision, but nothing prevents it technically. In that sense, AMD should use the chance while it has it, before it risks being pushed into the corner again. When Intel gets the same chance, it surely will use it.