Graphics Superguide: GeForce GTX200, CUDA, Dunia, Far Cry 2, S.T.A.L.K.E.R Clear Sky
Staff writers
Jul 23, 2008 11:55 AM
Nvidia’s latest graphics cards and chips throw down the gauntlet to Intel and AMD. Go behind the scenes with this guide to the big technologies including PhysX, CUDA, and games like Far Cry 2 and S.T.A.L.K.E.R Clear Sky.
To hear Nvidia tell it, integrated graphics just aren’t going to cut it, and a discrete GPU is still vital. They point to the 87% of top PC games with a recommended spec above the Intel integrated graphics specification to support their claim.
More than ever, a CPU and GPU work in concert, so that an optimised configuration of a 256MB GeForce card and a dual-core processor will outperform a quad-core processor with a 128MB GeForce card. In other words, the GPU doesn’t have to limit itself to gaming, and that’s where a whole raft of new initiatives from Nvidia steps in to do a polished song and dance number.
A simple example of how the GPU can go beyond gaming is a little app called PicLens, by Cooliris, which displays Google, Flickr, YouTube and DeviantArt image searches as an interactive 3D wall. You can visually skim the wall, pause, play a video thumbnail, or flick back and forth, Cover Flow-style. A GPU adds motion blur and antialiasing to PicLens, as well as much more power.
GeForce GTX200 and beyond
The GeForce GTX200 series, launched in mid-June, incorporates Nvidia’s second-generation unified architecture. The chips are also parallel processors with 1.4 billion transistors, providing just under a teraflop of processing power from 240 processor cores. The first two cards to be released – the GeForce GTX260 and GTX280 – won’t be cheap, but from what we’ve seen, they’re immensely powerful.
[Image: Far Cry 2 uses a new engine called Dunia, designed to take advantage of the new GTX200 cards.]
Tony Tamasi, Nvidia’s Vice President of Technical Marketing, says it’s the largest, most powerful and most complex GPU ever made by chip manufacturer TSMC. Its complexity is exemplified by its two distinct modes: one dedicated to computation, and the other to graphics processing.
Around 80% of the GPU is dedicated to parallel computation, and the processor is designed to maximise throughput. Each of the 240 single-instruction, multiple-thread (SIMT) cores is scalable and can communicate on-die, rather than having to go out to the memory system. Eight cores are grouped into a streaming multiprocessor with 16KB of shared memory. That shared local memory is available to the programmer, so that the GPU can be optimised for different tasks. Three of those multiprocessors, together with an L1 cache, make up an array, and 10 arrays make up the GPU, along with a thread scheduler to manage the threads and a 512-bit memory subsystem.
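The core hierarchy described above can be cross-checked with a little arithmetic. The Python sketch below simply multiplies out the figures quoted in the text; the constant and function names are illustrative, and the total shared-memory figure is derived from the per-multiprocessor number rather than quoted by Nvidia.

```python
# Figures taken from the article text; all names here are illustrative only.
CORES_PER_MULTIPROCESSOR = 8
MULTIPROCESSORS_PER_ARRAY = 3
ARRAYS_PER_GPU = 10
SHARED_MEMORY_KB_PER_MULTIPROCESSOR = 16


def gpu_totals():
    """Multiply out the core hierarchy: cores -> multiprocessors -> arrays."""
    multiprocessors = MULTIPROCESSORS_PER_ARRAY * ARRAYS_PER_GPU
    return {
        "multiprocessors": multiprocessors,
        "cores": CORES_PER_MULTIPROCESSOR * multiprocessors,
        # Derived total, assuming every multiprocessor has the quoted 16KB.
        "shared_memory_kb": SHARED_MEMORY_KB_PER_MULTIPROCESSOR * multiprocessors,
    }


totals = gpu_totals()
print(totals)
```

Eight cores in each of 30 multiprocessors gives the 240 cores quoted for the GTX200 series.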
Curtis Beeson, engineer at Nvidia, demonstrated the second personality via a graphics showcase. The latest iteration is a story-based demo, featuring a warrior facing down a Medusa (and coming to a stony end). The key features for the GTX200 series processors are new lighting effects, more photorealism, more than three million triangles per frame, improved DirectX 10 features such as geometry shading, and – in the demo we saw – hardware-generated petrification and transformation effects.
[Image: The GeForce GTX280 may look unassuming, but it packs a powerful punch.]
Tony Tamasi says that for graphics processing, the same basic elements that make up the parallel processor are joined by a variety of specialised shaders, improved texture performance, a 1GB frame buffer and an increased shader-to-texture ratio – all of which should make for cinematic-quality gaming. Tamasi says Nvidia is aiming to balance shading and textures with floating point detection: “Focusing on one without the other can lead to awesomely fast DirectX 9 performance, but no real improvement for DirectX 10, so we balance it.”
Another thing Nvidia has been working on is power efficiency, trying to ensure that when a feature isn’t needed by the GPU, it uses no power at all. The GTX200 series, as a result, has more gradations of power available, so that the cards consume about 25W when idle, 32W while playing a Blu-ray disc, and 147W while running an intensive benchmark such as 3DMark06. For comparison, the GeForce 9800GTX uses around 45W while idle, 50W for Blu-ray and 80W for 3DMark06.
Tamasi also points out that 25W at idle isn’t much more than a motherboard GPU generally uses. “If we can get our power low enough,” he said, “then you’ll get to a point where the discrete GPU uses less power than the motherboard GPU.”
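To put the power figures quoted above side by side, the following Python sketch works out the percentage change for each workload. The wattages are the ones given in the text; the dictionary layout and the function name are assumptions made for illustration.

```python
# Power draw in watts, as quoted in the article; the structure is illustrative.
POWER_WATTS = {
    "idle": {"GTX200 series": 25, "GeForce 9800GTX": 45},
    "Blu-ray playback": {"GTX200 series": 32, "GeForce 9800GTX": 50},
    "3DMark06": {"GTX200 series": 147, "GeForce 9800GTX": 80},
}


def percent_change(new_watts, old_watts):
    """Change from old to new as a percentage of old (negative means less power)."""
    return (new_watts - old_watts) / old_watts * 100.0


changes = {
    workload: percent_change(watts["GTX200 series"], watts["GeForce 9800GTX"])
    for workload, watts in POWER_WATTS.items()
}

for workload, change in changes.items():
    print(f"{workload}: {change:+.1f}% versus the 9800GTX")
```

On these figures the GTX200 series draws about 44% less power at idle and 36% less during Blu-ray playback, but roughly 84% more under 3DMark06.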
The games to come
Nvidia acquired PhysX developer Ageia only four months ago, but within a month PhysX was running on GeForce, and it’s now incorporated into the new GTX200 series GPUs.
PhysX is currently the only API that runs on both CPUs and GPUs, and it’s programmable using CUDA (see “CUDA – the powerhouse behind the chip” below). For PhysX, being part of Nvidia has meant a massive increase in the number of games signing up – more in a single month than in the previous two years as Ageia. For Nvidia, it means it can offer more to game designers and level designers. In the works are tools that increase the consistency between the modelling environment and the final game engine, and that help with the creation of in-game objects and behaviours. This should all lead to richer games, even from smaller studios without massive design budgets. The first drivers porting across to the GeForce will be for the Unreal Engine, so if you run games based on that engine, you should see the influence of PhysX straight away on GeForce GTX200 series graphics cards.
The goal for the team behind Far Cry 2 is not just to have great static screenshots, but also to have the best-looking dynamic beauty. The new instalment is set in Africa, with lots of exterior environs, and unlike most games, you really can go anywhere. Everything in the game remains high resolution when you step up close to it – not just the plot-related areas.
There’s a new engine – Dunia – which the developers describe as being ‘kickass enough’ for the environment they want to create, and they intend Far Cry 2 to be the first of many games to use it. The demo we saw on the GTX280 at Nvidia Editor’s Day showed fabulously high-resolution, high frame-rate, high-quality gameplay. There’s not just a full world: there’s also weather, a full 24-hour cycle over four hours of game time, levels of intersecting shadows in the environment and independent behaviours for fire, trees and movements. Everything is animated, rather than programmed, and it looks amazing.
[Image: The goal for the Far Cry 2 team is not just to have great static screenshots, but also to have the best looking dynamic beauty.]
Other games we saw showed off aspects of the new PhysX inclusion. Morpheme, for example, allows completely interactive tackling in American football game Backbreaker, as each player behaves independently. RealTime Worlds, by the makers of Grand Theft Auto, looks to be particularly ambitious, boasting thousands of simultaneous physical objects, synchronised to the computers of millions of players around the world, each with independent behaviour, so if you kick a can on your screen, it’ll ricochet through someone else’s, too.
S.T.A.L.K.E.R Clear Sky showed off improved, more realistic shadows and dynamic wetting as well as incredibly realistic volumetric smoke and lighting in its demo.
Tegra – perpetual motion machine anyone?
We had a sneak preview of Nvidia’s new low-power platform, Tegra, at Nvidia Editor’s Day, fitted into the shell of a 12in laptop and showing a 720p video on-screen. The whole operation consumed three watts – which Nvidia claims is around 10% of that used by the new Atom-based Eee PC. The Tegra is built with portable devices in mind, much like Intel’s Atom, but where Intel opted for a CPU, Tegra has an inbuilt CPU, GeForce GPU and controllers for all other core operations in just 144mm².
Tiny size doesn’t mean tiny performance, though. Both Tegra models can encode and decode 720p video for up to 30 hours of playback, run Quake 3 at playable frame rates, or play up to 130 hours of audio. The Tegra 650 can also play 10 hours of 1080p video on a single battery charge.
Nvidia is planning Tegra II and Tegra III over the next couple of years to continue meeting consumer expectations of power and energy efficiency. By early 2009, we’re likely to see Nvidia’s first Tegra-based Eee-killer, sporting the next generation of Windows Mobile operating systems.
CUDA – the powerhouse behind the chip
Over the last 15 years Nvidia has focused on the graphics pipeline, but more recently it’s been concentrating on programmability to extend the use of the GPU beyond gaming. GPGPU (General-Purpose computation on GPUs) started out in universities, using the Cg programming language to program shaders and run programs deep inside the graphics pipeline. CUDA lets you write the same kind of program and run it outside of the graphics pipeline. That means it has applications outside gaming, such as computational methods and database management.
Not only that, but CUDA’s programming environment can control both CPU and GPU cores for maximum processing power. CUDA is included with everything Nvidia ships, from GeForce through to professional level Quadro and Tesla GPUs, so developers can work on a laptop before porting the application to a larger scale.
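The idea of writing one small program and running it across many GPU threads can be illustrated without any GPU at all. The sketch below is plain Python, not CUDA: it only mimics how a kernel is written once and then executed by every thread, each working on the element picked out by its block and thread index. The names `saxpy_kernel` and `launch` are hypothetical, and the threads are run one after another on the CPU rather than in parallel.

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """Body run by one emulated thread: it handles a single array element."""
    # Global element index, computed the way a CUDA kernel typically does it
    # (blockIdx.x * blockDim.x + threadIdx.x).
    i = block_idx * block_dim + thread_idx
    if i < len(x):  # guard: the last block may have more threads than elements
        out[i] = a * x[i] + y[i]


def launch(kernel, num_blocks, block_dim, *args):
    """Emulate a kernel launch by running every thread sequentially on the CPU."""
    for block_idx in range(num_blocks):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(x)

block_dim = 4
# Round up so that every element is covered by some thread.
num_blocks = (len(x) + block_dim - 1) // block_dim

launch(saxpy_kernel, num_blocks, block_dim, 2.0, x, y, out)
print(out)
```

On real hardware the two loops in `launch` disappear: the GPU's thread scheduler hands the same kernel body to many threads at once, which is where the throughput comes from.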
If you have a GeForce 8 series GPU, it’s CUDA capable – giving you a free processor with your GPU. Over 60,000 people are using CUDA worldwide in just that manner.
For scientists, it’s meant that programs and tools run 100 times faster. One example of CUDA’s impact is the US National Center for Atmospheric Research, which used CUDA to trim a week off its month-long weather research and forecast calculations (used to predict the weather 4-5 days in advance).
CUDA’s programmable graphics also have applications in future gaming. Traditionally, GPUs can be used to render and simulate complex light scattering, including subsurface scattering, to create very realistic shapes and surfaces.
However, with most of the traditional rendering techniques, objects like a hairball are very difficult to create because of the interaction of light with complex geometry, and because of shadows. To recreate that effect requires very small pieces of geometry that are very time-consuming to generate.
Nvidia’s view is that the next generation of high-quality rendering will mix graphics APIs with programming in CUDA and other C/C++ languages, combining rasterisation and ray tracing. Nvidia is putting a lot of money into ray tracing – in particular, it acquired University of Utah spinoff RayScale as part of its plans. The downside is that ray tracing is computationally intensive, and until recently GPUs couldn’t manage it. In the envisaged scenario, the GPU does the rendering and physical simulation – the parallel supercomputer doing its work – while lighting and reflections are handled by ray tracing.
We were shown a demo of a car and plane created entirely on the GPU, with the first pass done entirely by the rasteriser and all reflections done using a ray tracer coded in CUDA. The demo included inter-object reflections, which gaming engines can’t do but ray tracing can. Nvidia aims to enable real-time rendering and ray tracing in its next-generation processors.
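The reason ray tracing copes with reflections between objects is that each ray can be bounced off a surface and followed to whatever it hits next. The core of that bounce is a single vector formula, r = d - 2(d·n)n, shown here as a minimal Python sketch. This is a textbook illustration, not code from Nvidia's demo, and the function names are assumptions.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(p * q for p, q in zip(a, b))


def reflect(direction, normal):
    """Mirror an incoming ray direction about a unit-length surface normal."""
    scale = 2.0 * dot(direction, normal)
    return tuple(d - scale * n for d, n in zip(direction, normal))


# A ray heading down and to the right hits a floor whose normal points up.
incoming = (1.0, -1.0, 0.0)
floor_normal = (0.0, 1.0, 0.0)

bounced = reflect(incoming, floor_normal)
print(bounced)
```

A ray tracer would then follow the bounced ray to see which other object it strikes, which is how one object ends up visible in the surface of another.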
The types of performance improvement that CUDA can add are useful even for desktop applications, such as transcoding HD video to H.264 for portable video. Nvidia claims, for example, that transcoding a two-hour HD movie takes 10 hours using a 1.6GHz dual-core processor and integrated graphics, 5 hours 33 minutes with a 3GHz quad-core and integrated graphics, but only 35 minutes using a 1.6GHz dual-core and a GeForce GTX280.
The exciting range of upcoming games and the ability to speed up video and audio encoding are just a few of the areas where we’ll reap rewards in the near future.
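Nvidia's transcoding claim is easier to judge as a set of ratios. The Python sketch below converts the quoted times to minutes and compares each configuration with the GTX280 setup and with real-time playback. The times are Nvidia's claims as reported above; the configuration labels and function name are illustrative.

```python
MOVIE_MINUTES = 2 * 60  # length of the HD movie being transcoded

# Transcode times claimed by Nvidia, converted to minutes.
TRANSCODE_MINUTES = {
    "1.6GHz dual-core + integrated graphics": 10 * 60,
    "3GHz quad-core + integrated graphics": 5 * 60 + 33,
    "1.6GHz dual-core + GeForce GTX280": 35,
}
GPU_CONFIG = "1.6GHz dual-core + GeForce GTX280"


def speedup_over(config):
    """How many times faster the GTX280 setup is than the given configuration."""
    return TRANSCODE_MINUTES[config] / TRANSCODE_MINUTES[GPU_CONFIG]


for config, minutes in TRANSCODE_MINUTES.items():
    print(
        f"{config}: {minutes} min, "
        f"{speedup_over(config):.1f}x the GTX280 time, "
        f"{MOVIE_MINUTES / minutes:.2f}x real time"
    )
```

On these figures the GTX280 configuration is roughly 17 times faster than the dual-core with integrated graphics and about 9.5 times faster than the quad-core, and it is the only one of the three that transcodes faster than the movie plays.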
This article appeared in the August 2008 issue of PC Authority.