RAID Theory


These days RAID really should be called Redundant Array of Expensive Disks (aka RAED; got net speak?), since it’s people like you and me who frequently take already high-performing disks and improve our storage subsystem performance even more by RAIDing two or more of them.

And of course, this means RAID 0 is the favourite among enthusiasts for its focus on performance, not redundancy.

But as with just about everything else to do with technology, there’s more to RAID 0 than first meets the eye, and in fact your current array could well be underperforming if you haven’t built it on the principles of RAID theory, which is precisely what we’ll cover here.

Parallelism and workload
RAID is all about parallelism – for every drive you add, you can conceptually increase the performance of a storage subsystem by splitting the workload among multiple drives at once. This can reap performance benefits as in RAID 0, or redundancy benefits as in RAID 1, or a mixture of the two through RAID levels 3 to 7 (not to mention ‘tiered’ RAIDs, such as 0+1).
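To make the idea of splitting the workload concrete, here is a minimal sketch of how a RAID 0 array might map a logical position onto its member drives. The function name `locate`, the 64 KB stripe size, and the two-drive array are assumptions chosen for illustration; real controllers and drivers differ in their details.

```python
def locate(logical_offset, stripe_size, n_drives):
    """Map a logical byte offset in a RAID 0 array to
    (drive index, byte offset on that drive)."""
    stripe_index = logical_offset // stripe_size   # which stripe unit overall
    drive = stripe_index % n_drives                # units rotate across the drives
    row = stripe_index // n_drives                 # how many units deep on that drive
    return drive, row * stripe_size + logical_offset % stripe_size


STRIPE = 64 * 1024  # assumed stripe size, for illustration only

# Walk the first four stripe units of a hypothetical two-drive array.
for unit in range(4):
    print(unit, locate(unit * STRIPE, STRIPE, 2))
```

Consecutive stripe units alternate between the two drives, which is why a large sequential read can keep both drives busy at once.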

But parallelism isn’t just about RAID. Having two drives in a system, with files and directories split across them, can deliver performance benefits (which is why it’s often recommended to put the swap file on a separate drive from the OS). Separate spindles, as we’ll call them, give your machine the ability to service multiple I/O requests at once – assuming the files it needs are spread across both drives.

Keeping this in mind, designing a good performance subsystem with RAID is as much about the RAID level and system workload as it is about your storage layout, which we’ll get onto a little later.

This leads us to something we’ll be mentioning a lot – workload. It’s important to remember RAID is not a silver bullet. It won’t exponentially increase the performance of your system across the board; it can only increase the performance of certain workloads, and like a sliding scale, as you increase the performance of one workload you decrease the performance of another.

Since the performance of a hard drive is largely defined by its seek time and transfer rate, what you do with your machine will determine how well a hard drive can deliver. A hard drive with a low seek time will perform better at frequent random accesses, the type that occur when you load Windows or launch an application. A drive with a high transfer rate will perform better at tasks like loading large files or streaming video.
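A rough back-of-the-envelope model shows why the two figures matter for different workloads. The sketch below assumes the time to service a request is simply access time plus transfer time; the function name and the figures for the two drives are invented for illustration and are not measurements of any real product.

```python
def service_time_ms(size_kb, access_ms, transfer_mb_s):
    """Estimate the time to service one request:
    access (seek) time plus the time to transfer the data."""
    transfer_ms = size_kb / 1024 / transfer_mb_s * 1000
    return access_ms + transfer_ms


# Two hypothetical drives (assumed figures, not benchmark results).
FAST_SEEK = {"access_ms": 8.0, "transfer_mb_s": 80.0}
FAST_XFER = {"access_ms": 13.0, "transfer_mb_s": 100.0}

for label, size_kb in (("4 KB random read", 4), ("1 GB sequential read", 1024 * 1024)):
    a = service_time_ms(size_kb, **FAST_SEEK)
    b = service_time_ms(size_kb, **FAST_XFER)
    print(f"{label}: fast-seek drive {a:.1f} ms, fast-transfer drive {b:.1f} ms")
```

Under these assumptions the small read is dominated almost entirely by access time, while the large read is dominated by transfer rate, so each drive wins at one workload and loses at the other.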

All hard drives obviously handle both of these workloads to varying degrees, but one of the side effects of using RAID is that you magnify these differences, and the effectiveness of an array depends on the type of workload it’s going to get. If you build an array designed for throughput, it won’t perform as well for seek-based workloads – and if this is what your machine does most of the time, you won’t see a good return on your investment. In fact, as we’ll show, for the wrong workload a RAID array can perform slower than a single drive.

Seek vs throughput
To understand how this can be, it helps to understand the many factors that can influence an array. From the operating system, file system, and drivers through to RAID level, RAID hardware, and individual drive performance, there are many variables at play. You can’t control all of them, but you can optimise an array for your particular setup.

Since this is Atomic, we’ll be focusing on just RAID 0, but even here there is plenty to explore. We’ll also assume you like to use identical makes and models of drives for your arrays, because this is simply the smart thing to do.

As covered above, RAID isn’t a one-size-fits-all solution – by nature it can provide excellent performance improvements, provided the array is built with your most common workload in mind. The easiest way to categorise your workload is to look at what you use your PC for.

For example, at the extreme ends, the types of workload a system may see are frequent random accesses (which we’ll abbreviate as seek), and sustained sequential throughput (commonly called sustained transfer rate). The latter is often exemplified by a machine used for video editing or streaming – large, contiguous files. The former is frequently represented by common operating system use – lots of small files being accessed at different times. If you’re wondering, games usually fall between the two, something you’ll be able to see in the benchmark results to follow.

Many enthusiasts optimise their arrays for raw throughput, putting the benchmarks of programs like ATTO and SiSoft Sandra on a pedestal – but this is a mistake. Inevitably, an array that performs well at throughput doesn’t perform as well at frequent random accesses – the workloads that require frequent seeking – which just happens to be the workload of their machines when they’re actually using them and not running benchmarks.
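One way to picture this trade-off is to count how many member drives a single request ties up in a RAID 0 array. The sketch below is a simplified illustration only: the function name `drives_touched`, the stripe sizes, and the four-drive array are assumptions, and it ignores caching and request queuing.

```python
def drives_touched(offset, length, stripe_size, n_drives):
    """Count how many member drives a single request of `length` bytes,
    starting at logical byte `offset`, has to involve in a RAID 0 array."""
    first_unit = offset // stripe_size
    last_unit = (offset + length - 1) // stripe_size
    units = last_unit - first_unit + 1
    return min(units, n_drives)  # cannot involve more drives than exist


KB = 1024

# A 64 KB read on a hypothetical four-drive array, with two assumed stripe sizes.
print(drives_touched(0, 64 * KB, 16 * KB, 4))   # small stripe
print(drives_touched(0, 64 * KB, 128 * KB, 4))  # large stripe
```

With the small stripe every drive has to seek to serve one modest request, which suits a single large transfer but leaves no drive free to serve another request at the same time. With the large stripe the same request occupies one drive, and the others remain available for other random accesses.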

To demonstrate this, we’ll be benchmarking two high-end drive models – the 10,000 RPM Raptor, and the new 32MB cache Seagate 7200.11. The Raptor isn’t as fast as the Seagate for throughput, but its 10,000 RPM spindle speed gives it a faster seek time. So which will be better suited to your workloads? And will both perform in RAID 0?

This feature appeared in the May, 2008 issue of Atomic Magazine
