RAID means combining multiple hard disks to gain better performance, or better reliability, or a balance of the two. When the RAID specification was originally proposed in 1987 – in an academic paper by David Patterson, Garth Gibson and Randy Katz at the University of California, Berkeley – the name stood for “redundant array of inexpensive disks”. The emphasis on “inexpensive” reflected a then-massive price difference between personal hard disks and the high-end commercial models, which offered the capacities, reliability and read and write speeds demanded by businesses and universities. RAID made it possible to match or exceed the capabilities of the premium drives – referred to jocularly by the inventors of RAID as “SLED”, for “single large expensive disk” – by combining the power of cheaper domestic units.
Today, RAID is more commonly expanded as “redundant array of independent disks”, but “independent” is a rather misleading term: the disks in an array are normally interdependent, as we’ll discuss, and from the user’s perspective the array looks and works almost exactly like a single volume, which files can be read from and written to pretty much as they would be on a regular disk.
RAID has many benefits, and if you want to take advantage of them, there are plenty of NAS and USB-based systems – available both pre-populated and bare for you to install your own disks – that will do the job. RAID controllers are commonly built directly into motherboards, too, so you may be able to create an array simply by hooking up two or more internal disks and selecting the appropriate BIOS options (see Creating a motherboard RAID array). If you have a Professional edition of Windows, it’s also possible to create a “software RAID” array with no hardware support at all: see our walkthrough on p89 for instructions.
The simplest way to combine multiple disks is to join them into one “spanned” virtual drive, which looks to the operating system like one disk hosting the combined capacity of the disks involved. This is sometimes called pooling or concatenating the disks.
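To make the idea concrete, here is a minimal Python sketch of how a spanning controller might work out where a given block lives. The function name `locate_spanned` and the use of block counts for disk sizes are our own illustrative assumptions, not the behaviour of any particular controller.

```python
def locate_spanned(logical_block, disk_sizes):
    """Map a logical block number to (disk index, block offset).

    disk_sizes lists the capacity of each disk in blocks, in the order
    the disks were joined. Disks and blocks are counted from zero.
    """
    if logical_block < 0:
        raise ValueError("block number must be non-negative")
    remaining = logical_block
    for disk, size in enumerate(disk_sizes):
        if remaining < size:
            # The block falls inside this disk.
            return disk, remaining
        # Otherwise skip past this disk and try the next one.
        remaining -= size
    raise ValueError("block lies beyond the end of the volume")


# Two disks of 100 and 50 blocks look like one 150-block volume.
print(locate_spanned(99, [100, 50]))
print(locate_spanned(100, [100, 50]))
```

The first disk is used in full before anything lands on the second, which is why disks of different sizes can be joined without wasting any space.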
Strictly speaking, spanned volumes aren’t an application of RAID at all: spanning improves neither performance nor reliability, and it wasn’t mentioned in the original document. But it’s a convenient feature; for example, if your data disk fills up, you can simply add another disk to expand its capacity and go on as before, rather than having to replace it, or have your data spill messily across two volumes. For this reason, many RAID controllers support pooling, sometimes referring to it by the slightly flippant abbreviation JBOD – because your array is “just a bunch of disks”. Many operating systems also let you create spanned drives in software, including Windows Home Server and Windows 8.
Spanned storage comes with one big caveat, and it applies to all multidrive systems: the more disks you’re using, the greater the likelihood that one will fail. This increases your risk of losing at least some data. Depending on how fault-tolerant the hardware or software controlling your JBOD arrangement is, the loss of a single disk could even make the entire array unreadable.
A smarter way to combine disks is striping. Rather than simply concatenating disks – so that the controller fills up the first disk before starting to use the second – striping uses all disks simultaneously. If there are two blocks of data to write to a two-disk array, block two will be written to disk two at the same time as block one is being written to disk one.
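The round-robin placement just described can be sketched in a few lines of Python. The function name `locate_striped` is hypothetical, and the sketch assumes the simplest possible layout, with one block per stripe unit and disks and blocks counted from zero rather than one; real controllers typically group several blocks into each stripe unit.

```python
def locate_striped(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk).

    Consecutive blocks go to consecutive disks, wrapping back to the
    first disk after the last one.
    """
    if logical_block < 0 or num_disks < 1:
        raise ValueError("need a non-negative block and at least one disk")
    disk = logical_block % num_disks      # which disk holds the block
    offset = logical_block // num_disks   # how far down that disk it sits
    return disk, offset


# On a two-disk array, neighbouring blocks always land on different disks,
# so the two transfers can proceed at the same time.
for block in range(4):
    print(block, locate_striped(block, 2))
```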
In this way, data can in principle be read and written across two drives at up to twice the speed available from either drive in isolation. Speed can be increased further by using more drives, limited only by the transfer speeds supported by the controller and the number of physical disk connectors it offers.
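As a rough back-of-the-envelope model of that scaling, the Python sketch below treats the array’s best-case speed as the sum of the individual drive speeds, capped by what the controller can carry. The function name and the figures are illustrative assumptions only, and real-world results will depend on the workload and hardware.

```python
def striped_throughput(disk_mb_s, num_disks, controller_mb_s):
    """Idealised best-case transfer rate of a striped array, in MB/sec.

    Assumes identical drives that all transfer at full speed at once,
    with the controller's maximum rate as the only bottleneck.
    """
    return min(disk_mb_s * num_disks, controller_mb_s)


# Hypothetical 100MB/sec drives on a controller limited to 600MB/sec.
print(striped_throughput(100, 2, 600))
print(striped_throughput(100, 8, 600))
```

In this model, adding drives helps only until the controller becomes the limiting factor.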
Because of the way striping works, all disks must be the same size – or, at least, if they’re not, you can use only a portion of each disk equal to the size of the smallest in the array.
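The capacity rule is easy to express as arithmetic. In this short Python sketch, the function names are our own and the disk sizes are hypothetical figures in gigabytes; spanned capacity is shown alongside for comparison.

```python
def striped_capacity(disk_sizes):
    """Usable capacity of a striped array: every disk contributes only
    as much space as the smallest disk in the set."""
    return min(disk_sizes) * len(disk_sizes)


def spanned_capacity(disk_sizes):
    """Usable capacity of a spanned (JBOD) volume: the plain total."""
    return sum(disk_sizes)


sizes = [500, 1000]  # hypothetical 500GB and 1TB disks
print(striped_capacity(sizes))  # half of the larger disk goes unused
print(spanned_capacity(sizes))
```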
Another limitation is that, unlike JBOD, a striped array can’t normally be expanded dynamically, as changing the number of disks in use would require previously written data to be redistributed in a different pattern. If you want your array to use three disks, you’ll have to build it that way from the outset.
The biggest problem with striping, however, is that it’s at least as vulnerable to hardware failure as JBOD. A two-disk system is (logically) roughly twice as likely to suffer a hardware failure in a given period as a single drive, and statistical reliability drops further as you add more disks. What’s more, because of the way data is distributed, if any one disk in a striped array fails, the entire contents of the array are lost. Strictly speaking, using specialist disk recovery tools, it might be possible to restore parts of your files. However, since stripes are typically distributed in blocks of only a few kilobytes, you’ve no chance of rescuing files of any substantial size.
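The “roughly twice as likely” figure can be checked with a little probability. The Python sketch below assumes that each disk fails independently with the same probability over the period in question, which is a simplification, and the 5% figure is purely hypothetical rather than a measured failure rate.

```python
def array_failure_probability(disk_failure_probability, num_disks):
    """Chance that at least one disk in the array fails in a given period.

    The array survives only if every disk survives, so we take the
    chance of that happening and subtract it from one.
    """
    survive_all = (1 - disk_failure_probability) ** num_disks
    return 1 - survive_all


p = 0.05  # hypothetical chance of one disk failing during the period
for disks in (1, 2, 4):
    print(disks, round(array_failure_probability(p, disks), 4))
```

With a small per-disk probability, the two-disk result comes out just under double the single-disk figure, and for a striped array that is also the chance of losing everything.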
Perhaps because of this major shortcoming, simple striping wasn’t included in the original RAID document, but it’s now recognised as a RAID standard. It is, however, considered the lowest “level” of array, dubbed RAID0.