RAID means combining multiple hard disks to gain better performance, or better reliability, or a balance of the two. When the RAID specification was originally proposed in 1987 – in an academic paper by David Patterson, Garth Gibson and Randy Katz at the University of California, Berkeley – the name stood for “redundant array of inexpensive disks”. The emphasis on “inexpensive” reflected a then-massive price difference between personal hard disks and the high-end commercial models, which offered the capacities, reliability and read and write speeds demanded by businesses and universities. RAID made it possible to match or exceed the capabilities of the premium drives – referred to jocularly by the inventors of RAID as “SLED”, for “single large expensive disk” – by combining the power of cheaper domestic units.
Today, RAID is commonly expanded in terms of “independent”, rather than “inexpensive”, disks, but this is a rather misleading term: the disks in a RAID array are usually interdependent, as we’ll discuss, and from the user’s perspective the array looks and works almost exactly like a single volume, to which files can be read and written much as on a regular disk.
RAID has many benefits, and if you want to take advantage of them, there are plenty of NAS and USB-based systems – available both pre-populated and bare for you to install your own disks – that will do the job. RAID controllers are commonly built directly into motherboards, too, so you may be able to create an array simply by hooking up two or more internal disks and selecting the appropriate BIOS options (see Creating a motherboard RAID array). If you have a Professional edition of Windows, it’s also possible to create a “software RAID” array with no hardware support at all: see our walkthrough on p89 for instructions.
The simplest way to combine multiple disks is to join them into one “spanned” virtual drive, which looks to the operating system like one disk hosting the combined capacity of the disks involved. This is sometimes called pooling or concatenating the disks.
Strictly speaking, spanned volumes aren’t an application of RAID at all: spanning improves neither performance nor reliability, and it wasn’t mentioned in the original document. But it’s a convenient feature; for example, if your data disk fills up, you can simply add another disk to expand its capacity and go on as before, rather than having to replace it, or have your data spill messily across two volumes. For this reason, many RAID controllers support pooling, sometimes referring to it by the slightly flippant abbreviation JBOD – because your array is “just a bunch of disks”. Many operating systems also let you create spanned drives in software, including Windows Home Server and Windows 8.
Spanned storage comes with one big caveat, one that applies to all multidrive systems: the more disks you’re using, the greater the likelihood that one will fail, and so the greater the risk of losing at least some data. Depending on how fault-tolerant the hardware or software controlling your JBOD arrangement is, the loss of a single disk could even make the entire array unreadable.
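To make the idea concrete, here’s a minimal Python sketch – with invented disk sizes, and not how any real controller is implemented – of how a spanned volume might translate a logical block number into a physical disk and offset. The principle is simply: fill one disk, then move on to the next.

```python
# Minimal sketch of spanned (JBOD) address mapping.
# Disk capacities are in blocks; the values are invented for illustration.
DISK_SIZES = [100, 250, 150]  # three disks of different sizes

def span_locate(logical_block):
    """Map a logical block number to (disk_index, physical_block)."""
    for disk, size in enumerate(DISK_SIZES):
        if logical_block < size:
            return disk, logical_block
        logical_block -= size  # move past this disk's blocks
    raise ValueError("block beyond end of spanned volume")

print(span_locate(0))    # (0, 0)   start of disk 0
print(span_locate(120))  # (1, 20)  spills onto disk 1
print(span_locate(420))  # (2, 70)  near the end of disk 2
```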
A smarter way to combine disks is striping. Rather than simply concatenating disks – so that the controller fills up the first disk before starting to use the second – striping uses all disks simultaneously. If there are two blocks of data to write to a two-disk array, block two will be written to disk two at the same time as block one is being written to disk one.
In this way, data can be read and written across two drives at twice the speed available from either drive in isolation. Speed can be increased further by simply using more drives, limited only by the transfer speeds supported by the controller and the number of physical disk connectors it offers.
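A striped layout answers the same question differently, rotating across the disks rather than filling them in turn. Again, this is a minimal illustrative sketch, assuming equal-sized disks, rather than how a real controller works:

```python
# Minimal sketch of striped (RAID0) address mapping across equal disks.
N_DISKS = 2

def stripe_locate(logical_block):
    """Map a logical block number to (disk_index, physical_block)."""
    disk = logical_block % N_DISKS       # rotate across the disks
    physical = logical_block // N_DISKS  # stripe row on that disk
    return disk, physical

# Consecutive blocks land on different disks, so they can be
# written at the same time -- the source of RAID0's speed.
for block in range(4):
    print(block, "->", stripe_locate(block))
# 0 -> (0, 0), 1 -> (1, 0), 2 -> (0, 1), 3 -> (1, 1)
```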
Because of the way striping works, all disks must be the same size – or, at least, if they’re not, you can use only a portion of each disk equal to the size of the smallest in the array.
Another limitation is that, unlike JBOD, a striped array can’t normally be expanded dynamically, as changing the number of disks in use would require previously written data to be redistributed in a different pattern. If you want your array to use three disks, you’ll have to build it that way from the outset.
The biggest problem with striping, however, is that it’s at least as vulnerable to hardware failure as JBOD. A two-disk system is, logically, roughly twice as likely to suffer a hardware failure in a given period as a single drive, and statistical reliability drops further as you add more disks. What’s more, because of the way data is distributed, if any one disk in a striped array fails, the entire contents of the array are lost. Strictly speaking, specialist disk recovery tools might be able to restore parts of your files, but since stripes are typically distributed in blocks of only a few kilobytes, you’ve no chance of rescuing files of any substantial size.
Perhaps because of this major shortcoming, simple striping wasn’t included in the original RAID document, but it’s now recognised as a RAID standard. It is, however, considered the lowest “level” of array, dubbed RAID0.
Like striping, mirroring involves combining two or more disks into a single virtual volume. But in this case, a complete copy of your data is written to each disk, effectively making them all “mirror images” of one another. Like striping, mirroring assumes disks of identical sizes (otherwise the usable size of the volume will be constrained by the smallest disk). However, the total storage available doesn’t represent the sum of the connected disks, but the capacity of a single one. And performance is no better than a single disk’s; in fact, due to the overheads of carrying out multiple reads and writes, it can be slower.
But when one of your disks fails, mirroring comes into its own. The broken disk can simply be removed from the array, and you can continue working as usual. You can even replace the failed drive with a new one and have the controller rebuild the array, restoring you to your original level of safety. This is what the “R” in RAID is all about – data safety through redundant copies – and a set of mirrored disks is referred to as a RAID1 array.
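In code, the principle might be sketched like this, with two hypothetical “disks” held as Python dictionaries: writes go to every live disk, and a read can be satisfied by whichever disk is still working.

```python
# Minimal sketch of RAID1 mirroring: every write is duplicated,
# and any surviving disk can satisfy a read. Purely illustrative.
disks = [dict(), dict()]          # two "disks" as block->data maps
alive = [True, True]

def mirror_write(block, data):
    for d, disk in enumerate(disks):
        if alive[d]:
            disk[block] = data    # identical copy on every live disk

def mirror_read(block):
    for d, disk in enumerate(disks):
        if alive[d]:
            return disk[block]    # first live disk answers
    raise IOError("all mirrors have failed")

mirror_write(0, b"important data")
alive[0] = False                  # simulate disk 0 dying
print(mirror_read(0))             # still readable from disk 1
```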
Although RAID1 keeps your data safer than a single drive (and much safer than striping), arrays are often constructed from disks of the same type and age, so if one disk does fail, the others could well be on the way out too. If you need to rebuild the array, do it as soon as possible, preferably using a new disk. It’s also important to remember that mirroring isn’t the same as backup, nor a substitute for it: if you delete or overwrite a file on a mirrored volume, it’s gone.
Nested RAID levels
RAID0 and RAID1 may look like an either/or choice, but it’s possible to combine their advantages by “nesting” RAID levels. For example, you could set up two disks in a striped array, then use another two disks to create a RAID1 mirror of that array. This mirror can then save your bacon if your RAID0 array fails. This particular setup is called, simply enough, RAID0+1, the second number representing the “top” array.
It can be done the other way round, too, creating two sets of mirrored drives and then striping data across the sets. In practice, this is a more popular approach than RAID0+1, since the chances of losing data are lower. To understand why, remember that if one disk in a RAID0 array fails, the rest become useless. Any disk failure therefore breaks the whole mirror – reducing our RAID0+1 array to a single, unmirrored RAID0 array.
In RAID1, conversely, when a disk fails its mirror continues to work, so a RAID1+0 nested array can keep going even after one disk fails completely. The safety of your data then depends on the failed disk’s surviving mirror partner, but this is a better bet than a striped array.
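You can check this difference by brute force. The sketch below is a toy model rather than a simulation of any real controller: it asks which second-disk failures kill a four-disk array once disk 0 has already died, under each nesting scheme.

```python
# Toy model: which second-disk failures are fatal once disk 0 has
# already died in a four-disk nested array? Illustrative only.
DISKS = {0, 1, 2, 3}

def raid01_alive(failed):
    # RAID0+1: stripes (0,1) and (2,3), mirrored. At least one
    # stripe must be completely intact for the array to survive.
    return not ({0, 1} & failed) or not ({2, 3} & failed)

def raid10_alive(failed):
    # RAID1+0: mirror pairs (0,1) and (2,3), striped. Every mirror
    # pair must keep at least one working disk.
    return not ({0, 1} <= failed) and not ({2, 3} <= failed)

for name, alive in (("RAID0+1", raid01_alive), ("RAID1+0", raid10_alive)):
    fatal = sorted(d for d in DISKS - {0} if not alive({0, d}))
    print(f"{name}: fatal second failures after disk 0 dies: {fatal}")

# RAID0+1: [2, 3] -- two of the three survivors are now fatal
# RAID1+0: [1]    -- only disk 0's mirror partner is fatal
```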
Setting up a nested array may sound complex, and if you’re using a domestic motherboard or software RAID, it may not be an option. However, many dedicated RAID controllers and appliances can set up and manage a RAID1+0 array automatically. When striping is used as the top array, to add speed to an established array, the “+” sign is often skipped, so you’ll also see this system referred to as RAID10.
Mathematical fault tolerance
RAID10 is fast and reliable, but it eats up disks. Our example above used four disks to provide the capacity of two; and if you wanted to ensure you weren’t left with a single point of failure following a disk crash, you’d need to add two additional disks. This isn’t economical in terms of hardware, space or power. Happily, it’s possible to achieve data redundancy without deploying disks willy-nilly, by relying instead on a more efficient mathematical system.
The system used by most RAID levels is called parity. This is one of those ingenious ideas that’s simple to explain but far-reaching in its applications. To illustrate how parity works, let’s pluck two eight-bit binary numbers out of the air, such as 00010110 and 10010011. To calculate a parity value for these two bytes we apply a simple XOR function, effectively recording a 0 for bits that are the same in both numbers, and a 1 for bits that are different: here, that gives 10000101.
This result is our parity value. On its own it isn’t a number that tells us very much at all. But if we lose either one of the original numbers, we can use this value, along with the surviving byte, to recreate it. Apply this concept to RAID and the benefit is plain: if you have a two-disk striped array, you can use a single extra disk to store parity data – and if any of these three disks fails, you can use the information that remains to rebuild it. That’s impressive data robustness without the need for a complete mirror.
Even better, parity isn’t limited to two values. We can derive parity equally well from a set of three, five or 1000 values. All we have to do is record a 0 if the number of 1s in a position is even, and 1 if it’s odd. If any single value is subsequently lost (including the parity value itself), we can recalculate it from what remains. You can thus use a single parity disk to provide redundancy for a three- or four-disk striped array, and rebuild any single disk.
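Here’s the same arithmetic as a short Python sketch: it derives the parity of the two example bytes from the text, rebuilds each one after a simulated loss, then shows the generalisation to a three-value stripe.

```python
from functools import reduce
from operator import xor

# The two example bytes from the text.
a = 0b00010110
b = 0b10010011
parity = a ^ b
print(f"{parity:08b}")        # 10000101

# Lose either byte and XOR the survivors to get it back:
print(f"{parity ^ b:08b}")    # 00010110 -- a recovered
print(f"{parity ^ a:08b}")    # 10010011 -- b recovered

# The trick scales to any number of values: a parity bit is 0 where
# the count of 1s in that position is even, and 1 where it's odd.
values = [0b00010110, 0b10010011, 0b01110001]  # a three-disk stripe
parity = reduce(xor, values)
lost = values.pop(1)                    # disk two fails
rebuilt = reduce(xor, values) ^ parity  # XOR everything that's left
assert rebuilt == lost                  # good as new
```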
The system we’ve described here isn’t as safe as full mirroring: if a second disk fails before you’ve completely rebuilt the first, the whole array is lost. All the same, a striped array with parity is vastly safer than one without, and appealingly cost-effective.
RAID levels 2 to 4
Almost all standard RAID levels above 1 use parity to provide redundancy. The exception is RAID2, which uses a more advanced system known as Hamming code (after the mathematician Richard Hamming, who invented it). Hamming code makes it possible to work out not only what data was lost when a disk failed, but which particular drive had failed – a potentially valuable feature back when RAID was being devised. Modern disk controllers detect drive failures automatically, however, so Hamming code is no longer needed – and it’s a fair bet you’ll never see a RAID2 array in use.
RAID levels 3 and 4 stripe data across two or more disks plus one parity disk, as described above. The difference between them is that RAID3 is striped at the byte level – so a two-byte string would be split across two drives – while RAID4 uses larger blocks. RAID3 is a comparatively efficient way to read and write streams of sequential data, but for consistent performance it requires the use of disks of very similar or identical types. RAID4 suffers less from the use of non-identical disks, and is better able to handle random access, since not all drives in the array are necessarily needed to service small read or write requests.
Yet RAID3 and 4 share one weakness. The whole point of striping is to increase performance, and when it comes to reading data this works fine. The write speed of the array, however, is bottlenecked by the speed at which the parity disk can be accessed and written to.
RAID levels 5 and 6
RAID5 is the most sophisticated system proposed in the original RAID document. It works on exactly the same principle as RAID4, but instead of dedicating one disk to parity, it distributes parity blocks across all of the disks in the system, effectively striping them in with the data. As the original RAID authors explain, “the impact of this small change is large, since RAID level 5 can support multiple individual writes per [stripe] group”. In a four-disk system, it’s theoretically possible to update a data block on disk 1 – and update its associated parity block on disk 2 – while simultaneously doing the same with a different data block on disk 3, whose parity block is on disk 4. In practice, the expected write speed for the array is still slower than RAID0 (or indeed RAID10), making it a less attractive choice for applications such as databases where transactional speed is a priority. But it’s faster than any other parity-based level, while still providing the economies of parity-style data redundancy.
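One way to see why the change matters is to print the two layouts side by side. The sketch below is purely illustrative – real controllers use several standardised rotation variants – with “D” marking data blocks and “P” parity blocks:

```python
# Sketch of dedicated (RAID4) versus rotating (RAID5) parity
# placement on four disks. The simple "parity moves one disk per
# stripe" rule here is illustrative, not a real controller layout.
N_DISKS = 4

def parity_row(stripe, parity_disk):
    """Lay out one stripe, putting 'P' on the given parity disk."""
    row, data = [], stripe * (N_DISKS - 1)
    for disk in range(N_DISKS):
        if disk == parity_disk:
            row.append(" P ")
        else:
            row.append(f"D{data:02d}")
            data += 1
    return " | ".join(row)

print("RAID4: parity always on the last disk")
for stripe in range(3):
    print(parity_row(stripe, N_DISKS - 1))

print("RAID5: parity rotates, so writes to different stripes")
print("can update their parity blocks on different disks at once")
for stripe in range(3):
    print(parity_row(stripe, stripe % N_DISKS))
```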
Finally, RAID6 builds on the principle of RAID5 by adding an extra level of redundancy, so that even two simultaneous disk failures won’t wreck the array. It does this by adding a second parity disk, and using two parity blocks per stripe. In a four-disk array, RAID6 is no more space efficient than mirroring, but as the size of the array rises it becomes proportionally more attractive: you could use 12 1TB disks to create a 10TB RAID6 array with two levels of redundancy, while the same disks in a RAID10 configuration would yield only 6TB of storage.
RAID6 is also the only RAID level that can detect and correct data corruption, where erroneous data is somehow written to one of the disks. It does this by using different methods to calculate parity in its two parity blocks. In addition to the familiar XOR-based parity, a more complex type of secondary parity calculation is also used. This adds a considerable computational overhead to RAID6, which may explain why it wasn’t included in the original RAID proposal – and why it’s rarely offered by software controllers.
WALKTHROUGH: Setting up a RAID array in Windows
Step 1: Disk operations are carried out in the Disk Management console. Open Control Panel | System and Security, then click “Create and format hard disk partitions” – or type “disk management” into the Start menu search box. You’ll see a window like the one pictured, with assigned volumes at the top and physical disks at the bottom.
Step 2: Right-click on one of the disks you want to use in the new array, and select which type of volume to create (if you’re not using a server edition of the OS, RAID5 will be greyed out). For this example, we’ll combine two unformatted 120GB disks into a RAID0 array for maximum performance, so we select “New Striped Volume…”.
Step 3: A wizard asks which disks to include. Regular NTFS partitions aren’t available, but you can select unused space on a disk that contains another partition, such as the 93MB available on our Disk 0. The capacity of the array is constrained by the smallest disk, however, so we’ll add only the unused Disk 3 to the preselected Disk 2.
Step 4: The wizard then asks you to assign a drive letter to your array, and invites you to format your new disk. You’ll see a warning that the volumes must be converted to “dynamic” disks. This is a special disk format Windows uses for arrays: it can’t be booted from (unless it’s your system drive), and isn’t recognised by other OSes.
Step 5: Another use for RAID is to bring your system drive into a mirrored array, for safety. Attach a second disk of at least the same size. Then, convert your existing drive to a dynamic disk. This only takes a few seconds: you’ll find the option by right-clicking on the information area at the left of the drive map.
Step 6: When you right-click on a dynamic disk (shown in greenish brown), you’ll see the option “Add Mirror…”. You’ll be shown a list of suitable disks: select the volume you want to use and Windows will start synchronising the two drives. This can take many hours, but you can keep using your computer while it happens.