Originally from http://linas.org/linux/Software-RAID/Software-RAID.html
The different RAID levels have different performance, redundancy,
storage capacity, reliability and cost characteristics. Most,
but not all, levels of RAID offer redundancy against disk failure.
Of those that offer redundancy, RAID-1 and RAID-5 are the most
popular. RAID-1 offers better performance, while RAID-5 provides
for more efficient use of the available storage space. However,
tuning for performance is an entirely different matter, as performance
depends strongly on a large variety of factors, from the type
of application, to the sizes of stripes, blocks, and files. The
more difficult aspects of performance tuning are deferred to a
later section of this HOWTO.
The following describes the different RAID levels in the context
of the Linux software RAID implementation.
- RAID-linear is a simple concatenation of partitions to create
a larger virtual partition. It is handy if you have a number
of small drives, and wish to create a single, large partition.
This concatenation offers no redundancy, and in fact decreases
the overall reliability: if any one disk fails, the combined
partition will fail.
- RAID-1 is also referred to as "mirroring". Two (or
more) partitions, all of the same size, each store an exact
copy of all data, disk-block by disk-block. Mirroring gives
strong protection against disk failure: if one disk fails, there
is another with an exact copy of the same data. Mirroring
can also help improve performance in I/O-laden systems, as read
requests can be divided up between several disks. Unfortunately,
mirroring is also the least efficient in terms of storage: two
mirrored partitions can store no more data than a single partition.
- Striping is the underlying concept behind all of the other
RAID levels. A stripe is a contiguous sequence of disk blocks.
A stripe may be as short as a single disk block, or may consist
of thousands. The RAID drivers split up their component disk
partitions into stripes; the different RAID levels differ in
how they organize the stripes, and what data they put in them.
The interplay between the size of the stripes, the typical size
of files in the file system, and their location on the disk
is what determines the overall performance of the RAID subsystem.
- RAID-0 is much like RAID-linear, except that the component
partitions are divided into stripes and then interleaved. Like
RAID-linear, the result is a single larger virtual partition.
Also like RAID-linear, it offers no redundancy, and therefore
decreases overall reliability: a single disk failure will knock
out the whole thing. RAID-0 is often claimed to improve performance
over the simpler RAID-linear. However, this may or may not be
true, depending on the characteristics of the file system, the
typical size of the file as compared to the size of the stripe,
and the type of workload. The ext2fs file system already scatters
files throughout a partition, in an effort to minimize fragmentation.
Thus, at the simplest level, any given access may go to one
of several disks, so the interleaving of stripes across
multiple disks offers no apparent additional advantage. However,
there are performance differences, and they are data, workload,
and stripe-size dependent.
- RAID-4 interleaves stripes like RAID-0, but it requires an
additional partition to store parity information. The parity
is used to offer redundancy: if any one of the disks fails, the
data on the remaining disks can be used to reconstruct the data
that was on the failed disk. Given N data disks, and one parity
disk, the parity stripe is computed by taking one stripe from
each of the data disks, and XOR'ing them together. Thus, the
storage capacity of an (N+1)-disk RAID-4 array is N disks' worth, which
is a lot better than mirroring (N+1) drives, and is almost as
good as a RAID-0 setup for large N. Note that for N=1, where
there is one data drive, and one parity drive, RAID-4 is a lot
like mirroring, in that the two disks are copies of each
other. However, RAID-4 does NOT offer the read-performance of
mirroring, and offers considerably degraded write performance.
In brief, this is because updating the parity requires a read
of the old data and the old parity, before the new parity can be
calculated and written out. In an environment with lots of writes, the parity
disk can become a bottleneck, as each write must access the
parity disk.
- RAID-5 avoids the write-bottleneck of RAID-4 by alternately
storing the parity stripe on each of the drives. However, write
performance is still not as good as for mirroring, as the parity
stripe must still be read and XOR'ed before it is written. Read
performance is also not as good as it is for mirroring, as,
after all, there is only one copy of the data, not two or more.
RAID-5's principal advantage over mirroring is that it offers
redundancy and protection against single-drive failure, while
offering far more storage capacity when used with three or more
drives.
- RAID-2 and RAID-3 are seldom used anymore, and to some degree
have been made obsolete by modern disk technology. RAID-2
is similar to RAID-4, but stores ECC information instead of
parity. Since all modern disk drives incorporate ECC under the
covers, this offers little additional protection. RAID-2 can
offer greater data consistency if power is lost during a write;
however, battery backup and a clean shutdown can offer the same
benefits. RAID-3 is similar to RAID-4, except that it uses the
smallest possible stripe size. As a result, any given read will
involve all disks, making overlapping I/O requests difficult or impossible.
In order to avoid delay due to rotational latency, RAID-3 requires
that all disk drive spindles be synchronized. Most modern disk
drives lack spindle-synchronization ability, or, if capable
of it, lack the needed connectors, cables, and manufacturer
documentation. Neither RAID-2 nor RAID-3 is supported by the
Linux Software-RAID drivers.
- Other RAID levels have been defined by various researchers
and vendors. Many of these represent the layering of one type
of RAID on top of another. Some require special hardware, and
others are protected by patent. There is no commonly accepted
naming scheme for these other levels. Sometimes the advantages
of these other systems are minor, or at least not apparent until
the system is highly stressed. Except for the layering of RAID-1
over RAID-0/linear, Linux Software RAID does not support any
of the other variations.
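
The parity scheme described for RAID-4 and RAID-5 above can be
illustrated with a small Python sketch. This is not the Linux
driver code; the function names, the four-byte stripes and the
simple modulo rotation used for RAID-5 are assumptions made for
illustration only. It shows three things: computing a parity
stripe by XOR'ing one stripe from each data disk, rebuilding a
lost stripe from the survivors, and why a small write needs the
old data and old parity to be read first.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk_raid5(stripe_index, num_disks):
    """Hypothetical rotation: parity moves to the next disk on each row.

    RAID-4 would always return the same (dedicated) disk here.
    """
    return stripe_index % num_disks


# Three data disks (N = 3), one stripe from each, four bytes per stripe.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Disk 1 fails: XOR'ing the surviving stripes with the parity stripe
# gives back the contents of the failed disk.
rebuilt = xor_blocks([data[0], data[2], parity])

# Small write to disk 0: new parity = old parity XOR old data XOR new data,
# so the old values must be read before the new ones can be written.
new_block = b"\xff\x00\xff\x00"
new_parity = xor_blocks([parity, data[0], new_block])
```

With a dedicated parity disk (RAID-4), every such small write
touches the same disk; with the rotation sketched in
parity_disk_raid5, successive stripes place their parity on
different disks, which spreads that load.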