Originally from http://linas.org/linux/Software-RAID/Software-RAID.html
RAID is a way of combining multiple disk drives into a single
entity to improve performance and/or reliability. There are a
variety of different types and implementations of RAID, each with
its own advantages and disadvantages. For example, by putting
a copy of the same data on two disks (called disk mirroring,
or RAID level 1), read performance can be improved by reading
alternately from each disk in the mirror. On average, each disk
is less busy, as it is handling only 1/2 the reads (for two disks),
or 1/3 (for three disks), etc. In addition, a mirror can improve
reliability: if one disk fails, the other disk(s) have a copy
of the data. Different ways of combining the disks into one, referred
to as RAID levels, can provide greater storage efficiency
than simple mirroring, or can alter latency (access-time) performance,
or throughput (transfer rate) performance, for reading or writing,
while still retaining redundancy that is useful for guarding against
failures.
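The mirroring behaviour described above can be illustrated with a
small toy model. This is only a sketch under stated assumptions, not
how the Linux RAID driver is actually written: the class name Mirror,
its methods, and the helper read_share are hypothetical names made up
for this example, and strict round-robin read scheduling is assumed
for simplicity.

```python
from collections import Counter
from itertools import cycle


class Mirror:
    """Toy RAID-1 mirror (hypothetical model, not the kernel driver).

    Every write is copied to all working disks; reads rotate
    round-robin over the working disks.
    """

    def __init__(self, n_disks):
        self.disks = [dict() for _ in range(n_disks)]  # block number -> data
        self.failed = set()            # indices of failed disks
        self.reads = Counter()         # reads served per disk
        self._rotation = cycle(range(n_disks))

    def write(self, block, data):
        # A mirror stores a full copy of the data on each working disk.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block] = data

    def read(self, block):
        # Try each disk at most once, starting at the next one in rotation.
        for _ in range(len(self.disks)):
            i = next(self._rotation)
            if i not in self.failed:
                self.reads[i] += 1
                return self.disks[i][block]
        raise IOError("all disks in the mirror have failed")

    def fail(self, i):
        # Simulate the loss of disk i.
        self.failed.add(i)


def read_share(n_disks, n_reads):
    """Return how many of n_reads each disk of an n_disks mirror serves."""
    m = Mirror(n_disks)
    m.write(0, b"payload")
    for _ in range(n_reads):
        m.read(0)
    return [m.reads[i] for i in range(n_disks)]


if __name__ == "__main__":
    print(read_share(2, 100))   # each of two disks serves half the reads
    print(read_share(3, 99))    # each of three disks serves a third

    m = Mirror(2)
    m.write(7, b"important")
    m.fail(0)                   # one disk dies ...
    print(m.read(7))            # ... the other still has a copy
```

With two disks each one serves half of the reads, with three disks a
third, and a read still succeeds after one disk of the mirror has been
marked as failed.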
Although RAID can protect against disk failure, it does not
protect against operator and administrator (human) error, or against
loss due to programming bugs (possibly due to bugs in the RAID
software itself). The net abounds with tragic tales of system
administrators who have bungled a RAID installation, and have
lost all of their data. RAID is not a substitute for frequent,
regularly scheduled backup.
RAID can be implemented in hardware, in the form of special disk
controllers, or in software, as a kernel module that is layered
in between the low-level disk driver, and the file system which
sits above it. RAID hardware is always a "disk controller",
that is, a device to which one can cable up the disk drives. Usually
it comes in the form of an adapter card that will plug into an
ISA/EISA/PCI/S-Bus/MicroChannel slot. However, some RAID controllers
are in the form of a box that connects into the cable in between
the usual system disk controller, and the disk drives. Small ones
may fit into a drive bay; large ones may be built into a storage
cabinet with its own drive bays and power supply. The latest RAID
hardware used with the latest and fastest CPU will usually provide
the best overall performance, although at a significant price.
This is because most RAID controllers come with on-board DSPs
and memory cache that can off-load a considerable amount of processing
from the main CPU, as well as allow high transfer rates into the
large controller cache. Old RAID hardware can act as a "de-accelerator"
when used with newer CPUs: yesterday's fancy DSP and cache can
act as a bottleneck, and its performance is often beaten by pure-software
RAID and new but otherwise plain, run-of-the-mill disk controllers.
RAID hardware can offer an advantage over pure-software RAID,
if it can make use of disk-spindle synchronization and its knowledge
of the disk-platter position with regard to the disk head, and
the desired disk-block. However, most modern (low-cost) disk drives
do not offer this information and level of control anyway, and
thus, most RAID hardware does not take advantage of it. RAID hardware
is usually not compatible across different brands, makes and models:
if a RAID controller fails, it must be replaced by another controller
of the same type. As of this writing (June 1998), a broad variety
of hardware controllers will operate under Linux; however, none
of them currently come with configuration and management utilities
that run under Linux.
Software-RAID is a set of kernel modules, together with management
utilities that implement RAID purely in software, and require
no extraordinary hardware. The Linux RAID subsystem is implemented
as a layer in the kernel that sits above the low-level disk drivers
(for IDE, SCSI and Paraport drives) and below the block-device interface.
The filesystem, be it ext2fs, DOS-FAT, or other, sits above the
block-device interface. Software-RAID, by its very software nature,
tends to be more flexible than a hardware solution. The downside
is that it requires more CPU cycles and power to run well than a
comparable hardware system. On the other hand, the cost can't
be beat. Software RAID has one further important distinguishing
feature: it operates on a partition-by-partition basis, where
a number of individual disk partitions are ganged together to
create a RAID partition. This is in contrast to most hardware
RAID solutions, which gang together entire disk drives into an
array. With hardware, the fact that there is a RAID array is transparent
to the operating system, which tends to simplify management. With
software, there are far more configuration options and choices,
tending to complicate matters.
As of this writing (June 1998), the administration of RAID
under Linux is far from trivial, and is best attempted by experienced
system administrators. The theory of operation is complex. The
system tools require modification to startup scripts, and recovery
from disk failure is non-trivial and prone to human error. RAID
is not for the novice, and any benefits it may bring to reliability
and performance can be easily outweighed by the extra complexity.
Indeed, modern disk drives are incredibly reliable and modern
CPUs and controllers are quite powerful. You might more easily
obtain the desired reliability and performance levels by purchasing
higher-quality and/or faster hardware.