Subject:  What is the best way to configure Software RAID? Under Linux
Type:  FAQ
Last Revision Date:  23-AUG-2000

ORACLE PRODUCT/PRODUCT GROUP
------------------------------
Unix/LINUX
----------------
FREQUENTLY ASKED QUESTIONS
--------------------------
16-APR-2000

CONTENTS
--------
RAID CONFIGURATION FOR LINUX PLATFORM.

QUESTIONS & ANSWERS
-------------------

1. What is the best way to configure Software RAID?

Answer
------
I keep rediscovering that file-system planning is one of the more
difficult Unix configuration tasks. To answer your question, I can
describe what we did.

We planned the following setup: two EIDE disks, 2.1 GB each.

    disk  partition  mount pt.  size   device
      1       1      /          300M   /dev/hda1
      1       2      swap        64M   /dev/hda2
      1       3      /home      800M   /dev/hda3
      1       4      /var       900M   /dev/hda4

      2       1      /root      300M   /dev/hdc1
      2       2      swap        64M   /dev/hdc2
      2       3      /home      800M   /dev/hdc3
      2       4      /var       900M   /dev/hdc4

Each disk is on a separate controller (and ribbon cable). The theory is
that a controller failure and/or ribbon failure won't disable both
disks. Also, we might possibly get a performance boost from parallel
operations over two controllers/cables.

Install the Linux kernel on the root (/) partition /dev/hda1. Mark this
partition as bootable.

/dev/hdc1 will contain a "cold" copy of /dev/hda1. This is NOT a RAID
copy, just a plain old copy-copy. It's there just in case the first disk
fails; we can use a rescue disk, mark /dev/hdc1 as bootable, and use
that to keep going without having to reinstall the system. You may even
want to put /dev/hdc1's copy of the kernel into LILO to simplify booting
in case of failure. The theory here is that in case of severe failure, I
can still boot the system without worrying about RAID superblock
corruption or other RAID failure modes and gotchas that I don't
understand.

/dev/hda3 and /dev/hdc3 will be mirrored as /dev/md0.
/dev/hda4 and /dev/hdc4 will be mirrored as /dev/md1.
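
The plan above can be sanity-checked before any disk is touched. The
following is only a minimal sketch in Python, not part of any RAID
tool: the names PLAN, MIRRORS, used_per_disk and check_mirrors are
made up for illustration, and the 2100 MB capacity is an assumption
about what "2.1 gig" means. It checks that each disk's partitions fit
and that the two halves of each mirror are the same size and sit on
different disks.

```python
from collections import namedtuple

# One row of the partition table from the text.
Partition = namedtuple("Partition", "disk number mount size_mb device")

PLAN = [
    Partition(1, 1, "/",     300, "/dev/hda1"),
    Partition(1, 2, "swap",   64, "/dev/hda2"),
    Partition(1, 3, "/home", 800, "/dev/hda3"),
    Partition(1, 4, "/var",  900, "/dev/hda4"),
    Partition(2, 1, "/root", 300, "/dev/hdc1"),
    Partition(2, 2, "swap",   64, "/dev/hdc2"),
    Partition(2, 3, "/home", 800, "/dev/hdc3"),
    Partition(2, 4, "/var",  900, "/dev/hdc4"),
]

# Mirror sets as planned in the text: md device -> its two members.
MIRRORS = {
    "/dev/md0": ("/dev/hda3", "/dev/hdc3"),
    "/dev/md1": ("/dev/hda4", "/dev/hdc4"),
}

# Assumption: "2.1 gig" is taken as 2100 MB per disk.
DISK_CAPACITY_MB = 2100


def used_per_disk(plan):
    """Return {disk number: total MB allocated on that disk}."""
    totals = {}
    for part in plan:
        totals[part.disk] = totals.get(part.disk, 0) + part.size_mb
    return totals


def check_mirrors(plan, mirrors):
    """Return a list of problems found in the mirror layout (empty if none)."""
    by_device = {part.device: part for part in plan}
    problems = []
    for md, (first, second) in mirrors.items():
        a, b = by_device[first], by_device[second]
        if a.disk == b.disk:
            problems.append(f"{md}: both halves are on disk {a.disk}")
        if a.size_mb != b.size_mb:
            problems.append(f"{md}: halves differ in size")
    return problems


if __name__ == "__main__":
    for disk, used in sorted(used_per_disk(PLAN).items()):
        print(f"disk {disk}: {used} MB of {DISK_CAPACITY_MB} MB allocated")
    print("mirror problems:", check_mirrors(PLAN, MIRRORS) or "none")
```

A mirror whose halves share a disk would defeat the purpose of the
separate controllers and cables, which is why that case is reported.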
We picked /var and /home to be mirrored, and in separate partitions,
using the following logic:

  * / (the root partition) will contain relatively static, non-changing
    data: for all practical purposes, it will be read-only without
    actually being marked and mounted read-only.
  * /home will contain "slowly" changing data.
  * /var will contain rapidly changing data, including mail spools,
    database contents and web server logs.

The idea behind using multiple, distinct partitions is that if, for some
bizarre reason, whether it is human error, power loss, or an operating
system gone wild, corruption occurs, it is limited to one partition. In
one typical case, power is lost while the system is writing to disk.
This will almost certainly lead to a corrupted filesystem, which will be
repaired by fsck during the next boot. Although fsck does its best to
make the repairs without creating additional damage during those
repairs, it can be comforting to know that any such damage has been
limited to one partition. In another typical case, the sysadmin makes a
mistake during rescue operations, leading to erased or destroyed data.
Partitions can help limit the repercussions of the operator's errors.

Other reasonable choices for partitions might be /usr or /opt. In fact,
/opt and /home make great choices for RAID-5 partitions, if we had more
disks. A word of caution: DO NOT put /usr in a RAID-5 partition. If a
serious fault occurs, you may find that you cannot mount /usr, and that
you want some of the tools on it (e.g. the networking tools, or the
compiler). With RAID-1, if a fault has occurred, and you can't get RAID
to work, you can at least mount one of the two mirrors. You can't do
this with any of the other RAID levels (RAID-5, striping, or linear
append).

So, to complete the answer to the question:

  * Install the OS on disk 1, partition 1. Do NOT mount any of the other
    partitions.
  * Install RAID per instructions.
  * Configure md0 and md1.
  * Convince yourself that you know what to do in case of a disk
    failure! Discover sysadmin mistakes now, and not during an actual
    crisis.
  * Experiment! (We turned off power during disk activity; this proved
    to be ugly but informative.)
  * Do some ugly mount/copy/unmount/rename/reboot scheme to move /var
    over to /dev/md1. Done carefully, this is not dangerous.
  * Enjoy!

2. What is the difference between the mdadd, mdrun, etc. commands, and
   the raidadd, raidrun commands?

Answer
------
The names of the tools have changed as of the 0.5 release of the
raidtools package. The md naming convention was used in the 0.43 and
older versions, while raid is used in 0.5 and newer versions.

3. I have heard that I can run mirroring over striping. Is this true?
   Can I run mirroring over the loopback device?

Answer
------
Yes, but not the reverse. That is, you can put a stripe over several
disks, and then build a mirror on top of this. However, striping cannot
be put on top of mirroring.

A brief technical explanation is that the linear and stripe
personalities use the ll_rw_blk routine for access. The ll_rw_blk
routine maps disk devices and sectors, not blocks. Block devices can be
layered one on top of the other, but devices that do raw, low-level
disk accesses, such as ll_rw_blk, cannot.

4. What is the difference between RAID-1 and RAID-5 for a two-disk
   configuration (i.e. the difference between a RAID-1 array built out
   of two disks, and a RAID-5 array built out of two disks)?

Answer
------
There is no difference in storage capacity. Nor can disks be added to
either array to increase capacity (see the question below for details).

RAID-1 offers a performance advantage for reads: the RAID-1 driver uses
distributed-read technology to simultaneously read two sectors, one from
each drive, thus doubling read performance.

The RAID-5 driver, although it contains many optimizations, does not
currently realize that the parity disk is actually a mirrored copy of
the data disk.
Thus, it serializes data reads.

5. How can I guard against a two-disk failure?

Answer
------
Some of the RAID algorithms do guard against multiple disk failures, but
these are not currently implemented for Linux. However, the Linux
Software RAID can guard against multiple disk failures by layering an
array on top of an array. For example, nine disks can be used to create
three RAID-5 arrays. Then these three arrays can in turn be hooked
together into a single RAID-5 array on top. In fact, this kind of
configuration will guard against a three-disk failure. Note that a large
amount of disk space is "wasted" on the redundancy information.

    For an NxN RAID-5 array,
    N=3, 5 out of 9 disks are used for parity (about 56%)
    N=4, 7 out of 16 disks
    N=5, 9 out of 25 disks
    ...
    N=9, 17 out of 81 disks (about 21%)

In general, an MxN array will use M+N-1 disks for parity. For a given
total number of disks, the least amount of space is "wasted" when M=N.

Another alternative is to create a RAID-1 array with three disks. Note
that since all three disks contain identical data, 2/3 of the space is
"wasted".
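
The arithmetic for the layered RAID-5 case can be checked with a short
script. This is only a sketch under stated assumptions, not anything
taken from the RAID tools: it models N inner RAID-5 arrays of M disks
each under one outer RAID-5, treats an inner array as lost once two of
its disks have failed, and treats the outer array as lost once two
inner arrays are lost. All function names are made up for illustration.

```python
from itertools import combinations


def parity_disks(m, n):
    """Disks' worth of space spent on redundancy in an M x N layout.

    Each of the n inner arrays gives up one disk (n in total), and the
    outer array gives up one inner array's usable space (m - 1 disks),
    which adds up to m + n - 1.
    """
    return m + n - 1


def overhead(m, n):
    """Fraction of all m * n disks that holds redundancy information."""
    return parity_disks(m, n) / (m * n)


def survives(failed, m, n):
    """True if the layered array still works with the given failed disks.

    Disks are numbered 0 .. m*n - 1; disk d belongs to inner array d // m.
    """
    lost_arrays = 0
    for array in range(n):
        failures = sum(1 for d in failed if d // m == array)
        if failures >= 2:  # an inner RAID-5 tolerates one failed disk
            lost_arrays += 1
    return lost_arrays <= 1  # the outer RAID-5 tolerates one lost array


def tolerates_any(k, m, n):
    """True if every possible set of k failed disks is survivable."""
    return all(survives(set(c), m, n)
               for c in combinations(range(m * n), k))


if __name__ == "__main__":
    for size in (3, 4, 5, 9):
        print(f"N={size}: {parity_disks(size, size)} of {size * size} "
              f"disks for parity ({overhead(size, size):.0%})")
    print("3x3 survives any 3 failures:", tolerates_any(3, 3, 3))
    print("3x3 survives any 4 failures:", tolerates_any(4, 3, 3))
```

Under this model the nine-disk example survives every combination of
three failed disks, but not every combination of four: two failures in
each of two inner arrays take the outer array down.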