Might I suggest you rename this to: My Random Rant About RAID (you don't
seem to be sticking to your schedule...) ;-)

"Art S. Kagel" <kagelbloomberg.net> wrote in message
news:pan.2003.07.31.16.49.22.318281.15473bloomber g.net...
> RAID5 versus RAID10 (or even RAID3 or RAID4)
>
> OK, here is the deal: RAID5 uses only ONE parity drive's worth of space
> per array, and many RAID5 arrays are 5 drives (4 data and 1 parity,
> though it is not a single drive that holds all of the parity as in RAID3
> & RAID4, but read on). If you have 10 drives of say 20GB each, for 200GB
> raw, arranged as two such 5-drive arrays, RAID5 will use 20% for parity,
> so you will have 160GB of storage. Now since RAID10, like mirroring
> (RAID1), uses 1 mirror drive for each primary drive, you are using 50%
> for redundancy, so to get the same 160GB of storage you will need 8
> pairs, or sixteen 20GB drives. That is why RAID5 is so popular. This
> intro is just to put things into perspective.
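>
> A quick back-of-the-envelope sketch of that arithmetic (Python, purely
> illustrative; the drive size is the 20GB from the example above):
>
>   # Capacity math for the example above, assuming 20GB drives.
>   drive_gb = 20
>
>   # Two 5-drive RAID5 arrays: each stripe gives 1 block in 5 to parity.
>   raid5_usable = 10 * drive_gb * (4 / 5)          # 160.0 GB usable
>
>   # RAID10 mirrors every drive, so usable capacity is half the raw total.
>   raid10_drives = 2 * raid5_usable / drive_gb     # 16.0 drives needed
>
>   print(raid5_usable, raid10_drives)              # 160.0 16.0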
>
> RAID5 is physically a stripe set like RAID0, but with data recovery
> included. RAID5 reserves one disk block out of each stripe for parity
> data. The parity block holds the XOR of the stripe's data blocks;
> combined with the remaining data blocks it can recreate any single
> missing block, gone missing because a drive has failed. The innovation
> of RAID5 over RAID3 & RAID4 is that the parity is distributed on a
> round-robin basis, so that there can be independent reading of different
> blocks from the several drives. This is why RAID5 became more popular
> than RAID3 & RAID4. So, if Drive2 fails, blocks 1,2,4,5,6 & 7 are data
> blocks on this drive and blocks 3 and 8 are parity blocks on this drive.
> That means the parity on Drive5 will be used to recreate Drive2's block
> 1 if it is requested before a new drive replaces Drive2, or during the
> rebuilding of the new Drive2 replacement. Likewise the parity on Drive1
> will be used to repair block 2, and the parity on Drive3 will repair
> block 4, etc. For blocks 3 and 8, where Drive2 held only parity, all
> the data is safely on the remaining drives, and during the rebuilding of
> Drive2's replacement new parity blocks will simply be recalculated from
> that data and written to the new drive.
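>
> To make the mechanics concrete, here is a minimal sketch of XOR parity
> and single-block reconstruction (Python; the 4-bit block values are made
> up purely for illustration):
>
>   from functools import reduce
>
>   # One stripe on a 5-drive array: 4 data blocks plus 1 parity block,
>   # where the parity block is simply the XOR of the data blocks.
>   data = [0b1010, 0b0110, 0b1111, 0b0001]       # blocks on Drives 1-4
>   parity = reduce(lambda a, b: a ^ b, data)     # parity block on Drive 5
>
>   # Drive2 fails: XOR the surviving data blocks with the parity block
>   # to recreate the missing block.
>   surviving = [data[0], data[2], data[3]]
>   rebuilt = reduce(lambda a, b: a ^ b, surviving, parity)
>   assert rebuilt == data[1]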
>
> Now when a disk block is read from the array, the RAID software/firmware
> calculates which RAID stripe contains the disk block, which drive the
> disk block is on, and which drive contains the parity block for that
> stripe, and reads ONLY the data drive. It returns the data block. If
> you later modify the data block, it recalculates the parity by
> subtracting out the old block's contribution and adding in the new
> version, then in two separate operations it writes the data block
> followed by the new parity block. To do this it must first read the
> parity block from whichever drive contains the parity for that stripe,
> and reread the unmodified data for the updated block from the original
> drive. This read-read-write-write sequence is known as the RAID5 write
> penalty. Since the two writes are sequential and synchronous, the write
> system call cannot return until the reread and both writes complete, for
> safety, so writing to RAID5 is up to 50% slower than RAID0 for an array
> of the same capacity.
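>
> Here is that read-read-write-write sequence sketched out (Python;
> "subtracting" and "adding" a block are the same XOR operation, since XOR
> is its own inverse):
>
>   def raid5_small_write(old_data, old_parity, new_data):
>       # Reads 1 & 2 (the arguments): old data block and old parity block.
>       # Remove the old data's contribution, then fold in the new data.
>       new_parity = old_parity ^ old_data ^ new_data
>       # Writes 1 & 2: the new data block, then the new parity block,
>       # done sequentially and synchronously; this is the write penalty.
>       return new_data, new_parity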
>
> Now RAID10:
>
> RAID10 is one of the possible combinations of RAID1 (mirroring) and
> RAID0 (striping). When N mirrored pairs are striped together it is
> called RAID10, because the mirroring is applied first. The other option
> is to create two stripe sets and mirror them one to the other; this is
> known as RAID01. In either a RAID01 or RAID10 system each and every
> disk block is completely duplicated on its drive's mirror.
> Performance-wise, RAID01 and RAID10 are functionally equivalent. The
> difference comes in during recovery, where RAID01 suffers from some of
> the same problems I will describe affecting RAID5, while RAID10 does not.
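>
> A minimal sketch of the RAID10 mapping (mirror first, then stripe); the
> pair names are mine, not any particular controller's:
>
>   # RAID10 with 4 mirrored pairs: logical blocks are striped across the
>   # pairs, and every block is written to both drives of its pair.
>   pairs = [("A1", "A2"), ("B1", "B2"), ("C1", "C2"), ("D1", "D2")]
>
>   def raid10_targets(logical_block):
>       primary, mirror = pairs[logical_block % len(pairs)]
>       return primary, mirror       # both copies get the write
>
>   print(raid10_targets(0))   # ('A1', 'A2')
>   print(raid10_targets(5))   # ('B1', 'B2')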
>
> Now if a drive in the RAID5 array dies, is removed, or is shut off, data
> is returned by reading the blocks from the remaining drives and
> calculating the missing data using the parity, assuming the defunct
> drive is not the parity-block drive for that stripe. Note that it takes
> 4 physical reads to replace the missing disk block (for a 5-drive array)
> for four out of every five disk blocks.
>
> If a drive in the RAID10 array dies, data is returned from its mirror
> drive in a single read.
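>
> Counting the physical I/Os per degraded read makes the difference stark
> (illustrative only):
>
>   def degraded_read_ios(level, drives_per_group=5):
>       # RAID5 must read every surviving drive in the stripe and XOR
>       # them together; RAID10 just reads the surviving mirror.
>       return drives_per_group - 1 if level == "RAID5" else 1
>
>   print(degraded_read_ios("RAID5"))    # 4
>   print(degraded_read_ios("RAID10"))   # 1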
>
> One begins to get an inkling of what is going on.
>
> OK, so that brings us to the final question of the day: What is the
> problem with RAID5? It does recover a failed drive, right? So writes
> are slower; I don't do enough writing to worry about it, and the cache
> helps a lot too, I've got LOTS of cache! The problem is that despite
> the improved reliability of modern drives and the improved error
> correction codes on most drives, and even despite the additional 8 bytes
> of error correction that EMC puts on every CLARiiON drive disk block, it
> is more than a little possible that a drive will become flaky and begin
> to return garbage. This is known as partial media failure. Now SCSI
> drives reserve several hundred disk blocks to be remapped to replace
> fading sectors with unused ones, but if the drive is going these will
> not last very long, they will run out, and SCSI does NOT report
> correctable errors back to the OS! Therefore you will not know the
> drive is becoming unstable until it is too late, there are no more
> replacement sectors, and the drive begins to return garbage. When this
> happens, since RAID5 does not EVER check parity on read (RAID3 & RAID4
> do, BTW, and both perform better for databases than RAID5 to boot),
> garbage parity will be calculated when you write the garbage sector
> back, and your RAID5 integrity is lost! Similarly, if a drive fails and
> one of the remaining drives is flaky, the replacement will be rebuilt
> with garbage also.
>
> Need more? During recovery, read performance for a RAID5 array is
> degraded by as much as 80%. Some advanced arrays let you configure the
> preference more toward recovery or toward performance. However,
> favoring performance will increase recovery time and increase the
> likelihood of losing a second drive in the array before recovery
> completes, resulting in catastrophic data loss. RAID10, on the other
> hand, will only be recovering one drive out of 4 or more pairs, with
> ONLY the performance of reads from the recovering pair degraded, making
> the performance hit to the array overall only about 20%! Plus there is
> no parity calculation time used during recovery; it's a straight data
> copy.
>
> What about that thing about losing a second drive? Well, with RAID10
> there is no danger unless the surviving mirror of the recovering drive
> also fails, and that is 80% or more less likely than in a RAID5 array,
> where the failure of ANY remaining drive during recovery is fatal! And
> since most multiple drive failures are caused by undetected
> manufacturing defects, you can make even this possibility vanishingly
> small by making sure to mirror every drive with one from a different
> manufacturer's lot number. ("Oh", you say, "this scenario does not seem
> likely!" Pooh, we lost 50 drives in two weeks when a batch of 200 IBM
> drives began to fail. IBM discovered that the single lot of drives
> would have their spindle bearings freeze after so many hours of
> operation. Fortunately, due in part to RAID10 and in part to a
> herculean effort by DG techs and our own people over 2 weeks, no data
> was lost. HOWEVER, one RAID5 filesystem was a total loss after a second
> drive failed during recovery. Fortunately everything was on tape.)
>
> Conclusion? For safety and performance favor RAID10 first, RAID3
> second, RAID4 third, and RAID5 last! The original reason for the
> RAID2-5 specs was that the high cost of disks was making RAID1,
> mirroring, impractical. That is no longer the case! Drives are
> commodity priced; even the biggest, fastest drives are cheaper in
> absolute dollars than drives were then, and cost per MB is a tiny
> fraction of what it was. Does RAID5 make ANY sense anymore? Obviously
> I think not.
>
> To put things into perspective: if a drive costs $1000US, then switching
> from a 4-pair RAID10 array to a 5-drive RAID5 array will save 3 drives,
> or $3000US. What is the cost of overtime and wear and tear on the
> technicians, DBAs, managers, and customers of even a recovery scare?
> What is the cost of reduced performance and possibly reduced customer
> satisfaction? Finally, what is the cost of lost business if data is
> unrecoverable? I maintain that the drives are FAR cheaper!
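>
> The arithmetic, spelled out (same illustrative $1000US-per-drive figure
> as above):
>
>   drive_cost = 1000                  # US$ per drive, from the example
>   raid10_drives = 8                  # 4 mirrored pairs
>   raid5_drives = 5                   # 4 data + 1 parity
>   print((raid10_drives - raid5_drives) * drive_cost)   # 3000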
>
> Art S. Kagel