Actually it sounds like the EMC software was trying to save you from a bad practice. If you want raid 1+0 it is better to have two disks on the same controller/chassis in raid 0, then raid 1 (or raid 5/6) across chassis. Otherwise a chassis failure will take out both legs of your raid mirror.

Having said that, EMC is stupidly overpriced bloat.



Did I really need to spell out the geometry of the RAID I was creating to avoid this kind of nitpicky follow-up?

It was a 20-something disk RAID 10 [1], arranged so that every mirrored pair of disks spanned different enclosures, interleaving mirrors across controllers and shelves exactly as you suggest I should have done. That way, any one shelf failing only degrades the mirrors that had a disk on that shelf; it breaks none of them.

EMC's software wanted to allocate the drives from two shelves, with an unequal number of drives per shelf. It was just grabbing the next however many disks, linearly.

So, no, they weren't trying to balance the mirror across enclosures or controllers. They just weren't thinking.
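
For what it's worth, here's a minimal Python sketch of the difference, with hypothetical shelf/disk names; it illustrates the two allocation strategies, not EMC's actual allocator:

    def linear_pairs(disks, n_pairs):
        """Grab the next 2*n_pairs disks in order, the way the EMC tool did.
        Adjacent disks tend to sit on the same shelf, so a mirror can end
        up with both members in one enclosure."""
        picked = disks[:2 * n_pairs]
        return [(picked[i], picked[i + 1]) for i in range(0, len(picked), 2)]

    def interleaved_pairs(shelf_a, shelf_b, n_pairs):
        """Build each mirror from one disk on each shelf, so losing a whole
        shelf degrades every mirror it touches but breaks none."""
        return list(zip(shelf_a, shelf_b))[:n_pairs]

    shelf_a = [f"A{i}" for i in range(12)]  # 12 disks in shelf A
    shelf_b = [f"B{i}" for i in range(12)]  # 12 disks in shelf B

    print(linear_pairs(shelf_a + shelf_b, 10))      # (A0, A1), ...: both halves on shelf A
    print(interleaved_pairs(shelf_a, shelf_b, 10))  # (A0, B0), ...: every pair spans shelves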

[1] By "RAID 10", I mean "striped mirrors" — that is, create a bunch of mirrors that span shelves and then stripe across them — not "mirrored stripes" which is what you appear to be suggesting, with "it is better to have two disks on the same controller/chassis in raid 0, then raid 1".

A striped mirror is recommended in everything I've ever read on the subject, because it puts the redundancy at the lowest level of the array's geometry.

Using a mirrored stripe, on the other hand, means that when one disk fails, any other disks striped with it, still presumably perfectly functional, can't be used; the controller must instead read from and write to the mirror. If a disk in that mirror subsequently fails, you've lost data — and remember that when striping, the chance of failure is multiplied by the number of disks in the stripe.
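
To put rough numbers on that last point, here's a back-of-the-envelope sketch (not a full reliability model): with one disk already dead, count what fraction of possible second failures loses data under each geometry.

    N = 20  # total disks: 10 mirrors, or 2 stripe legs of 10, as above

    # Striped mirrors (raid 1+0): only the dead disk's single mirror
    # partner is fatal; any of the other N-2 disks can fail harmlessly.
    p_striped_mirrors = 1 / (N - 1)

    # Mirrored stripes (raid 0+1): the first failure takes its whole
    # stripe leg offline, so any of the N/2 disks in the surviving leg
    # is fatal.
    p_mirrored_stripes = (N / 2) / (N - 1)

    print(f"striped mirrors:  {p_striped_mirrors:.1%} of second failures fatal")   # ~5.3%
    print(f"mirrored stripes: {p_mirrored_stripes:.1%} of second failures fatal")  # ~52.6%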

EDIT: Footnote.


Don't get all upset. You said "stripe disks" in your first comment, not "mirror volumes".

In a big system I am usually more concerned with avoiding any SPOF (that will cause downtime) than adding redundancy for the most likely failures.

Individual mirrored pairs split across chassis will mitigate long rebuild times for a single disk failure, but it requires an all-software architecture, since an individual HBA can't do a hot-spare rebuild on its own. It also reduces the benefit of the HBA cache. So: intra-chassis, striped mirrors; inter-chassis, mirrored stripes (sketched below). Of course, mirroring across striped mirrors would be better, but the cost to peak write performance and capacity might rule that out.
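
To make that concrete, here's a toy Python model (hypothetical device names, not any vendor's API) of the raid-0-inside, raid-1-across design: any disk or chassis failure downs one stripe leg, and the inter-chassis mirror absorbs it either way.

    # Each chassis is one raid 0 stripe leg; the two legs mirror each other.
    legs = {
        "chassis_A": ["A0", "A1", "A2", "A3"],
        "chassis_B": ["B0", "B1", "B2", "B3"],
    }

    def failover_path(legs, failed):
        """Any disk or chassis failure downs its whole stripe leg; the
        inter-chassis mirror carries the load either way."""
        for name, stripe in legs.items():
            if failed == name or failed in stripe:
                survivor = next(n for n in legs if n != name)
                return f"{name} leg down; {survivor} carries the load"
        return "unknown device"

    print(failover_path(legs, "A2"))         # one disk -> the whole A leg fails over
    print(failover_path(legs, "chassis_B"))  # whole chassis -> same failover path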

Also, yes I believe you, EMC sucks. But they do have some good engineers.



