Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues
Ian G Batten
ian.batten at uk.fujitsu.com
Wed Mar 5 11:10:58 EST 2008
On 05 Mar 08, at 1549, Simon Matter wrote:
>> On Tue, 4 Mar 2008, Ian G Batten wrote:
>>
>>> software RAID5 is a performance
>>> disaster area at the best of times unless it can take advantage of
>>> intimate knowledge of the intent log in the filesystem (RAID-Z does
>>> this),
>>
>> actually, unless you have top-notch hardware raid controllers,
>> software raid 5
>>
> I can only second that. I'm still wondering what "top-notch hardware
> raid controllers" are. From my experience the only decent
> "controllers" you can get are those in the heavily priced SAN
> equipment with gigs of cache on the SAN controllers and tens or
> hundreds of spindles behind it.
Sorry, that's what I was comparing it to: my experience of software
RAID5 is horrid (5+1 assemblages on various small Suns with DiskSuite)
and I probably live a life of luxury with hardware RAID (a 100-spindle
Pillar with 24GB of RAM, assorted 50--100 spindle EMC and DotHill
arrays with several GB of RAM). I've rarely used PCI-slot RAID
controllers: thinking back, I used PCI-card controllers indirectly
once upon a time --- Auspex used commodity RAID controllers in their
later, doomed, non-VME machines --- and they were horrid.
But I use software RAID 0+1 all the time, both with Solaris Disksuite
(or whatever it's called this week) and ZFS. Our Cyrus meta-data
partitions, for example, sit in this zpool:
        NAME          STATE     READ WRITE CKSUM
        onboard       ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s4  ONLINE       0     0     0
            c1t0d0s4  ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t1d0s4  ONLINE       0     0     0
            c1t1d0s4  ONLINE       0     0     0
and are perfectly happy. The message store, however, comes in over NAS
from a 20-disk stripe made up of four 5+1 RAID5 assemblages spread over
four RAID controllers fronted with ~10GB of RAM cache...
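For reference, a striped pair of mirrors like the "onboard" pool above
is a one-liner to build with ZFS; roughly this, with the pool and
device names taken from the status output (your devices will differ):

   zpool create onboard mirror c0t0d0s4 c1t0d0s4 \
                        mirror c0t1d0s4 c1t1d0s4

ZFS stripes across the two mirror vdevs by itself, so there is no
separate RAID-0 step to configure.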
Returning to the topic at hand, though, I can't for the life of me see
why anyone would want to use RAID5 in 2008 _without_ tens or hundreds
of spindles and gigs of cache. Why not just use RAID 0+1?
When I've got ~40TB in my Pillar, the difference between RAID5 and
RAID 0+1 is a large chunk of change: it's the difference between 104
500GB spindles (16 5+1 volumes, 8 hot spares) and 160 500GB spindles
plus however much hot-sparing is prudent. An extra sixty spindles,
plus the space, controllers, power supplies, cabling and metalwork
that go with them, is a non-trivial amount of money and heat. Likewise
in the 96-spindle
DotHill stack and the ~80-spindle EMC: those would require dozens of
extra disks and a lot of electronics.
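Spelling the arithmetic out, with the 500GB drives and the spare
counts above:

   RAID5:    16 x (5+1) volumes: 16 * 5 * 0.5TB       = 40TB usable
             16 * 6 disks + 8 hot spares              = 104 spindles
   RAID 0+1: 40TB usable needs 80TB raw: 80TB / 0.5TB = 160 spindles,
             before any hot spares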
And so long as you can handle burst-writes inside the cache memory,
there's little read and no write performance benefit in going from
RAID5 to RAID 0+1: in both cases reads are serviced from a large
stripe and writes go to a write-back cache. Massively sustained
writes may benefit from 0+1, since mirrored writes are simpler to
service than RAID5's parity writes, but outside video editing and
ultra-high-end VTL that's a rare workload.
But at the low end? Why piss about with something as complex, messy
and error-prone as RAID5 when RAID 0+1 is going to cost you a couple
of extra spindles and save you a RAID controller? If you have four
SATA ports on your machine, just put four 500GB SATA spindles on and
you have 1TB of 0+1. Use ZFS and you can turn on compression too if
you want: for a lot of workloads the CPU cost is small compared with
the spindle accesses it saves (I have ~10TB
under ZFS compression for replication, and another couple of TB for
tape staging). RAID5 is worthwhile to reduce 160 disks to 100; it's
not worth it to reduce 4 disks to 3.
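For the four-SATA-port case above, the whole setup is along these
lines (pool and device names are made up here, adjust to taste):

   zpool create tank mirror c0t0d0 c1t0d0 mirror c0t1d0 c1t1d0
   zfs set compression=on tank

which gets you a ~1TB striped-mirror pool, with compression inherited
by anything you create underneath it.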
ian