Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues
Ian G Batten
ian.batten at uk.fujitsu.com
Wed Mar 5 11:10:58 EST 2008
On 05 Mar 08, at 1549, Simon Matter wrote:
>> On Tue, 4 Mar 2008, Ian G Batten wrote:
>>
>>> software RAID5 is a performance
>>> disaster area at the best of times unless it can take advantage of
>>> intimate knowledge of the intent log in the filesystem (RAID-Z does
>>> this),
>>
>> actually, unless you have top-notch hardware raid controllers,
>> software raid 5
>>
> I can only second that. I'm still wondering what "top-notch hardware
> raid controllers" are. From my experience the only decent
> "controllers" you can get are those in the heavily priced SAN
> equipment with gigs of cache on the SAN controllers and tens or
> hundreds of spindles behind it.
Sorry, that's what I was comparing it to: my experience of software
RAID5 is horrid (5+1 assemblages on various small Suns with DiskSuite)
and I probably live a life of luxury with hardware RAID (a 100-spindle
Pillar with 24GB of RAM, assorted 50--100 spindle EMC and DotHill
arrays with several GB of RAM). I've rarely used PCI-slot RAID
controllers: thinking back, I used PCI-card controllers indirectly
once upon a time --- Auspex used commodity RAID controllers in their
later, doomed, non-VME machines --- and they were horrid.
But I use software RAID 0+1 all the time, both with Solaris Disksuite
(or whatever it's called this week) and ZFS. Our Cyrus meta-data
partitions, for example, sit in this zpool:
        NAME          STATE     READ WRITE CKSUM
        onboard       ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s4  ONLINE       0     0     0
            c1t0d0s4  ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t1d0s4  ONLINE       0     0     0
            c1t1d0s4  ONLINE       0     0     0
and are perfectly happy. The message store, however, comes in over NAS
from a 20-disk stripe made up of four 5+1 RAID5 assemblages spread over
four RAID controllers fronted with ~10GB of RAM cache...
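For reference, a striped pair of mirrors like the "onboard" pool above
is a one-liner to build with ZFS; roughly this, with the pool and
device names taken from the status output (your devices will differ):

   zpool create onboard mirror c0t0d0s4 c1t0d0s4 \
                        mirror c0t1d0s4 c1t1d0s4

ZFS stripes across the two mirror vdevs by itself, so there is no
separate RAID-0 step to configure.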
Returning to the topic at hand, though, I can't for the life of me see
why anyone would want to use RAID5 in 2008 _without_ tens or hundreds
of spindles and gigs of cache. Why not just use RAID 0+1?
When I've got ~40TB in my Pillar, the difference between RAID5 and
RAID 0+1 is a large chunk of change: it's the difference between 104
500GB spindles (16 5+1 volumes, 8 hot spares) and 160 500GB spindles
plus however much hot-sparing is prudent. An extra sixty spindles,
plus the space, controllers, power supplies, cabling and metalwork
that go with them, is a non-trivial amount of money and heat. Likewise
in the 96-spindle
DotHill stack and the ~80-spindle EMC: those would require dozens of
extra disks and a lot of electronics.
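Spelling the arithmetic out, with the 500GB drives and the spare
counts above:

   RAID5:    16 x (5+1) volumes: 16 * 5 * 0.5TB       = 40TB usable
             16 * 6 disks + 8 hot spares              = 104 spindles
   RAID 0+1: 40TB usable needs 80TB raw: 80TB / 0.5TB = 160 spindles,
             before any hot spares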
And so long as you can handle burst-writes inside the cache memory,
there's little read and no write performance benefit in going from
RAID5 to RAID 0+1: in both cases reads are serviced from a large
stripe and writes go to a write-back cache. Massively sustained
writes may benefit from 0+1, since mirrored writes are simpler to
service than RAID5's parity writes, but outside video editing and
ultra-high-end VTL that's a rare workload.
But at the low end? Why piss about with something as complex, messy
and error-prone as RAID5 when RAID 0+1 is going to cost you a couple
of extra spindles and save you a RAID controller? If you have four
SATA ports on your machine, just put four 500GB SATA spindles on and
you have 1TB of 0+1. Use ZFS and you can turn on compression too if
you want: for a lot of workloads the CPU cost is small compared with
the spindle accesses it saves (I have ~10TB
under ZFS compression for replication, and another couple of TB for
tape staging). RAID5 is worthwhile to reduce 160 disks to 100; it's
not worth it to reduce 4 disks to 3.
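For the four-SATA-port case above, the whole setup is along these
lines (pool and device names are made up here, adjust to taste):

   zpool create tank mirror c0t0d0 c1t0d0 mirror c0t1d0 c1t1d0
   zfs set compression=on tank

which gets you a ~1TB striped-mirror pool, with compression inherited
by anything you create underneath it.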
ian