Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

David Lang david.lang at digitalinsight.com
Wed Mar 5 13:19:25 EST 2008


On Wed, 5 Mar 2008, Ian G Batten wrote:

> On 05 Mar 08, at 1549, Simon Matter wrote:
>
>>> On Tue, 4 Mar 2008, Ian G Batten wrote:
>>> 
>>>> software RAID5 is a performance
>>>> disaster area at the best of times unless it can take advantage of
>>>> intimate knowledge of the intent log in the filesystem (RAID-Z does
>>>> this),
>>> 
>>> actually, unless you have top-notch hardware raid controllers, software
>>> raid 5
>>> 
>> I can only second that. I'm still wondering what "top-notch hardware raid
>> controllers" are. From my experience, the only decent "controllers" you can
>> get are those in the heavily priced SAN equipment with gigs of cache on the
>> SAN controllers and tens or hundreds of spindles behind them.
>
> Sorry, that's what I was comparing it to: my experience of software RAID5 is 
> horrid (5+1 assemblages on various small Suns with disksuite) and I probably 
> live a life of luxury with hardware RAID (100-spindle Pillar with 24G of RAM, 
> assorted 50--100 spindle EMC and DotHill arrays with several GB of RAM). 
> I've rarely used PCI-slot RAID controllers: thinking back, I used PCI-card 
> controllers indirectly once upon a time --- Auspex used commodity RAID 
> controllers in their later, doomed, non-VME machines --- and they were 
> horrid.
>
> But I use software RAID 0+1 all the time, both with Solaris Disksuite (or 
> whatever it's called this week) and ZFS.  Our Cyrus meta-data partitions, for 
> example, sit in this zpool:
>
>       NAME          STATE     READ WRITE CKSUM
>       onboard       ONLINE       0     0     0
>         mirror      ONLINE       0     0     0
>           c0t0d0s4  ONLINE       0     0     0
>           c1t0d0s4  ONLINE       0     0     0
>         mirror      ONLINE       0     0     0
>           c0t1d0s4  ONLINE       0     0     0
>           c1t1d0s4  ONLINE       0     0     0
>
>
> and are perfectly happy.   The message store comes in over NAS from a 20-disk 
> stripe consisting of 4 5+1 RAID5 assemblages spread over four RAID 
> controllers fronted with ~10GB of RAM cache, however...
>
> Returning to the topic at hand, though, I can't for the life of me see why 
> anyone would want to use RAID5 in 2008 _without_ tens or hundreds of spindles 
> and gigs of cache.  Why not just use RAID 0+1?

a couple of reasons:

because raid 0+1 can lose everything if you lose the wrong two disks (both 
members of the same mirror pair), while raid 6 lets you lose any two disks and 
keep going.
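
to make that concrete, here's a quick sketch (python, with hypothetical disk 
labels) enumerating every two-disk failure in a four-disk 0+1 array:

    from itertools import combinations

    # four disks arranged as two mirrored pairs, striped together (raid 0+1)
    mirrors = [("A1", "A2"), ("B1", "B2")]
    disks = [d for pair in mirrors for d in pair]

    failures = list(combinations(disks, 2))
    # the array dies only if some mirror loses both of its members
    fatal = [f for f in failures
             if any(set(pair) <= set(f) for pair in mirrors)]

    print(f"{len(fatal)} of {len(failures)} two-disk failures are fatal")
    # prints: 2 of 6 two-disk failures are fatal
    # raid 6 over the same four disks survives all 6

so with 0+1 a third of the possible double failures kill the array, while raid 
6 survives every one of them.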

because raid 5 only needs one extra drive and raid 6 only needs two extra 
drives, while raid 0+1 needs twice the drives. there are physical limits to 
how many disks fit in a case, which can make this a factor (completely 
ignoring power limits)
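
the spindle arithmetic is simple enough to sketch (a python illustration; hot 
spares left out, usable capacity measured in whole drives):

    def drives_needed(data_disks, level):
        """total spindles required for a given amount of usable space"""
        return {"raid5":  data_disks + 1,   # one drive's worth of parity
                "raid6":  data_disks + 2,   # two drives' worth of parity
                "raid01": data_disks * 2,   # every drive is mirrored
               }[level]

    for n in (2, 3, 4):
        print(n, "data disks ->",
              "raid5:", drives_needed(n, "raid5"),
              "raid6:", drives_needed(n, "raid6"),
              "raid 0+1:", drives_needed(n, "raid01"))
    # 2 data disks -> raid5: 3  raid6: 4  raid 0+1: 4
    # 3 data disks -> raid5: 4  raid6: 5  raid 0+1: 6
    # 4 data disks -> raid5: 5  raid6: 6  raid 0+1: 8

the gap grows linearly: every extra data disk under 0+1 costs a whole second 
spindle, while parity raid pays its one- or two-drive tax once per array.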

not everyone needs the fastest performance. most people are making a tradeoff 
between performance, cost, and space, and as a result many different options 
are reasonable in different environments.

> When I've got ~40TB in my Pillar, the difference between RAID5 and RAID 0+1 
> is a large chunk of change: it's the difference between 104 500GB spindles 
> (16 5+1 volumes, 8 hot spares) and 160 500GB spindles plus however much 
> hot-sparing is prudent.  An extra sixty spindles, plus the space, 
> controllers, power supplies, cabling, metalwork is a non-trivial amount of 
> money and heat.  Likewise in the 96-spindle DotHill stack and the ~80-spindle 
> EMC: those would require dozens of extra disks and a lot of electronics.
>
> And so long as you can handle burst-writes inside the cache memory, there's 
> little read and no write performance benefit for going from RAID5 to RAID 
> 0+1: in both cases reads are serviced from a large stripe and writes go to a 
> write-back cache.  Massively sustained writes may benefit from 0+1 because it 
> is easier to do than 5, but outside video editing and ultra-high-end VTL 
> that's a rare workload.

you've just outlined good reasons to use raid 5 (or 6): smaller budgets are 
sensitive to the same issues, just on smaller arrays.

> But at the low end?  Why piss about with something as complex, messy and 
> error-prone as RAID5 when RAID 0+1 is going to cost you a couple of extra 
> spindles and save you a RAID controller?  If you have four SATA ports on your 
> machine, just put four 500GB SATA spindles on and you have 1TB of 0+1.  Use 
> ZFS and you can turn on compression if you want, too, which is fast enough to 
> be worth the saving in spindle access relative to the CPU load for a lot of 
> workloads (I have ~10TB under ZFS compression for replication, and another 
> couple of TB for tape staging).  RAID5 is worthwhile to reduce 160 disks to 
> 100; it's not worth it to reduce 4 disks to 3.
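
(as a concrete sketch of the four-disk setup described above -- device names 
are hypothetical, and this assumes the Solaris-style ZFS tooling shown earlier 
in the thread:

      zpool create tank mirror c1t0d0 c1t1d0 mirror c2t0d0 c2t1d0
      zfs set compression=on tank

two mirrored pairs striped into one pool, with compression enabled pool-wide.)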

ZFS is not available everywhere, and it is not suitable for all workloads 
(specifically database-type workloads, which are a fair approximation of what 
cyrus does)

you say it's not worth reducing 4 disks to 3, but what about 6 disks to 4? 
(using your example of a machine with 4 SATA ports, that's the difference 
between using the machine you have and buying a new one)

if that's not enough, what about 8 disks to 5? (or 6 if you do raid 6 or want 
a hot spare)

at what point does the difference become worth it to you?

David Lang

