Load spikes when new email arrives

Thu Jan 24 12:46:51 EST 2013

>
>> In another email discussion on the Redhat mailing list, I've confirmed
>> we have
>> an issue with partition alignment.  This is getting to be quite the mess
>> out there.  I saw one posting where it is speculated there are
>> thousands of
>> poorly set up disk partitions for their RAID stripe size.  fdisk and
>> OS installers were late getting updated for the new TB disks
>> and SSD disks as well.  Partition alignment might account
>> for 5 to 30% of a performance hit.
>>
>> I've checked, and my cyrus lmtpd process count
>> never exceeds 11 under workload.
>> await jumps up to 150-195 ms at worst.
>>
>> If I'm already at IO saturation, I can't see how a higher lmtpd limit
>> would help.
>>
>> My goal is to keep the system load reasonable so it is responsive for
>> mailbox access by the end users.  Right now we get nagios alerts
>> about 6 times a day for excessive load.  If I can move the mail
>> queue workload into a hill instead of a sharp peak on the cacti
>> load graph, it would be good.  There are minutes around the peaks
>> where the queue is emptied and we have only 5 messages
>> inbound per minute.
>>
>> In hindsight, I agree RAID 10 should have been implemented.
>> At the time, four years ago, getting lots of space was the
>> priority as space needs always grow.  We've never seen load
>> issues until this month, and it seems to coincide with a
>> general increase of all email volume and traffic.  Our primary
>> MX is also getting hit more than normal.
>>
>>
>
> There are a couple of suggestions I'd like to put forth. First, improper
> partition alignment is generally masked by the controller cache. I
> strongly encourage you to check that your RAID array is making use of
> this cache by enabling the WriteBack caching option on this array,
> especially if your PERC card has a BBU (I think this was optional on
> perc 5). You can install the MegaCLI tool from LSI to verify this (can
> also be checked from OpenManage or reboot into the controller BIOS).

I strongly suggest doing that *ONLY* with a proper BBU in place!

>
> MegaCLI Link:
> http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5082327
> The relevant commands are as follows:
> MegaCli -AdpBbuCmd -aALL
> MegaCli -LDInfo -Lall -aALL
>
> Second, the PERC card does support RAID level migration, so if you want
> to add a spindle or even change RAID levels, you can. This can be done
> via either OpenManage (hit or miss) or the MegaCLI tool (daunting, but
> there are cheat sheets). You could also add a separate array to act as a
> dedicated mail spool. You can also replace the existing disks with
> faster (and/or larger) disks for additional performance without ever
> touching the software.
>
>
> To directly answer your point, "If I can move the mail queue
> workload into a hill instead of a sharp peak on the cacti load graph, it
> would be good": lowering the LMTP limit in cyrus (or on the upstream
> MX server) to turn the mail flow into a trickle, rather than a flood,
> would do this. You can adjust the concurrency rate of LMTP deliveries in
> postfix using lmtp_destination_concurrency_limit (default 20).  The
> cyrus method has already been mentioned. You may also look at other ways
> to reduce IO wait, such as disk defragmentation or utilizing hard links
> in cyrus (singleinstancestore: 1).
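
For reference, the Postfix side of that could look roughly like the fragment below. The value 5 is only an illustrative assumption, not a tested recommendation, and the parameter name assumes deliveries go through the stock "lmtp" transport; with a differently named transport in master.cf the prefix changes accordingly (<transport>_destination_concurrency_limit).

```
# /etc/postfix/main.cf
# Assumed example value: fewer parallel LMTP deliveries into cyrus.
lmtp_destination_concurrency_limit = 5
```

Run "postfix reload" afterwards so the change is picked up.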

Another thing to check here is the partitioning. Putting /var/lib/imap on
separate spindles seems like a good idea; RAID1 on two small but fast disks
has always worked fine for me.
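
Since partition alignment came up earlier in the thread, here is a minimal sketch of how one might check it by hand. It assumes the start sector is read from /sys/block/<disk>/<part>/start (which is in 512-byte units) and that the stripe size is taken from the controller, e.g. from the LDInfo output. The function name check_alignment, the device sda1 and the 64 KiB stripe size are all hypothetical placeholders.

```shell
#!/bin/sh
# check_alignment START_SECTOR STRIPE_KIB
# START_SECTOR: partition start in 512-byte sectors
#               (as found in /sys/block/<disk>/<part>/start).
# STRIPE_KIB:   RAID stripe size in KiB, as reported by the controller.
# Prints "aligned" if the partition starts on a stripe boundary,
# "misaligned" otherwise.
check_alignment() {
    start_sector=$1
    stripe_kib=$2
    offset_bytes=$((start_sector * 512))
    stripe_bytes=$((stripe_kib * 1024))
    if [ $((offset_bytes % stripe_bytes)) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

# Example call (device name and stripe size are assumptions):
# check_alignment "$(cat /sys/block/sda/sda1/start)" 64
```

A partition starting at the old fdisk default of sector 63 would come out as misaligned for a 64 KiB stripe, while one starting at sector 2048 would come out as aligned.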

Simon