Load spikes when new email arrives

Thu Jan 24 16:06:50 EST 2013

On Thu, 24 Jan 2013, francis picabia wrote:

> In another email discussion on the Red Hat mailing list, I've confirmed we
> have an issue with partition alignment.  This is getting to be quite the
> mess out there.  I saw one posting speculating that there are thousands of
> disk partitions poorly aligned to their RAID stripe size.  fdisk and OS
> installers were also slow to be updated for the new TB-size disks and
> SSDs.  Partition alignment might account for a 5 to 30% performance hit.

Yeah, I read about partition alignment the last time I built a new Cyrus 
server.  I don't remember how it came to my attention, but it was wrong on 
all of my servers too.  The latest stable release of Debian Linux seems to 
do the right thing during installation, but previous versions did not.

I followed the recommendations that I found and set the starting sector of 
my partition to 2048 (2048 * 512 bytes = 1 MiB):

root@cyrus-be1:~# fdisk -lu /dev/sda

Disk /dev/sda: 536.9 GB, 536870912000 bytes
214 heads, 31 sectors/track, 158060 cylinders, total 1048576000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x88aa51ee

    Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048  1048575999   524286976   83  Linux

I don't know how much of a performance difference it would actually make, 
but I'm trying to squeeze all I can out of it!
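
The alignment arithmetic is easy to check with a short script. This is a minimal Python sketch, not part of fdisk or any other tool: it multiplies the start sector by the 512-byte sector size and tests whether the result is a multiple of a given boundary. The 64 KiB stripe size is a hypothetical example; substitute whatever stripe size your RAID controller actually uses. Sector 63 is included because it was the traditional DOS-compatible default start sector that leaves partitions misaligned.

```python
# Minimal sketch: does a partition's starting byte offset fall on a boundary?
# The 64 KiB stripe below is a hypothetical value, not read from any controller.

SECTOR_SIZE = 512  # logical sector size, as reported by fdisk above


def start_offset_bytes(start_sector: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte offset of the partition's first sector."""
    return start_sector * sector_size


def is_aligned(start_sector: int, boundary_bytes: int,
               sector_size: int = SECTOR_SIZE) -> bool:
    """True if the partition starts on a multiple of boundary_bytes."""
    return start_offset_bytes(start_sector, sector_size) % boundary_bytes == 0


if __name__ == "__main__":
    MIB = 1024 * 1024
    STRIPE = 64 * 1024  # hypothetical RAID stripe size
    for start in (63, 2048):
        print(start, start_offset_bytes(start),
              is_aligned(start, MIB), is_aligned(start, STRIPE))
```

A 1 MiB start (sector 2048) is a multiple of every power-of-two stripe size up to 1 MiB, which is presumably why it is the common recommendation.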

> I've checked and my cyrus lmtpd process count
> never exceeds 11 under work load.
> await jumps up to 150-195 at worst.
>
> If I'm already at IO saturation, I can't see how a higher lmtpd limit
> would help.

I was going to suggest setting a LOWER lmtpd limit.  :)

It sounds like you have already done that, judging from the rest of this 
thread.
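
Why a lower limit can help even when the disk stays just as busy: Linux counts processes blocked on disk I/O in the load average, so fewer concurrent lmtpd processes roughly means a lower peak load figure. The toy model below is a hypothetical Python illustration, not anything Cyrus does internally: it caps concurrent deliveries at `limit` and assumes each busy process finishes one message per time step. The burst of 30 messages and the limit of 4 are made-up numbers; 11 is the peak lmtpd count mentioned above.

```python
# Toy model (an assumption, not a measurement): a burst of messages is
# delivered by at most `limit` concurrent processes, each finishing one
# message per time step. Returns how many processes are busy at each step.


def busy_per_step(burst: int, limit: int) -> list[int]:
    """Busy delivery processes at each step until the burst is drained."""
    steps = []
    remaining = burst
    while remaining > 0:
        active = min(limit, remaining)  # never exceed the concurrency cap
        steps.append(active)
        remaining -= active
    return steps


if __name__ == "__main__":
    for limit in (11, 4):  # 11 from the thread; 4 is a hypothetical lower cap
        profile = busy_per_step(30, limit)
        print(limit, profile, "peak:", max(profile), "steps:", len(profile))
```

The model ignores that each delivery slows down as more of them compete for a saturated disk, so it says nothing about total throughput. It only shows the shape of the trade-off: a lower, longer plateau instead of a tall, short spike.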

> My goal is to keep the system load reasonable so it is responsive for
> mailbox access by the end users.  Right now we get Nagios alerts
> about 6 times a day for excessive load.  If I can move the mail
> queue workload into a hill instead of a sharp peak on the Cacti
> load graph, it would be good.  There are minutes around the peaks
> where the queue is emptied and we have only 5 messages
> inbound per minute.

Hmmm, what options are there that don't involve rebuilding the disk...

Definitely check that you have write-back caching enabled on the PERC 
(Dell PowerEdge RAID Controller).

I don't know if remounting the filesystem as ext4 would help, but that's 
worth a shot.

Are you mounting the filesystem with the "noatime" option?  There is no 
need to track atime on a Cyrus mailstore and those extra writes can add 
up.  Here are my mount options:

LABEL=be1data1  /var/spool/cyrus/mail/data1     ext4    rw,auto,data=ordered,noatime   0       2
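
To confirm which atime behavior a filesystem has, look at the options field of its fstab entry, or of its line in /proc/mounts, which uses the same field layout and shows the options actually in effect. Here is a minimal Python sketch that reads that fourth field; the function names are illustrative only. relatime is reported separately because it only reduces atime writes rather than disabling them.

```python
# Minimal sketch: classify the atime option in an fstab-style line
# (device, mount point, type, options, dump, pass). /proc/mounts lines
# use the same first four fields, so they can be checked the same way.


def mount_options(fstab_line: str) -> list[str]:
    """Return the comma-separated options (4th field) of an fstab line."""
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 whitespace-separated fields")
    return fields[3].split(",")


def atime_policy(fstab_line: str) -> str:
    """'noatime', 'relatime', or 'unspecified' (kernel default applies)."""
    opts = mount_options(fstab_line)
    if "noatime" in opts:
        return "noatime"
    if "relatime" in opts:
        return "relatime"
    return "unspecified"


if __name__ == "__main__":
    line = ("LABEL=be1data1  /var/spool/cyrus/mail/data1  ext4  "
            "rw,auto,data=ordered,noatime  0  2")
    print(atime_policy(line))
```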

Perhaps there are some tweaks on the Postfix side that will put less 
strain on Cyrus.  I don't know much about Postfix though.

> In hindsight, I agree RAID 10 should have been implemented.
> At the time, four years ago, getting lots of space was the
> priority as space needs always grow.  We've never seen load
> issues until this month, and it seems to coincide with a
> general increase of all email volume and traffic.  Our primary
> MX is also getting hit more than normal.

Well, if none of the easy stuff helps enough, then maybe you'll get to 
build a new Cyrus filesystem from scratch!  :)

 	Andy