Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

Michael Bacon baconm at email.unc.edu
Thu Feb 28 17:54:56 EST 2008


Jeff,

Just as a rule of thumb, if you've got problems with Cyrus (or any mail 
system), 90% of the time they're related to I/O performance.
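One way to put a number on that I/O performance is to time small synchronous writes (a write followed by fsync()) on the filesystem that holds /var/imap, since that is the kind of operation a mail store performs on delivery. Below is a minimal Python sketch. The function name `fsync_latencies`, the 4 KB write size and the sample count are illustrative choices and are not taken from Cyrus. The probe creates and removes a small scratch file in whatever directory it is given.

```python
import os
import tempfile
import time


def fsync_latencies(directory: str, count: int = 20, size: int = 4096) -> list[float]:
    """Time `count` rounds of write(`size` bytes) + fsync() on a scratch file.

    Returns the wall-clock seconds for each round. The scratch file is
    created in `directory`, so it goes through that filesystem's whole
    stack (here ext3 on lvm2 on drbd on md), and is removed afterwards.
    """
    payload = os.urandom(size)
    samples: list[float] = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    try:
        for _ in range(count):
            start = time.monotonic()
            os.write(fd, payload)
            os.fsync(fd)  # force the data down to stable storage
            samples.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return samples
```

For example, `fsync_latencies("/var/imap/config")` can be run during a busy period and again during a quiet one, and the two medians and maxima compared. Latencies that are consistently long, or that swing widely, would point at the storage stack rather than at Cyrus itself.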

I've never seen drbd used for Cyrus, but it looks like other folks have 
done it.  The combination of drbd+lvm2+ext3 might put you somewhere 
unpleasant, but I'll have to let the Linux-heads jump in on that one.

Beyond that, I don't see anything obviously wrong, but maybe someone who's 
run it more on Linux can chime in.

-Michael

--On Thursday, February 28, 2008 3:36 PM -0700 Jeff Fookson 
<jfookson at as.arizona.edu> wrote:

> Michael Bacon wrote:
>
>> What database format are you using for the mailboxes database?  What
>> kind of storage is the "metapartition" (usually /var/imap) on?  What
>> kind of storage are your mail partitions on?
>
> Databases are all skiplist. Our mail partition and the metapartition are
> both on the same filesystem, as we intended both to be part of the same
> drbd mirror. That partition is a Linux software RAID 5 (3 SATA disks). On
> top of the md layer is the drbd device; on top of that is an lvm2 logical
> volume; on top of that is an ext3 filesystem, mounted as '/var/imap'. The
> mail is then in /var/imap/mail and the metadata in /var/imap/config (and
> we also have /var/imap/certs for the SSL files and /var/imap/sieve for
> sieve scripts).
>
> Thanks.
>
> Jeff Fookson
>
>>
>>
>> --On Thursday, February 28, 2008 2:38 PM -0700 Jeff Fookson
>> <jfookson at as.arizona.edu> wrote:
>>
>>> Folks-
>>>
>>> I am hoping to get some help and guidance as to why our installation of
>>> cyrus-imapd 2.3.9 is unusably slow. Here are the specifics:
>>>
>>> The software is running on a 1.6GHz Opteron with 2GB of memory, serving
>>> about 400 users. The average rate of arriving mail is on the order of
>>> 1-2 messages/sec. The active mailstore is about 200GB. There are
>>> typically about 200 'imapd' processes at a given time and a hugely
>>> varying number of 'lmtpd' processes (from about 6 to many hundreds during
>>> times of greatest pathology). System load is correspondingly in the 2-15
>>> range, but can spike to 50-70!
>>>
>>> Our users complain that the system is extremely sluggish during the day,
>>> when it is busiest.
>>>
>>> The most obvious thing we observe is that both the lmtpds and the imapds
>>> are spending HUGE amounts of time waiting on locks. Even when the system
>>> load is only 1-2, an 'strace' attached to an instance of lmtpd or imapd
>>> shows waits of upwards of 1-2 minutes to get a write lock, as in the
>>> example below (this is from a trace of an 'lmtpd'):
>>>
>>> [strace -f -p 9817 -T]
>>> 9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 <84.998159>
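Lock waits like the one in this trace can be pulled out of a long `strace -T` capture (for example one written with `-o trace.txt`) instead of being spotted by eye. Below is a minimal Python sketch. It assumes each completed call sits on one line in the `PID  fcntl(...) = R <seconds>` shape shown here. The names `LOCK_CALL` and `lock_waits` and the one-second default threshold are illustrative.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# One completed blocking-lock call from `strace -T`, in the shape shown above:
#   9817  fcntl(10, F_SETLKW, {...}) = 0 <84.998159>
# The leading PID is optional, since strace only prints it in some modes;
# the "64" variants cover 32-bit systems.
LOCK_CALL = re.compile(
    r"^(?P<pid>\d+)?\s*fcntl(?:64)?\((?P<fd>\d+),\s*F_SETLKW(?:64)?\b"
    r".*<(?P<secs>\d+(?:\.\d+)?)>$"
)


def lock_waits(
    lines: Iterable[str], threshold: float = 1.0
) -> Iterator[Tuple[Optional[int], int, float]]:
    """Yield (pid, fd, seconds) for F_SETLKW calls that blocked >= threshold seconds."""
    for line in lines:
        m = LOCK_CALL.match(line.strip())
        if m is None:
            continue  # not a completed blocking fcntl lock call
        secs = float(m["secs"])
        if secs >= threshold:
            pid = int(m["pid"]) if m["pid"] else None
            yield pid, int(m["fd"]), secs
```

With `-f`, strace splits a call that is interrupted by another process's output into `<unfinished ...>` and `<... fcntl resumed>` lines. This sketch skips those lines and does not stitch the halves back together.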
>>>
>>> We strongly suspect that these long waits on locks are what is causing
>>> the slowness our users are reporting.
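The `<84.998159>` in the trace is time spent blocked in F_SETLKW. During that time the process is waiting for whichever process holds a conflicting lock on the same file to release it, so the figure reflects how long the holder took as much as it reflects anything about the waiter. Below is a minimal Python sketch that reproduces this with two processes. It is POSIX-only, and the function name and timings are illustrative. A forked child holds an exclusive whole-file lock for a fixed time, and the parent measures how long its own blocking request takes. Python's `fcntl.lockf` with `LOCK_EX` serves as a stand-in here, because on Linux it issues the same kind of blocking POSIX record lock as the traced call.

```python
import fcntl
import os
import tempfile
import time
from typing import Optional


def timed_lock_wait(hold_seconds: float, path: Optional[str] = None) -> float:
    """Return how long a blocking whole-file write lock waits while another process holds it.

    A forked child takes the lock first and keeps it for `hold_seconds`;
    the parent then requests the same lock and times the wait. If `path`
    is None, a temporary file is created and removed afterwards.
    """
    cleanup = path is None
    if path is None:
        tmp_fd, path = tempfile.mkstemp(prefix="lock-demo-")
        os.close(tmp_fd)
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: hold the lock, then exit (which releases it)
        try:
            os.close(r)
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX)  # whole file: start=0, len=0
            os.write(w, b"x")  # tell the parent the lock is now held
            time.sleep(hold_seconds)
        finally:
            os._exit(0)
    os.close(w)
    os.read(r, 1)  # block until the child holds the lock
    os.close(r)
    fd = os.open(path, os.O_RDWR)
    try:
        start = time.monotonic()
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the child exits
        waited = time.monotonic() - start
        fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
        os.waitpid(pid, 0)
        if cleanup:
            os.unlink(path)
    return waited
```

By the same logic, an lmtpd that waits 85 seconds raises a further question: which process held the lock, and why did it hold it so long? One possibility is that the holder was itself stalled on slow disk writes.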
>>>
>>> We are under the impression that a single instance of cyrus-imapd scales
>>> well up to about 1000 users (with about 1MB of active memory per 'imapd'
>>> process), and so we are baffled as to what might be going on.
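As a rough check on the memory side, the figures quoted here can be multiplied out. Below is a minimal Python sketch. It assumes the ~1MB-per-imapd estimate holds, ignoring shared libraries and any other per-process overhead, and it reads "2Gb" as 2 GB of RAM. The function name is illustrative.

```python
def imapd_memory_share(processes: int, mb_per_process: float, installed_mb: float) -> float:
    """Fraction of installed RAM taken by imapd working sets (rough estimate)."""
    return processes * mb_per_process / installed_mb


# Figures from the message: ~200 imapd processes, ~1 MB active each, 2 GB RAM.
share = imapd_memory_share(200, 1.0, 2 * 1024)
print(f"imapd working sets: about {share:.0%} of RAM")
```

On those assumptions, imapd itself accounts for only about a tenth of RAM and leaves the rest to other processes and the kernel's page cache. That is consistent with looking at disk I/O and locking rather than memory. The estimate does not count the lmtpd processes, which reportedly reach many hundreds at peak.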
>>>
>>> A non-standard aspect of our installation which may have something to do
>>> with the problem is that we are running Cyrus on an lvm2 partition that
>>> itself is running on top of drbd. Thinking that the remote writes to the
>>> drbd secondary might be causing delays, we put the primary in stand-alone
>>> mode so that the drbd layer was not doing any network activity (the drbd
>>> link is running at gigabit speed on its own crossover cable to the
>>> secondary box) and saw no significant change in behavior. Any issues due
>>> to locking and the lvm2 layer would, of course, still be present even
>>> with drbd's activity reduced to just local writes.
>>>
>>> Can anyone suggest what we might do next to debug the problem further?
>>> Needless to say, our users get extremely unhappy when trivial operations
>>> in their mail clients take over a minute to complete.
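One further debugging step on Linux is to read /proc/locks while a stall is in progress. It lists every held lock and, on lines marked `->`, every request that is blocked waiting, each with a PID and the `major:minor:inode` of the locked file. Below is a minimal Python sketch that parses that text and reports files that have waiters. The parsing assumes the usual whitespace-separated /proc/locks layout. The class and function names are illustrative, and the sample PIDs and inodes in any test input are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LockEntry:
    pid: int
    kind: str       # e.g. POSIX (fcntl/lockf) or FLOCK
    access: str     # READ or WRITE
    file_id: str    # "major:minor:inode" of the locked file
    waiting: bool   # True for "->" lines, i.e. blocked requests


def parse_proc_locks(text: str) -> List[LockEntry]:
    """Parse /proc/locks text into LockEntry records (skips malformed lines)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        waiting = len(fields) > 1 and fields[1] == "->"
        if waiting:
            del fields[1]  # drop the arrow so both line shapes align
        # id:, kind, ADVISORY/MANDATORY, access, pid, major:minor:inode, start, end
        if len(fields) < 8 or not fields[4].lstrip("-").isdigit():
            continue
        entries.append(LockEntry(int(fields[4]), fields[1], fields[3], fields[5], waiting))
    return entries


def contended_files(entries: List[LockEntry]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Map each file that has blocked requests to (holder PIDs, waiter PIDs)."""
    holders: Dict[str, List[int]] = defaultdict(list)
    waiters: Dict[str, List[int]] = defaultdict(list)
    for e in entries:
        (waiters if e.waiting else holders)[e.file_id].append(e.pid)
    return {f: (holders.get(f, []), pids) for f, pids in waiters.items()}
```

To use it, read `open("/proc/locks").read()` during a slow period and pass the result through both functions. This gives the holding PID(s) for each contended file. The inode number can then be mapped back to a path with `find /var/imap -xdev -inum <inode>`, and the holder can be traced with strace to see what it is doing while it holds the lock.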
>>>
>>> Thank you for any thoughts or advice.
>>>
>>> Jeff Fookson
>>>
>>> --
>>> Jeffrey E. Fookson, PhD            Phone: (520) 621 3091
>>> Support Systems Analyst, Principal    jfookson at as.arizona.edu
>>> Steward Observatory
>>> University of Arizona
>>>
>>> ----
>>> Cyrus Home Page: http://cyrusimap.web.cmu.edu/
>>> Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
>>> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html