Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

Simon Matter simon.matter at invoca.ch
Thu Feb 28 18:25:16 EST 2008


> Michael Bacon wrote:
>
>> What database format are you using for the mailboxes database?  What
>> kind of storage is the "metapartition" (usually /var/imap) on?  What
>> kind of storage are your mail partitions on?
>
> Databases are all skiplist. Our mail partition and the metapartition are

skiplist is good.

> both on the same filesystem, as we intended that both be part of the
> same drbd mirror. That partition is a Linux software RAID 5 (3 SATA
> disks). On top of the md layer is the

Software RAID 5 seems fine for data, but I strongly suggest a separate
RAID 1 for config.

> drbd device; on top of that is an lvm2 logical volume; on top of that is

I don't think LVM2 is the problem here; I'm using it almost everywhere.
The same goes for ext3.

I have never used drbd in production, but could it be what's causing
your problems? I've done some intensive benchmarks with different
solutions like AoE and gnbd and found that this kind of setup performs
quite badly for certain types of usage.
Couldn't you test by simply mounting the LVM device without the drbd layer
(maybe with an offset to where the real filesystem begins)?
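A rough way to compare the two setups is to time small synchronous writes on each. The C sketch below is only an illustration under an assumption: that the processes holding locks are slowed down by write-and-fsync latency on the drbd-backed filesystem. The helper name `mean_fsync_latency`, the 4 KiB block size and the probe file paths are hypothetical choices, not part of Cyrus or drbd.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical probe. Assumption: slow fsync() on the underlying block
 * stack (md -> drbd -> lvm2 -> ext3) is what delays lock holders.
 * Returns the mean latency in seconds of `rounds` 4 KiB write()+fsync()
 * pairs on the file at `path`, or a negative value on error. */
double mean_fsync_latency(const char *path, int rounds)
{
    char buf[4096];
    memset(buf, 'x', sizeof buf);

    if (rounds <= 0)
        return -1.0;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1.0;

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < rounds; i++) {
        /* Each round forces the block through every layer to stable storage. */
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf || fsync(fd) == -1) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    double total = (double)(end.tv_sec - start.tv_sec)
                 + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return total / rounds;
}
```

Calling it once with a path such as /var/imap/fsync_probe and once with a path on a disk outside the drbd/LVM stack (both paths hypothetical) gives two averages; a large gap would point at the storage stack rather than at Cyrus. Remove the probe files afterwards.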

What I know for sure is that your server should cope very well with that
number of connections.
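The lock waits in Jeff's trace can also be checked outside Cyrus. The C sketch below issues the same whole-file request that strace recorded (fcntl with F_SETLKW, type F_WRLCK, start 0, len 0) and times it, which is the interval strace -T prints in angle brackets. The function names `timed_write_lock` and `unlock_file` are hypothetical. On a local filesystem an uncontended fcntl lock is handled in the kernel and should return almost at once; if it does, the 85-second waits are most likely time spent queued behind another process that holds the lock for a long time.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Seconds between two CLOCK_MONOTONIC readings. */
static double elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Hypothetical helper: take a blocking whole-file write lock, the same
 * request as fcntl(fd, F_SETLKW, {F_WRLCK, SEEK_SET, 0, 0}) in the trace,
 * and store how long the call blocked in *waited.
 * Returns 0 on success, -1 on error. */
int timed_write_lock(int fd, double *waited)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0; /* 0 means "to end of file", i.e. the whole file */

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    int rc = fcntl(fd, F_SETLKW, &fl); /* blocks while another process holds a conflicting lock */
    clock_gettime(CLOCK_MONOTONIC, &end);

    *waited = elapsed_seconds(&start, &end);
    return rc == -1 ? -1 : 0;
}

/* Hypothetical helper: release the whole-file lock taken above. */
int unlock_file(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    return fcntl(fd, F_SETLK, &fl);
}
```

The descriptor must be open for writing, since F_WRLCK fails with EBADF otherwise; call unlock_file() when done. Use a scratch file rather than one of Cyrus's own database files, so the test never blocks the running server.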

Simon

> an ext3 filesystem, mounted as '/var/imap'. The mail is then in
> /var/imap/mail and the metadata in /var/imap/config (we also have
> /var/imap/certs for the SSL files and /var/imap/sieve for Sieve
> scripts).
>
> Thanks.
>
> Jeff Fookson
>
>>
>>
>> --On Thursday, February 28, 2008 2:38 PM -0700 Jeff Fookson
>> <jfookson at as.arizona.edu> wrote:
>>
>>> Folks-
>>>
>>> I am hoping to get some help and guidance as to why our installation of
>>> cyrus-imapd 2.3.9 is unusably slow. Here are the specifics:
>>>
>>> The software is running on a 1.6GHz Opteron with 2GB of memory,
>>> supporting about 400 users. The average rate of arriving mail is on the
>>> order of 1-2 messages/sec. The active mailstore is about 200GB. There
>>> are typically about 200 'imapd' processes at a given time and a hugely
>>> varying number of 'lmtpd' processes (from about 6 to many hundreds
>>> during times of greatest pathology). System load is correspondingly in
>>> the 2-15 range, but can spike to 50-70!
>>>
>>> Our users complain that the system is extremely sluggish during the
>>> day, when it is busiest.
>>>
>>> The most obvious thing we observe is that both the lmtpds and the
>>> imapds are spending HUGE amounts of time waiting on locks. Even when
>>> the system load is only 1-2, an 'strace' attached to an instance of
>>> lmtpd or imapd shows waits of upwards of 1-2 minutes to get a write
>>> lock, as in the example below (from a trace of an 'lmtpd'):
>>>
>>> [strace -f -p 9817 -T]
>>> 9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 <84.998159>
>>>
>>> We strongly suspect that these long lock waits are what is causing the
>>> slowness our users are reporting.
>>>
>>> We are under the impression that a single instance of cyrus-imapd
>>> scales well up to about 1000 users (with about 1MB of active memory per
>>> 'imapd' process), so we are baffled as to what might be going on.
>>>
>>> A non-standard aspect of our installation that may have something to
>>> do with the problem is that we are running cyrus on an lvm2 logical
>>> volume that itself sits on top of drbd. Thinking that the remote writes
>>> to the drbd secondary might be causing delays, we put the primary in
>>> stand-alone mode so that the drbd layer was not doing any network
>>> activity (the drbd link runs at gigabit speed on its own crossover
>>> cable to the secondary box), and saw no significant change in behavior.
>>> Any issues due to locking and the lvm2 layer would, of course, still be
>>> present even with drbd's activity reduced to just local writes.
>>>
>>> Can anyone suggest what we might do next to debug the problem further?
>>> Needless to say, our users get extremely unhappy when trivial operations
>>> in their mail clients take over a minute to complete.
>>>
>>> Thank you for any thoughts or advice.
>>>
>>> Jeff Fookson
>>>
>>> --
>>> Jeffrey E. Fookson, PhD            Phone: (520) 621 3091
>>> Support Systems Analyst, Principal    jfookson at as.arizona.edu
>>> Steward Observatory
>>> University of Arizona
>>>
>>> ----
>>> Cyrus Home Page: http://cyrusimap.web.cmu.edu/
>>> Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
>>> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
>>
>>
>>
>>
>>
>
>
>
>




More information about the Info-cyrus mailing list