Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

Michael Bacon baconm at email.unc.edu
Thu Feb 28 16:52:26 EST 2008


What database format are you using for the mailboxes database?  What kind 
of storage is the "metapartition" (usually /var/imap) on?  What kind of 
storage are your mail partitions on?
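
If it helps with answering the first two questions, here is a minimal Python sketch that pulls the database-backend and partition settings out of the text of an imapd.conf. The option-name patterns (anything ending in "_db", "configdirectory", "partition-*", "metapartition-*", "metapartition_files") are my assumption about what is relevant on a 2.3.x install, and the sample values are hypothetical. An option that is absent from the file falls back to a compiled-in default, which this sketch cannot see.

```python
import re

# Assumed names of the imapd.conf options relevant to the questions above;
# adjust the pattern for your Cyrus version.
INTERESTING = re.compile(
    r"^(configdirectory|\w+_db|(meta)?partition-\S+|metapartition_files)$"
)


def relevant_settings(conf_text):
    """Return {option: value} for database-backend and partition options."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and INTERESTING.match(key):
            found[key] = value.strip()
    return found


# Hypothetical sample; in practice pass in the contents of your own imapd.conf.
SAMPLE_CONF = """\
configdirectory: /var/imap
# database backends
mboxlist_db: skiplist
partition-default: /var/spool/imap
metapartition-default: /var/imap/meta
admins: cyrus
"""

for option, value in sorted(relevant_settings(SAMPLE_CONF).items()):
    print(f"{option}: {value}")
```

Running "df" on each path it reports should then show which device each one lives on.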


--On Thursday, February 28, 2008 2:38 PM -0700 Jeff Fookson 
<jfookson at as.arizona.edu> wrote:

> Folks-
>
> I am hoping to get some help and guidance as to why our installation
> of cyrus-imapd 2.3.9 is unusably slow. Here are the specifics:
>
> The software is running on a 1.6GHz Opteron with 2GB of memory,
> supporting a user base of about 400 users. The average rate of
> arriving mail is on the order of 1-2 messages/sec. The active
> mailstore is about 200GB. There are typically about 200 'imapd'
> processes at a given time and a hugely varying number of 'lmtpd'
> processes (from about 6 to many hundreds during times of greatest
> pathology). System load is correspondingly in the 2-15 range, but can
> spike to 50-70!
>
> Our users complain that the system is extremely sluggish during the day
> when the system is most busy.
>
> The most obvious thing we observe is that both the lmtpds and the
> imapds are spending HUGE amounts of time waiting on locks. Even when
> the system load is only 1-2, an 'strace' attached to an instance of
> lmtpd or imapd shows waits of upwards of 1-2 minutes to get a write
> lock, as shown by the example below (this is from a trace of an
> 'lmtpd'):
>
> [strace -f -p 9817 -T]
> 9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 <84.998159>
>
> We strongly suspect that these long waits on locks are what is
> causing the slowness our users are reporting.
>
> We are under the impression that a single instance of cyrus-imapd
> scales well up to about 1000 users (with about 1MB active memory per
> 'imapd' process), and so we are baffled as to what might be going on.
>
> A non-standard aspect of our installation which may have something to
> do with the problem is that we are running cyrus on an lvm2 partition
> that itself is running on top of drbd. Thinking that the remote writes
> to the drbd secondary might be causing delays, we put the primary in
> stand-alone mode so that the drbd layer was not doing any network
> activity (the drbd link is running at gigabit speed on its own
> crossover cable to the secondary box) and saw no significant change in
> behavior. Any issues due to locking and the lvm2 layer would, of
> course, still be present even with drbd's activity reduced to just
> local writes.
>
> Can anyone suggest what we might do next to debug the problem further?
> Needless to say, our users get extremely unhappy when trivial
> operations in their mail clients take over a minute to complete.
>
> Thank you for any thoughts or advice.
>
> Jeff Fookson
>
> --
> Jeffrey E. Fookson, PhD			Phone: (520) 621 3091
> Support Systems Analyst, Principal	jfookson at as.arizona.edu
> Steward Observatory
> University of Arizona
>
> ----
> Cyrus Home Page: http://cyrusimap.web.cmu.edu/
> Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
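
To put numbers on the lock waits described in the quoted message, a minimal Python sketch along these lines could summarize saved 'strace -f -T' output and list the blocking fcntl(F_SETLKW) calls that took longer than a threshold. The first sample line is the one from the trace above; the other two sample lines, the function name and the 1-second threshold are hypothetical, chosen only for illustration.

```python
import re

# Matches lines in the "PID  syscall(args) = ret <seconds>" shape that
# strace -f -T produces, as in the quoted trace.
LINE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<call>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>\S+).*"
    r"<(?P<secs>\d+\.\d+)>\s*$"
)


def slow_lock_waits(lines, threshold=1.0):
    """Return (pid, seconds, args) for slow blocking fcntl locks, slowest first."""
    hits = []
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # unfinished/resumed calls and signals are ignored
        if m["call"] == "fcntl" and "F_SETLKW" in m["args"]:
            secs = float(m["secs"])
            if secs >= threshold:
                hits.append((int(m["pid"]), secs, m["args"]))
    return sorted(hits, key=lambda h: h[1], reverse=True)


SAMPLE = [
    # From the quoted trace:
    "9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 <84.998159>",
    # Hypothetical lines, for illustration only:
    "9817  fcntl(10, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0 <0.000012>",
    "9817  write(10, \"x\", 1) = 1 <0.000020>",
]

for pid, secs, args in slow_lock_waits(SAMPLE):
    print(f"pid {pid}: waited {secs:.1f}s in fcntl({args})")
```

On Linux, "ls -l /proc/<pid>/fd/<n>" for the descriptor in a slow call (10 in the trace above) shows which file the process is waiting on, which should help tell contention on the mailboxes database apart from contention on an individual mailbox's files.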





