Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

Allen Chen achen at harbourfrontcentre.com
Fri Feb 29 16:52:47 EST 2008


I just got out of this kind of situation.
If your OS is Linux, can you post /etc/syslog.conf?

Allen
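The reply above does not say what to look for in /etc/syslog.conf. One possibility, and it is only an assumption here, is synchronous logging: the classic Linux sysklogd syncs a log file after every message unless the file name in syslog.conf is prefixed with '-', which can add disk load on a busy mail server. A minimal Python sketch that lists file targets lacking that prefix follows; the function name and the sample configuration are hypothetical and are not taken from either poster's system.

```python
def synced_log_targets(conf_text):
    """Return (selector, path) pairs for file targets without the '-' prefix.

    Assumes the classic sysklogd format: 'selector <whitespace> action'.
    Pipes ('|...'), remote hosts ('@...'), user lists and '*' are skipped.
    Device paths such as /dev/console are also reported, since they start
    with '/'; filter them out by hand if they are not of interest.
    """
    found = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # not a 'selector action' pair
        selector, action = parts[0], parts[1].strip()
        # '-/path' asks syslogd not to sync after each write; '/path' does sync.
        if action.startswith("/"):
            found.append((selector, action))
    return found


# Hypothetical example input, not the poster's actual file.
SAMPLE_CONF = """\
# sample only
mail.*          /var/log/maillog
local6.*        -/var/log/imapd.log
*.emerg         *
"""

if __name__ == "__main__":
    for selector, path in synced_log_targets(SAMPLE_CONF):
        print(selector, path)
```

To check a real system, pass the contents of /etc/syslog.conf to synced_log_targets() instead of SAMPLE_CONF. Other syslog daemons use different configuration syntax, so this sketch would not apply to them.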

Jeff Fookson wrote:
> Folks-
>
> I am hoping to get some help and guidance as to why our installation of 
> cyrus-imapd 2.3.9
> is unusably slow. Here are the specifics:
>
> The software is running on a 1.6GHz Opteron with 2GB of memory, supporting
> a user base of about 400 users. Mail arrives at an average rate of roughly
> 1-2 messages/sec. The active mailstore is about 200GB. There are typically
> about 200 'imapd' processes at a given time and a hugely varying number of
> 'lmtpd' processes (from about 6 to many hundreds during times of greatest
> pathology). System load is correspondingly in the 2-15 range, but can
> spike to 50-70!
>
> Our users complain that the system is extremely sluggish during the day 
> when the system is most busy.
>
> The most obvious thing we observe is that both the lmtpds and the imapds
> are spending HUGE amounts of time waiting on locks. Even when the system
> load is only 1-2, an 'strace' attached to an instance of lmtpd or imapd
> shows waits of 1-2 minutes or more to get a write lock, as shown by the
> example below (this is from a trace of an 'lmtpd'):
>
> [strace -f -p 9817 -T]
> 9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 <84.998159>
>
> We strongly suspect that these long waits on locks are what is causing
> the slowness our users are reporting.
>
> We are under the impression that a single instance of cyrus-imapd scales 
> well up to about 1000 users (with about 1MB active
> memory per 'imapd' process),  and so we are baffled as to what might be 
> going on.
>
> A non-standard aspect of our installation which may have something to do 
> with the problem is that we are
> running cyrus on an lvm2 partition that itself is running on top of 
> drbd. Thinking that the remote writes
> to the drbd secondary might be causing delays, we put the primary in 
> stand-alone mode so that the drbd layer
> was not doing any network activity (the drbd link is running at gigabit 
> speed on its own crossover cable to
> the secondary box) and saw no significant change in behavior. Any issues 
> due to locking and the lvm2 layer
> would, of course, still be present even with drbd's activity reduced to 
> just local writes.
>
> Can anyone suggest what we might do next to debug the problem further? 
> Needless to say, our users get
> extremely unhappy when trivial operations in their mail clients take 
> over a minute to complete.
>
> Thank you for any thoughts or advice.
>
> Jeff Fookson
>
>   
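To see how widespread the long lock waits in the quoted trace are, the strace output can be filtered for slow F_SETLKW calls. The following is a minimal Python sketch, assuming the trace was captured with the -T option (as in the quoted command) and that each call sits on a single line; the line in the quoted message was wrapped by the mail client and is rejoined here. The function name and threshold are illustrative choices, not part of any tool.

```python
import re

# Matches completed calls such as:
#   9817  fcntl(10, F_SETLKW, {type=F_WRLCK, ...}) = 0 <84.998159>
# The leading pid is optional (it appears with 'strace -f').
# Calls split into '<unfinished ...>' / 'resumed' pairs are not matched.
LOCK_RE = re.compile(
    r"^(?:(?P<pid>\d+)\s+)?fcntl\((?P<fd>\d+), F_SETLKW, "
    r"\{type=(?P<type>F_\w+).*?\}\)\s+=\s+(?P<ret>-?\d+).*"
    r"<(?P<secs>\d+\.\d+)>\s*$"
)


def slow_lock_waits(lines, threshold=1.0):
    """Yield (pid, fd, lock_type, seconds) for F_SETLKW calls that took
    at least `threshold` seconds, according to the strace -T timing."""
    for line in lines:
        m = LOCK_RE.match(line.strip())
        if not m:
            continue
        secs = float(m.group("secs"))
        if secs >= threshold:
            pid = int(m.group("pid")) if m.group("pid") else None
            yield pid, int(m.group("fd")), m.group("type"), secs


# The trace line from the quoted message, rejoined onto one line.
SAMPLE = ("9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, "
          "start=0, len=0}) = 0 <84.998159>")

if __name__ == "__main__":
    for pid, fd, lock_type, secs in slow_lock_waits([SAMPLE]):
        print(pid, fd, lock_type, secs)
```

For a longer capture, the trace can be written to a file with strace's -o option and the open file object passed to slow_lock_waits(). The file descriptor number in each hit can then be looked up under /proc/<pid>/fd to see which file is being contended for.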


