followup: stuck lmtpd processes
Andrew Morgan
morgan at orst.edu
Wed Sep 24 15:32:14 EDT 2003
On Wed, 24 Sep 2003, Scott Adkins wrote:
> When looking at what file the processes are all waiting to get a lock on,
> it usually turns out to be the cyrus.header file and not the quota file.
> Is this still the same bug described by Rob on bugzilla? Does it have to
> be the quota file?
>
> Also, when we find the specific imaps process that happens to have the
> cyrus.header lock file opened for writing and has it locked, if we kill
> it off, we find that the write lock goes to another imaps process or to
> one of the LMTP processes and gets stuck there... we kill that one off
> and it goes to the next one and gets stuck. We never saw a case where
> all the other processes became unstuck and the problem went away.
Are you sure that the processes are hung on the cyrus.header lock? That's
what I originally thought when I was only looking at the output of lsof
and /proc/locks (Linux). When I actually ran a gdb backtrace on one of
the stuck processes, it became obvious that it was waiting for a lock on
the quota file instead:
(gdb) bt
#0 0x402ae5fb in fcntl () from /lib/libc.so.6
#1 0x08077504 in lock_reopen (fd=16, filename=0xbfffa098 "/var/spool/cyrus/config/quota/k/user.krolickp", sbuf=0xbfffa040,
    failaction=0xbfffa03c) at lock_fcntl.c:87
#2 0x080570b6 in mailbox_lock_quota (quota=0xbfffc3c4) at mailbox.c:1016
#3 0x08053f73 in append_setup (as=0xbfffc118, name=0xbfffb114 "user.krolickp", format=0, userid=0x0, auth_state=0x0,
    aclcheck=0, quotacheck=0) at append.c:209
I also saw exactly the behavior you describe when killing processes. I
originally tried killing all the lmtpd processes that were stuck because I
believed that one of the lmtpd processes was stuck holding the lock on
cyrus.header. When I killed the one holding the lock on cyrus.header,
another lmtpd process would grab the lock but still be stuck.
When I finally killed the process holding the quota file lock (an imaps
process), all the lmtpd processes got unstuck and delivered the waiting
mail.
It sounds to me like you're not actually killing the process that has the
lock that all the other processes are waiting for.
Andy