followup: stuck lmtpd processes

Etienne Goyer etienne.goyer at linuxquebec.com
Wed Sep 24 09:22:29 EDT 2003


Hi,

I don't have time this morning to look at your patch and understand the
issue, but it reminded me of another bug I found a few months ago.  It
may or may not be related to the problem you are fixing, but I thought
you might be interested.  It's the timeout part of your problem that
rang a bell.

On Linux, a system call interrupted by alarm() does not necessarily
return; with a handler installed through signal(), the SIGALRM handler
is executed, then the syscall is restarted.  If the program you are
trying to troubleshoot depends on alarm() to interrupt a blocking system
call, it will deadlock on Linux (at least, it does on RedHat 7.3).
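
To make the difference concrete, here is a minimal sketch in C.  It is
not taken from Cyrus or from the fud patch; the names on_alarm, wake_fd
and read_restarted_after_alarm are made up for the illustration.  As far
as I know, glibc's signal() installs handlers with SA_RESTART semantics,
while sigaction() without SA_RESTART makes the blocked call fail with
EINTR.  The handler writes one byte into the pipe only so that a
restarted read() eventually returns and the demo terminates either way.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: write end of the demo pipe, used by the handler. */
static volatile sig_atomic_t wake_fd = -1;

/* Hypothetical SIGALRM handler.  write() is async-signal-safe; the byte
 * it sends lets a restarted read() complete instead of hanging. */
static void on_alarm(int signo)
{
    int saved_errno = errno;
    ssize_t n;

    (void)signo;
    if (wake_fd >= 0) {
        n = write(wake_fd, "x", 1);
        (void)n;
    }
    errno = saved_errno;
}

/* Block in read() on an empty pipe with a 1 second alarm pending.
 * Returns 1 if read() was restarted after the handler ran,
 *         0 if read() failed with EINTR,
 *        -1 on any other outcome. */
static int read_restarted_after_alarm(int use_sa_restart)
{
    int fds[2];
    char c;
    struct sigaction sa, old;
    ssize_t n;
    int saved_errno;

    if (pipe(fds) != 0)
        return -1;
    wake_fd = fds[1];

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
    if (sigaction(SIGALRM, &sa, &old) != 0) {
        wake_fd = -1;
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    alarm(1);
    n = read(fds[0], &c, 1);      /* blocks: the pipe is empty */
    saved_errno = errno;
    alarm(0);

    sigaction(SIGALRM, &old, NULL);
    wake_fd = -1;
    close(fds[0]);
    close(fds[1]);

    if (n == 1)
        return 1;                 /* syscall was restarted */
    if (n == -1 && saved_errno == EINTR)
        return 0;                 /* syscall was interrupted */
    return -1;
}
```

Calling read_restarted_after_alarm(1) should return 1 (the read() was
restarted) and read_restarted_after_alarm(0) should return 0 (the read()
failed with EINTR).  Each call takes about one second.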

I stumbled on this bug in the fud server and submitted a patch, but
AFAIK it was never considered, since the problem was not reproducible on
Solaris (duh!); Linux was vaguely blamed and nothing was done to address
the issue.

Sorry if this is unrelated to your problem.  I thought you might be
interested in my experience in case it is related.
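
Since this thread is about locks, here is the same idea applied to a
lock attempt bounded by alarm().  This is only a sketch: it is not
John's patch and not Cyrus code, and lock_with_timeout and lock_alarm
are hypothetical names.  It assumes a whole-file POSIX write lock taken
with fcntl(F_SETLKW), which is what the "POSIX ADVISORY WRITE ... 0 EOF"
entries in the quoted /proc/locks output look like.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: it only has to exist so that SIGALRM
 * interrupts the blocked fcntl() instead of killing the process. */
static void lock_alarm(int signo)
{
    (void)signo;
}

/* Hypothetical helper: try to take an exclusive lock on the whole file,
 * waiting at most `seconds`.  Returns 0 on success, -1 on failure; on
 * timeout errno is EINTR.  Caveat: alarm() is process-wide, so this is
 * not suitable if the program already uses SIGALRM for something else. */
static int lock_with_timeout(int fd, unsigned seconds)
{
    struct sigaction sa, old;
    struct flock fl;
    int r, saved_errno;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = lock_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;              /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) != 0)
        return -1;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;          /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                 /* 0 means "through end of file" */

    alarm(seconds);
    r = fcntl(fd, F_SETLKW, &fl); /* blocks while another process holds it */
    saved_errno = errno;
    alarm(0);

    sigaction(SIGALRM, &old, NULL);
    errno = saved_errno;
    return r;
}
```

If the handler were installed with SA_RESTART instead, the fcntl() call
would be restarted after the alarm and the timeout would never take
effect, which is the behaviour I described above.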

On Tue, Sep 23, 2003 at 10:35:04PM -0500, John Wade wrote:
> Hi Andrew,
> 
> I was the one who wrote the message you found.   I finally came to the 
> conclusion that the flat file locking mechanism is somewhat broken in 
> Cyrus, but I was never a good enough C programmer to pin down what was 
> happening.  (The mmap stuff makes it really tricky to debug.)    I 
> wanted to blame it on the Linux kernel, but I know that others have 
> experienced the same problems in Solaris.
> 
> I finally gave up and wrote a locking timeout patch for 2.0.16.   see 
> http://www.oakton.edu/~jwade/cyrus/ for the patch and full details
> 
> A number of other folks have tried this patch successfully on 2.0.16 and 
> 2.1.x, and I know it has resolved our problem.
> 
> If you can solve the particular bug that causes this, more power to you, 
> if not, my work around resolves a number of possible deadlock issues.
> 
> Enjoy,
> John
> 
> 
> 
> Andrew Morgan wrote:
> 
> >Following up on my previous post about stuck lmtpd processes.  I found
> >this incredibly detailed faq at:
> >
> >http://www.faqchest.com/prgm/cyrus-l/cyrus-01/cyrus-0111/cyrus-011102/cyrus01111023_33254.html
> >
> >This isn't exactly the same problem, but the steps on that page helped me
> >figure out that they are all stuck trying to get a lock on:
> >
> >/private/cyrus/mail/k/user/krolickp/cyrus.header
> >
> >Looking at /proc/locks shows:
> >
> >7: POSIX  ADVISORY  WRITE 21903 08:11:42107658 0 EOF d23895e0 c3217f44 c510e4c4 00000000 ccbf076c
> >7: -> POSIX  ADVISORY  WRITE 32485 08:11:42107658 0 EOF ccbf0760 ee36ac44 f3bb26a4 d23895e0 ee36ac4c
> >7: -> POSIX  ADVISORY  WRITE 1802 08:11:42107658 0 EOF ee36ac40 c050ea04 ccbf0764 d23895e0 c050ea0c
> >7: -> POSIX  ADVISORY  WRITE 1217 08:11:42107658 0 EOF c050ea00 ee36a344 ee36ac44 d23895e0 ee36a34c
> >...
> >
> >
> >I don't see how this deadlock occurred, but I'm willing to help debug it.
> >
> >	Andy
> >
> >
> >
> >  
> >

-- 
Etienne Goyer                    Linux Québec Technologies Inc.
http://www.LinuxQuebec.com       etienne.goyer at linuxquebec.com



