followup: stuck lmtpd processes

Andrew Morgan morgan at orst.edu
Wed Sep 24 00:45:51 EDT 2003



On Tue, 23 Sep 2003, John Wade wrote:

> Hi Andrew,
>
> I was the one who wrote the message you found.   I finally came to the
> conclusion that the flat file locking mechanism is somewhat broken in
> Cyrus, but I was never a good enough C programmer to pin down what was
> happening.  (The mmap stuff makes it really tricky to debug.)    I
> wanted to blame it on the Linux kernel, but I know that others have
> experienced the same problems in Solaris.
>
> I finally gave up and wrote a locking timeout patch for 2.0.16.  See
> http://www.oakton.edu/~jwade/cyrus/ for the patch and full details.
>
> A number of other folks have tried this patch successfully on 2.0.16 and
> 2.1.x, and I know it has resolved our problem.
>
> If you can solve the particular bug that causes this, more power to you.
> If not, my workaround resolves a number of possible deadlock issues.
>
> Enjoy,
> John

Hey John,

Thanks for that message.  If you've read a little further in your
info-cyrus messages, you'll see that I apparently have hit upon a
different bug than the one you found (I think).  Your page was
instrumental in helping me track down the source of the problem though.

It turns out I had an imaps process that hung onto the lock on the user's
quota file.  Apparently it obtained the lock, then went off to read from
the network connection and never came back.

I think your patch would fix the problem where a lot of processes are
contending for a lock (by making them retry), but it wouldn't help if a
single process keeps the lock indefinitely.  Ideally it should not be
possible for a process to get hung while it is holding the lock, but that
will require some careful programming in this particular case.  In the
meantime, I'll have to keep an eye on the system.
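
To make that distinction concrete, here is a rough sketch in Python.  It
is not your patch and not Cyrus code.  The names lock_with_timeout and
demo, the timeout values, and the use of flock() on a scratch file are
all my own assumptions for illustration (Cyrus's real locking routines
may well behave differently).  A waiter that retries until a deadline
gives up cleanly instead of blocking forever, but the lock itself stays
held until the holder lets go:

```python
import errno
import fcntl
import os
import tempfile
import time


def lock_with_timeout(fd, timeout=5.0, interval=0.1):
    """Retry a non-blocking exclusive flock() on fd until `timeout` seconds pass.

    Returns True if the lock was obtained, False if the deadline expired.
    (Hypothetical helper, for illustration only.)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as exc:
            # EAGAIN/EACCES mean "someone else holds it"; anything else is a real error.
            if exc.errno not in (errno.EAGAIN, errno.EACCES):
                raise
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def demo():
    """Simulate a stuck holder and a waiter using two descriptors on one file."""
    with tempfile.NamedTemporaryFile() as tmp:
        # Two separate open() calls give two independent flock() owners,
        # standing in for two processes.
        holder = os.open(tmp.name, os.O_RDWR)
        waiter = os.open(tmp.name, os.O_RDWR)
        try:
            assert lock_with_timeout(holder, timeout=1.0)
            # The holder never releases, so the waiter can only time out.
            got_while_held = lock_with_timeout(waiter, timeout=0.3, interval=0.05)
            # Only once the holder unlocks can the waiter make progress.
            fcntl.flock(holder, fcntl.LOCK_UN)
            got_after_release = lock_with_timeout(waiter, timeout=1.0)
            return got_while_held, got_after_release
        finally:
            os.close(holder)
            os.close(waiter)


if __name__ == "__main__":
    print(demo())
```

The timeout keeps the waiter from piling up, which is what helps with
contention, but nothing in it can take the lock away from a holder that
has wandered off to wait on the network.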

Thanks again for your debugging clues...

	Andy




