followup: stuck lmtpd processes

Scott Adkins adkinss at ohio.edu
Wed Sep 24 10:36:14 EDT 2003


I just wanted to add something to this discussion...

First of all, we see the problem on Tru64 as well.  When we upgraded to the
2.2 series, we put in the locking patch that John describes below.  It has
helped us, but the locking problem has *not* gone away.  In fact, the patch
does a better job of *hiding* the problem than fixing it.  A backoff
timeout mechanism doesn't solve the problem that some other process has
permanently placed a lock on a file; all it does is prevent lots of LMTP
processes from stacking up *because* some other process permanently locked
that file.
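
To make the idea of a backoff timeout on a file lock concrete, here is a
rough sketch in Python.  It is only an illustration of the general
technique, not John's patch (which is C code against Cyrus 2.0.16); the
function name lock_with_timeout and the delay values are made up for this
example.

```python
import errno
import fcntl
import time


def lock_with_timeout(fd, timeout=30.0, first_delay=0.05, max_delay=2.0):
    """Try to take an exclusive fcntl lock on fd (opened for writing).

    Returns True if the lock was acquired, False if `timeout` seconds
    passed without getting it.  All parameter names are hypothetical.
    """
    deadline = time.monotonic() + timeout
    delay = first_delay
    while True:
        try:
            # LOCK_NB makes the call fail at once instead of blocking forever.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as exc:
            # EACCES or EAGAIN means another process holds a conflicting lock.
            if exc.errno not in (errno.EACCES, errno.EAGAIN):
                raise
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

Note that when this returns False the caller can give up cleanly, but the
process holding the lock is still holding it, which is exactly the
limitation described above.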

As a result, we still experience the problem, and it manifests itself as
a particular user being unable to receive any email.  Until that user
actually notices that this is happening, we may not notice either.  The log
files (the syslog mail log in particular) do give some clues, however,
such as a lot of System I/O errors when talking to LMTP for a particular
user.

When looking at what file the processes are all waiting to get a lock on,
it usually turns out to be the cyrus.header file and not the quota file.
Is this still the same bug described by Rob on bugzilla?  Does it have to
be the quota file?

Also, when we find the specific imaps process that has the cyrus.header
file open for writing and locked, and we kill it off, we find that the
write lock goes to another imaps process or to one of the LMTP processes
and gets stuck there.  We kill that one off, and the lock goes to the next
one and gets stuck again.  We never saw a case where all the other
processes became unstuck and the problem went away.
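
On Linux, the holder and the queue of waiters can be read out of
/proc/locks, as in the listing Andy quotes further down.  The following is
a small Python sketch of how one might separate the two; it assumes the
kernel's usual one-lock-per-line layout, ignores the trailing hex fields
that older kernels print, and the names LockEntry, parse_proc_locks and
holders_and_waiters are invented for this example.  It does not apply to
Tru64, which has no /proc/locks.

```python
from collections import namedtuple

LockEntry = namedtuple(
    "LockEntry", "lock_id blocked kind mode access pid device_inode start end"
)

# Sample lines taken from the /proc/locks output quoted in this thread.
SAMPLE = """\
7: POSIX  ADVISORY  WRITE 21903 08:11:42107658 0 EOF d23895e0 c3217f44
7: -> POSIX  ADVISORY  WRITE 32485 08:11:42107658 0 EOF ccbf0760 ee36ac44
7: -> POSIX  ADVISORY  WRITE 1802 08:11:42107658 0 EOF ee36ac40 c050ea04
7: -> POSIX  ADVISORY  WRITE 1217 08:11:42107658 0 EOF c050ea00 ee36a344
...
"""


def parse_proc_locks(text):
    """Parse /proc/locks-style text into a list of LockEntry tuples.

    A line whose second field is "->" describes a process blocked on the
    lock; a line without "->" describes the process holding it.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue  # skip blank lines and anything unrecognised
        blocked = len(fields) > 1 and fields[1] == "->"
        if blocked:
            del fields[1]
        if len(fields) < 8:
            continue
        lock_id, kind, mode, access, pid, dev_inode, start, end = fields[:8]
        entries.append(
            LockEntry(lock_id.rstrip(":"), blocked, kind, mode, access,
                      int(pid), dev_inode, start, end)
        )
    return entries


def holders_and_waiters(entries, device_inode):
    """Return (holder pids, waiter pids) for one major:minor:inode key."""
    holders = [e.pid for e in entries
               if e.device_inode == device_inode and not e.blocked]
    waiters = [e.pid for e in entries
               if e.device_inode == device_inode and e.blocked]
    return holders, waiters


if __name__ == "__main__":
    print(holders_and_waiters(parse_proc_locks(SAMPLE), "08:11:42107658"))
    # prints ([21903], [32485, 1802, 1217])
```

Running this before and after killing the holder would show whether the
lock really moves on to the next pid in the waiter list.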

As a consequence, the only solution we have when we see the problem is to
restart the Cyrus server (we usually wait until after work hours at least).
I am not convinced the patch described below has helped us much: back when
we saw the LMTP processes stacking up, the problem was right in our face
and we could deal with it sooner rather than later.

Anyways, those are my thoughts on the subject.

Scott

--On Tuesday, September 23, 2003 10:35 PM -0500 John Wade 
<jwade at oakton.edu> wrote:

> Hi Andrew,
>
> I was the one who wrote the message you found.   I finally came to the
> conclusion that the flat file locking mechanism is somewhat broken in
> Cyrus, but I was never a good enough C programmer to pin down what was
> happening.  (The mmap stuff makes it really tricky to debug.)    I wanted
> to blame it on the Linux kernel, but I know that others have experienced
> the same problems in Solaris.
>
> I finally gave up and wrote a locking timeout patch for 2.0.16.  See
> http://www.oakton.edu/~jwade/cyrus/ for the patch and full details.
>
> A number of other folks have tried this patch successfully on 2.0.16 and
> 2.1.x, and I know it has resolved our problem.
>
> If you can solve the particular bug that causes this, more power to you;
> if not, my workaround resolves a number of possible deadlock issues.
>
> Enjoy,
> John
>
>
>
> Andrew Morgan wrote:
>
>> Following up on my previous post about stuck lmtpd processes.  I found
>> this incredibly detailed faq at:
>>
>> http://www.faqchest.com/prgm/cyrus-l/cyrus-01/cyrus-0111/cyrus-011102/cyrus01111023_33254.html
>>
>> This isn't exactly the same problem, but the steps on that page helped me
>> figure out that they are all stuck trying to get a lock on:
>>
>> /private/cyrus/mail/k/user/krolickp/cyrus.header
>>
>> Looking at /proc/locks shows:
>>
>> 7: POSIX  ADVISORY  WRITE 21903 08:11:42107658 0 EOF d23895e0 c3217f44 c510e4c4 00000000 ccbf076c
>> 7: -> POSIX  ADVISORY  WRITE 32485 08:11:42107658 0 EOF ccbf0760 ee36ac44 f3bb26a4 d23895e0 ee36ac4c
>> 7: -> POSIX  ADVISORY  WRITE 1802 08:11:42107658 0 EOF ee36ac40 c050ea04 ccbf0764 d23895e0 c050ea0c
>> 7: -> POSIX  ADVISORY  WRITE 1217 08:11:42107658 0 EOF c050ea00 ee36a344 ee36ac44 d23895e0 ee36a34c
>> ...
>>
>>
>> I don't see how this deadlock occurred, but I'm willing to help debug it.
>>
>>	Andy
>>
>>
>>
>>
>>
>



-- 
 +-----------------------------------------------------------------------+
      Scott W. Adkins                http://www.cns.ohiou.edu/~sadkins/
   UNIX Systems Engineer                  mailto:adkinss at ohio.edu
        ICQ 7626282                 Work (740)593-9478 Fax (740)593-1944
 +-----------------------------------------------------------------------+
     PGP Public Key available at http://www.cns.ohiou.edu/~sadkins/pgp/


More information about the Info-cyrus mailing list