Cyrus-imapd 2.4.17: processes stick on mailbox locking resulting in total mailsystem failure
John Wade
jwade at oakton.edu
Thu Dec 13 09:34:01 EST 2018
Without running gdb on the process, I have no idea, but your problem
sounds similar to something we hit a very long time ago:
See https://www.oakton.edu/user/3/jwade/cyrus/Readme.html
In our cases, the problem was the imapd process that was holding the
lock was trying to obtain a second lock on the same file. What does a
stack trace look like on the imapd process that is holding the lock?
It would appear the lock process has changed since I last looked at
this, so this may not be a help at all.
Good luck,
John
On 12/13/2018 5:21 AM, Ivan Kuznetsov wrote:
> Hello
>
> We had a company mail server under Oracle Linux 6 (a clone of RHEL6)
> with cyrus-imapd 2.3.16 working for years w/o problems. There are >9M
> messages in ~9500 mailboxes in two partitions. Daily mail traffic is
> 20-50K messages.
>
> Some weeks ago we migrated the server to a new one under Oracle Linux
> 7 with cyrus-imapd 2.4.17 (cyrus-imapd-2.4.17-13.el7.x86_64 rpm) and
> now have problem. Some times a week one of imapd processes locks an
> "INBOX" mailbox with corresponding /var/lib/imap/lock/user/<user>.lock
> file and does not unlock it anymore. Other imapd processes trying to
> access this mailbox sticks waiting to obtain the lock. The bigger
> problem is that lmtpd processes trying to deliver new mail to this
> mailbox hangs too. The number of lmtpd processes is limited
> (maxchild=32) to limit the server load, so free lmtpd pool become
> empty after a time, and mail delivery stopsto all the mailboxes. MTA
> (postfix) mail queue blows up quickly
>
> Example lsof output:
>
> lmtpd 8182 cyrus 18uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd 8183 cyrus 18uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd 8187 cyrus 18uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd 8771 cyrus 18uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd 8834 cyrus 18uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd 9123 cyrus 18uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd 9631 cyrus 18uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd 10343 cyrus 18uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd 10437 cyrus 18uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd 11411 cyrus 18uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd 11413 cyrus 18uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd 11506 cyrus 18uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
>
> [root at hippo4 bin]# lsof /var/lib/imap/lock/user/smilovsky.lock | grep
> '^imapd'
> imapd 9132 cyrus 19uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> imapd 9154 cyrus 19uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> imapd 10311 cyrus 19uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> imapd 10746 cyrus 19uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> imapd 11843 cyrus 19uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> imapd 12059 cyrus 19uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
> imapd 21885 cyrus 19uR REG 249,0 0 202944402
> /var/lib/imap/lock/user/smilovsky.lock
>
> In this case the root cause imapd process is 21885. strace shows that
> the process waits on a futex forever:
>
> [root at hippo4 bin]# strace -tt -f -p 21885
> strace: Process 21885 attached
> 13:07:28.264713 futex(0x7f6dcc5ac760, FUTEX_WAIT_PRIVATE, 2,
> NULL^Cstrace: Process 21885 detached
> <detached ...>
>
> Killing the process frees the lock, all the other stuck processes runs
> and the problem disappears in a few seconds.
>
> Investigation shows that this process has a long-term connection from
> a user MUA (Thunderbird) and deletes mail messages time-to-time. This
> may be a Thunderbird 'filter' thread
>
> [root at hippo4 bin]# grep 'imap\[21885\]' /var/log/maillog
> [...]
> Dec 13 09:59:11 hippo4 imap[21885]: starttls: TLSv1.2 with cipher
> DHE-RSA-AES256-SHA (256/256 bits new) no authentication
> Dec 13 09:59:11 hippo4 imap[21885]: login: thunder.solvo.ru
> [172.16.85.69] smilovsky CRAM-MD5+TLS User logged in
> SESSIONID=<hippo.solvo.ru-21885-1544684350-1>
> Dec 13 09:59:11 hippo4 imap[21885]: client id: "name" "Thunderbird"
> "version" "52.5.2"
> Dec 13 09:59:11 hippo4 imap[21885]: Expunged 1 messages from
> user.smilovsky
> Dec 13 09:59:12 hippo4 imap[21885]: Expunged 1 messages from
> user.smilovsky
> Dec 13 09:59:12 hippo4 imap[21885]: Expunged 1 messages from
> user.smilovsky
> Dec 13 09:59:12 hippo4 imap[21885]: Expunged 1 messages from
> user.smilovsky
> Dec 13 10:01:45 hippo4 imap[21885]: Expunged 1 messages from
> user.smilovsky
> Dec 13 10:01:53 hippo4 imap[21885]: Expunged 1 messages from
> user.smilovsky
> Dec 13 10:03:17 hippo4 imap[21885]: Expunged 1 messages from
> user.smilovsky
> Dec 13 10:09:06 hippo4 imap[21885]: Expunged 1 messages from
> user.smilovsky
> Dec 13 10:09:19 hippo4 imap[21885]: Expunged 1 messages from
> user.smilovsky
>
> Problem is sporadic, I didn't see any correlation with server load,
> time of day, user name and so on
>
> Removing "maxchild=" parameter for lmtp just delay the final as the
> server system resources are limited. Now I wrote a patrol script
> counting locks and killing imapd if locks number is growing up. The
> trouble is I can't reliably determine the root cause imapd and kill
> them one-by-one
>
> Any ideas more?
>
> Ivan Kuznetsov, SOLVO ltd
> ----
> Cyrus Home Page: http://www.cyrusimap.org/
> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
> To Unsubscribe:
> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
More information about the Info-cyrus
mailing list