Cyrus-imapd 2.4.17: processes stick on mailbox locking resulting in total mailsystem failure

John Wade jwade at oakton.edu
Thu Dec 13 09:34:01 EST 2018


Without running gdb on the process, I have no idea, but your problem 
sounds similar to something we hit a very long time ago:

See https://www.oakton.edu/user/3/jwade/cyrus/Readme.html

In our cases, the problem was the imapd process that was holding the 
lock was trying to obtain a second lock on the same file.   What does a 
stack trace look like on the imapd process that is holding the lock?     
It would appear the lock process has changed since I last looked at 
this, so this may not be a help at all.

Good luck,
John



On 12/13/2018 5:21 AM, Ivan Kuznetsov wrote:
> Hello
>
> We had a company mail server under Oracle Linux 6 (a clone of RHEL6) 
> with cyrus-imapd 2.3.16 working for years w/o problems. There are >9M 
> messages in ~9500 mailboxes in two partitions. Daily mail traffic is 
> 20-50K messages.
>
> Some weeks ago we migrated the server to a new one under Oracle Linux 
> 7 with cyrus-imapd 2.4.17 (cyrus-imapd-2.4.17-13.el7.x86_64 rpm) and 
> now have problem. Some times a week one of imapd processes locks an 
> "INBOX" mailbox with corresponding /var/lib/imap/lock/user/<user>.lock 
> file and does not unlock it anymore. Other imapd processes trying to 
> access this mailbox sticks waiting to obtain the lock. The bigger 
> problem is that lmtpd processes trying to deliver new mail to this 
> mailbox hangs too. The number of lmtpd processes is limited 
> (maxchild=32) to limit the server load, so free lmtpd pool become 
> empty after a time, and mail delivery stopsto all the mailboxes. MTA 
> (postfix) mail queue blows up quickly
>
> Example lsof output:
>
> lmtpd   8182 cyrus   18uR  REG              249,0         0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd   8183 cyrus   18uR  REG              249,0         0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd   8187 cyrus   18uR  REG              249,0         0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd   8771 cyrus   18uR  REG              249,0         0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd   8834 cyrus   18uR  REG              249,0         0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd   9123 cyrus   18uR  REG              249,0         0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd   9631 cyrus   18uR  REG              249,0         0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd   10343 cyrus   18uR  REG              249,0         0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd   10437 cyrus   18uR  REG              249,0         0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd   11411 cyrus   18uR  REG              249,0         0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd   11413 cyrus   18uR  REG              249,0         0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> lmtpd   11506 cyrus   18uR  REG              249,0         0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
>
> [root at hippo4 bin]# lsof /var/lib/imap/lock/user/smilovsky.lock | grep 
> '^imapd'
> imapd    9132 cyrus   19uR  REG  249,0        0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> imapd    9154 cyrus   19uR  REG  249,0        0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> imapd   10311 cyrus   19uR  REG  249,0        0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> imapd   10746 cyrus   19uR  REG  249,0        0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> imapd   11843 cyrus   19uR  REG  249,0        0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> imapd   12059 cyrus   19uR  REG  249,0        0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
> imapd   21885 cyrus   19uR  REG  249,0        0 202944402 
> /var/lib/imap/lock/user/smilovsky.lock
>
> In this case the root cause imapd process is 21885. strace shows that 
> the process waits on a futex forever:
>
> [root at hippo4 bin]# strace -tt -f -p 21885
> strace: Process 21885 attached
> 13:07:28.264713 futex(0x7f6dcc5ac760, FUTEX_WAIT_PRIVATE, 2, 
> NULL^Cstrace: Process 21885 detached
>  <detached ...>
>
> Killing the process frees the lock, all the other stuck processes runs 
> and the problem disappears in a few seconds.
>
> Investigation shows that this process has a long-term connection from 
> a user MUA (Thunderbird) and deletes mail messages time-to-time. This 
> may be a Thunderbird 'filter' thread
>
> [root at hippo4 bin]# grep 'imap\[21885\]' /var/log/maillog
> [...]
> Dec 13 09:59:11 hippo4 imap[21885]: starttls: TLSv1.2 with cipher 
> DHE-RSA-AES256-SHA (256/256 bits new) no authentication
> Dec 13 09:59:11 hippo4 imap[21885]: login: thunder.solvo.ru 
> [172.16.85.69] smilovsky CRAM-MD5+TLS User logged in 
> SESSIONID=<hippo.solvo.ru-21885-1544684350-1>
> Dec 13 09:59:11 hippo4 imap[21885]: client id: "name" "Thunderbird" 
> "version" "52.5.2"
> Dec 13 09:59:11 hippo4 imap[21885]: Expunged 1 messages from 
> user.smilovsky
> Dec 13 09:59:12 hippo4 imap[21885]: Expunged 1 messages from 
> user.smilovsky
> Dec 13 09:59:12 hippo4 imap[21885]: Expunged 1 messages from 
> user.smilovsky
> Dec 13 09:59:12 hippo4 imap[21885]: Expunged 1 messages from 
> user.smilovsky
> Dec 13 10:01:45 hippo4 imap[21885]: Expunged 1 messages from 
> user.smilovsky
> Dec 13 10:01:53 hippo4 imap[21885]: Expunged 1 messages from 
> user.smilovsky
> Dec 13 10:03:17 hippo4 imap[21885]: Expunged 1 messages from 
> user.smilovsky
> Dec 13 10:09:06 hippo4 imap[21885]: Expunged 1 messages from 
> user.smilovsky
> Dec 13 10:09:19 hippo4 imap[21885]: Expunged 1 messages from 
> user.smilovsky
>
> Problem is sporadic, I didn't see any correlation with server load, 
> time of day, user name and so on
>
> Removing "maxchild=" parameter for lmtp just delay the final as the 
> server system resources are limited. Now I wrote a patrol script 
> counting locks and killing imapd if locks number is growing up. The 
> trouble is I can't reliably determine the root cause imapd and kill 
> them one-by-one
>
> Any ideas more?
>
> Ivan Kuznetsov, SOLVO ltd
> ----
> Cyrus Home Page: http://www.cyrusimap.org/
> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
> To Unsubscribe:
> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus



More information about the Info-cyrus mailing list