Cyrus-imapd 2.4.17: processes stick on mailbox locking resulting in total mailsystem failure

Ivan Kuznetsov kia at solvo.ru
Thu Dec 13 09:50:02 EST 2018


Jonk, thank you for the idea. Somewhat looks strange as old mail server 
worked w/o this problem 5+ years. But the system environment changed 
dramatically, may be some filesystem quircks are significant for this 
locks...

I will try gdb'ing the process when problem occurs once more

13.12.2018 17:34, John Wade пишет:
> Without running gdb on the process, I have no idea, but your problem 
> sounds similar to something we hit a very long time ago:
> 
> See https://www.oakton.edu/user/3/jwade/cyrus/Readme.html
> 
> In our cases, the problem was the imapd process that was holding the 
> lock was trying to obtain a second lock on the same file.   What does a 
> stack trace look like on the imapd process that is holding the lock? It 
> would appear the lock process has changed since I last looked at this, 
> so this may not be a help at all.
> 
> Good luck,
> John
> 
> 
> 
> On 12/13/2018 5:21 AM, Ivan Kuznetsov wrote:
>> Hello
>>
>> We had a company mail server under Oracle Linux 6 (a clone of RHEL6) 
>> with cyrus-imapd 2.3.16 working for years w/o problems. There are >9M 
>> messages in ~9500 mailboxes in two partitions. Daily mail traffic is 
>> 20-50K messages.
>>
>> Some weeks ago we migrated the server to a new one under Oracle Linux 
>> 7 with cyrus-imapd 2.4.17 (cyrus-imapd-2.4.17-13.el7.x86_64 rpm) and 
>> now have problem. Some times a week one of imapd processes locks an 
>> "INBOX" mailbox with corresponding /var/lib/imap/lock/user/<user>.lock 
>> file and does not unlock it anymore. Other imapd processes trying to 
>> access this mailbox sticks waiting to obtain the lock. The bigger 
>> problem is that lmtpd processes trying to deliver new mail to this 
>> mailbox hangs too. The number of lmtpd processes is limited 
>> (maxchild=32) to limit the server load, so free lmtpd pool become 
>> empty after a time, and mail delivery stopsto all the mailboxes. MTA 
>> (postfix) mail queue blows up quickly
>>
>> Example lsof output:
>>
>> lmtpd   8182 cyrus   18uR  REG              249,0         0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> lmtpd   8183 cyrus   18uR  REG              249,0         0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> lmtpd   8187 cyrus   18uR  REG              249,0         0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> lmtpd   8771 cyrus   18uR  REG              249,0         0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> lmtpd   8834 cyrus   18uR  REG              249,0         0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> lmtpd   9123 cyrus   18uR  REG              249,0         0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> lmtpd   9631 cyrus   18uR  REG              249,0         0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> lmtpd   10343 cyrus   18uR  REG              249,0         0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> lmtpd   10437 cyrus   18uR  REG              249,0         0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> lmtpd   11411 cyrus   18uR  REG              249,0         0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> lmtpd   11413 cyrus   18uR  REG              249,0         0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> lmtpd   11506 cyrus   18uR  REG              249,0         0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>>
>> [root at hippo4 bin]# lsof /var/lib/imap/lock/user/smilovsky.lock | grep 
>> '^imapd'
>> imapd    9132 cyrus   19uR  REG  249,0        0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> imapd    9154 cyrus   19uR  REG  249,0        0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> imapd   10311 cyrus   19uR  REG  249,0        0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> imapd   10746 cyrus   19uR  REG  249,0        0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> imapd   11843 cyrus   19uR  REG  249,0        0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> imapd   12059 cyrus   19uR  REG  249,0        0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>> imapd   21885 cyrus   19uR  REG  249,0        0 202944402 
>> /var/lib/imap/lock/user/smilovsky.lock
>>
>> In this case the root cause imapd process is 21885. strace shows that 
>> the process waits on a futex forever:
>>
>> [root at hippo4 bin]# strace -tt -f -p 21885
>> strace: Process 21885 attached
>> 13:07:28.264713 futex(0x7f6dcc5ac760, FUTEX_WAIT_PRIVATE, 2, 
>> NULL^Cstrace: Process 21885 detached
>>  <detached ...>
>>
>> Killing the process frees the lock, all the other stuck processes runs 
>> and the problem disappears in a few seconds.
>>
>> Investigation shows that this process has a long-term connection from 
>> a user MUA (Thunderbird) and deletes mail messages time-to-time. This 
>> may be a Thunderbird 'filter' thread
>>
>> [root at hippo4 bin]# grep 'imap\[21885\]' /var/log/maillog
>> [...]
>> Dec 13 09:59:11 hippo4 imap[21885]: starttls: TLSv1.2 with cipher 
>> DHE-RSA-AES256-SHA (256/256 bits new) no authentication
>> Dec 13 09:59:11 hippo4 imap[21885]: login: thunder.solvo.ru 
>> [172.16.85.69] smilovsky CRAM-MD5+TLS User logged in 
>> SESSIONID=<hippo.solvo.ru-21885-1544684350-1>
>> Dec 13 09:59:11 hippo4 imap[21885]: client id: "name" "Thunderbird" 
>> "version" "52.5.2"
>> Dec 13 09:59:11 hippo4 imap[21885]: Expunged 1 messages from 
>> user.smilovsky
>> Dec 13 09:59:12 hippo4 imap[21885]: Expunged 1 messages from 
>> user.smilovsky
>> Dec 13 09:59:12 hippo4 imap[21885]: Expunged 1 messages from 
>> user.smilovsky
>> Dec 13 09:59:12 hippo4 imap[21885]: Expunged 1 messages from 
>> user.smilovsky
>> Dec 13 10:01:45 hippo4 imap[21885]: Expunged 1 messages from 
>> user.smilovsky
>> Dec 13 10:01:53 hippo4 imap[21885]: Expunged 1 messages from 
>> user.smilovsky
>> Dec 13 10:03:17 hippo4 imap[21885]: Expunged 1 messages from 
>> user.smilovsky
>> Dec 13 10:09:06 hippo4 imap[21885]: Expunged 1 messages from 
>> user.smilovsky
>> Dec 13 10:09:19 hippo4 imap[21885]: Expunged 1 messages from 
>> user.smilovsky
>>
>> Problem is sporadic, I didn't see any correlation with server load, 
>> time of day, user name and so on
>>
>> Removing "maxchild=" parameter for lmtp just delay the final as the 
>> server system resources are limited. Now I wrote a patrol script 
>> counting locks and killing imapd if locks number is growing up. The 
>> trouble is I can't reliably determine the root cause imapd and kill 
>> them one-by-one
>>
>> Any ideas more?
>>
>> Ivan Kuznetsov, SOLVO ltd
>> ----
>> Cyrus Home Page: http://www.cyrusimap.org/
>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
>> To Unsubscribe:
>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
> 

-- 
С уважением, Иван Кузнецов
Руководитель технического отдела

Компания "СОЛВО"
+7(812)60-60-555
+7(495)66-83-003
+7(921)740-72-61
http://www.solvo.ru


More information about the Info-cyrus mailing list