IMAP/LMTP/Quota locking problem
Sebastian Hagedorn
Hagedorn at uni-koeln.de
Mon May 17 08:47:33 EDT 2004
Hi,
sorry, this is quite long ... I'm looking for the updated status of the
problem discussed in the threads "stuck lmtpd processes":
<http://asg.web.cmu.edu/archive/message.php?mailbox=archive.info-cyrus&msg=24927>
and "followup: stuck lmtpd processes":
<http://asg.web.cmu.edu/archive/message.php?mailbox=archive.info-cyrus&searchterm=stuck%20lmtpd&msg=24934>
AFAICT Cyrus IMAP 2.1.x hasn't been updated since. Could anyone comment on
whether this is fixed in 2.2.3 and if we *have* to update in order to get
that fix? In the last two weeks we've had this (or a very similar) problem
twice with 2.1.16 und Red Hat AS 2.1.
For some reason in both cases tons of sendmail processes were building up.
We have configured sendmail to deliver in "interactive" mode, so each
instance tries to connect to lmtpd. I would've expected those attempts to
time out relatively quickly. I'm not sure why they don't. Anyway, that's
somewhat off-topic ...
The first time it happened I was around and collected some info while it
was happening. I'd noticed that mail was no longer being delivered. We have
maxprocs set to 30 for lmtpd:
[root at lvr1 root]# ps -aef|grep lmtp
cyrus 1383 2095 0 10:10 ? 00:00:18 lmtpd
cyrus 1974 2095 0 10:11 ? 00:00:00 lmtpd
cyrus 10631 2095 0 10:36 ? 00:00:00 lmtpd
cyrus 17162 2095 0 10:52 ? 00:00:00 lmtpd
cyrus 17262 2095 0 10:52 ? 00:00:06 lmtpd
cyrus 25687 2095 0 11:17 ? 00:00:00 lmtpd
cyrus 28377 2095 0 11:23 ? 00:00:09 lmtpd
cyrus 31251 2095 0 11:32 ? 00:00:00 lmtpd
cyrus 3824 2095 0 11:46 ? 00:00:07 lmtpd
cyrus 7534 2095 0 11:56 ? 00:00:00 lmtpd
cyrus 8248 2095 0 11:58 ? 00:00:08 lmtpd
cyrus 22103 2095 0 12:35 ? 00:00:00 lmtpd
cyrus 23725 2095 0 12:39 ? 00:00:01 lmtpd
cyrus 25366 2095 0 12:45 ? 00:00:00 lmtpd
cyrus 6108 2095 0 13:23 ? 00:00:00 lmtpd
cyrus 10832 2095 0 13:37 ? 00:00:00 lmtpd
cyrus 12972 2095 0 13:42 ? 00:00:00 lmtpd
cyrus 16114 2095 0 13:53 ? 00:00:00 lmtpd
cyrus 21377 2095 0 14:07 ? 00:00:09 lmtpd
cyrus 24581 2095 0 14:16 ? 00:00:00 lmtpd
cyrus 26200 2095 0 14:21 ? 00:00:14 lmtpd
cyrus 26577 2095 0 14:22 ? 00:00:00 lmtpd
cyrus 31893 2095 0 14:35 ? 00:00:03 lmtpd
cyrus 32575 2095 0 14:36 ? 00:00:07 lmtpd
cyrus 32659 2095 0 14:37 ? 00:00:01 lmtpd
cyrus 14146 2095 0 15:16 ? 00:00:00 lmtpd
cyrus 21049 2095 0 15:36 ? 00:00:00 lmtpd
cyrus 26969 2095 0 15:56 ? 00:00:11 lmtpd
cyrus 13275 2095 0 15:15 ? 00:00:04 lmtpd
cyrus 17679 2095 0 13:57 ? 00:00:00 lmtpd
Obviously something was wrong.
[root at lvr1 root]# lsof -p 26969
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
lmtpd 26969 cyrus cwd DIR 9,0 4096 393296 /root
<snip>
lmtpd 26969 cyrus mem REG 8,73 3841576 30
/var/lib/imap/mailboxes.db
lmtpd 26969 cyrus mem REG 8,49 194 145326099
/var/spool/imap/M/user/therbst/cyrus.header
<snip>
lmtpd 26969 cyrus 9u REG 8,49 3130 56013510
/var/spool/imap/stage./26969-1083938870
lmtpd 26969 cyrus 10u unix 0xef1ce040 614472274 socket
lmtpd 26969 cyrus 11u REG 8,49 194 145326099
/var/spool/imap/M/user/therbst/cyrus.header
I gathered that /var/spool/imap/M/user/therbst/cyrus.header was causing the
problems.Its node number (is that the inode??) is 145326099:
[root at lvr1 root]# grep 145326099 /proc/locks
1162: POSIX ADVISORY WRITE 1974 08:31:145326099 0 EOF e9b58d54 e9b587f4
cd5a7964 00000000 d1c324c0
1162: -> POSIX ADVISORY WRITE 1383 08:31:145326099 0 EOF d1c324b4
c7fd745c f3bd0964 e9b58d54 c7fd7464
1162: -> POSIX ADVISORY WRITE 10631 08:31:145326099 0 EOF c7fd7458
d1c3245c d1c324b8 e9b58d54 d1c32464
1162: -> POSIX ADVISORY WRITE 17162 08:31:145326099 0 EOF d1c32458
d6989400 c7fd745c e9b58d54 d6989408
1162: -> POSIX ADVISORY WRITE 17262 08:31:145326099 0 EOF d69893fc
dfb2eca0 d1c3245c e9b58d54 dfb2eca8
1162: -> POSIX ADVISORY WRITE 25687 08:31:145326099 0 EOF dfb2ec9c
f3972be8 d6989400 e9b58d54 f3972bf0
1162: -> POSIX ADVISORY WRITE 28377 08:31:145326099 0 EOF f3972be4
c47d96e0 dfb2eca0 e9b58d54 c47d96e8
1162: -> POSIX ADVISORY WRITE 31251 08:31:145326099 0 EOF c47d96dc
cd5a7d58 f3972be8 e9b58d54 cd5a7d60
1162: -> POSIX ADVISORY WRITE 7534 08:31:145326099 0 EOF cd5a7d54
c47d98ac c47d96e0 e9b58d54 c47d98b4
1162: -> POSIX ADVISORY WRITE 3824 08:31:145326099 0 EOF c47d98a8
c7fd7400 cd5a7d58 e9b58d54 c7fd7408
1162: -> POSIX ADVISORY WRITE 8248 08:31:145326099 0 EOF c7fd73fc
c8733be8 c47d98ac e9b58d54 c8733bf0
1162: -> POSIX ADVISORY WRITE 22103 08:31:145326099 0 EOF c8733be4
cd197f24 c7fd7400 e9b58d54 cd197f2c
1162: -> POSIX ADVISORY WRITE 25366 08:31:145326099 0 EOF cd197f20
d1c32290 c8733be8 e9b58d54 d1c32298
1162: -> POSIX ADVISORY WRITE 23725 08:31:145326099 0 EOF d1c3228c
c47d9964 cd197f24 e9b58d54 c47d996c
1162: -> POSIX ADVISORY WRITE 6108 08:31:145326099 0 EOF c47d9960
f3bd0234 d1c32290 e9b58d54 f3bd023c
1162: -> POSIX ADVISORY WRITE 10832 08:31:145326099 0 EOF f3bd0230
c47d9290 c47d9964 e9b58d54 c47d9298
1162: -> POSIX ADVISORY WRITE 12972 08:31:145326099 0 EOF c47d928c
d1b450c4 f3bd0234 e9b58d54 d1b450cc
1162: -> POSIX ADVISORY WRITE 16114 08:31:145326099 0 EOF d1b450c0
dfb2e120 c47d9290 e9b58d54 dfb2e128
1162: -> POSIX ADVISORY WRITE 26577 08:31:145326099 0 EOF dfb2e11c
f6bfb7f4 d1b450c4 e9b58d54 f6bfb7fc
1162: -> POSIX ADVISORY WRITE 24581 08:31:145326099 0 EOF f6bfb7f0
f6bfb3a4 dfb2e120 e9b58d54 f6bfb3ac
1162: -> POSIX ADVISORY WRITE 21377 08:31:145326099 0 EOF f6bfb3a0
c87333a4 f6bfb7f4 e9b58d54 c87333ac
1162: -> POSIX ADVISORY WRITE 32575 08:31:145326099 0 EOF c87333a0
d1c32628 f6bfb3a4 e9b58d54 d1c32630
1162: -> POSIX ADVISORY WRITE 32659 08:31:145326099 0 EOF d1c32624
edb35e10 c87333a4 e9b58d54 edb35e18
1162: -> POSIX ADVISORY WRITE 31893 08:31:145326099 0 EOF edb35e0c
edb35798 d1c32628 e9b58d54 edb357a0
1162: -> POSIX ADVISORY WRITE 14146 08:31:145326099 0 EOF edb35794
cdf5ff24 edb35e10 e9b58d54 cdf5ff2c
1162: -> POSIX ADVISORY WRITE 21049 08:31:145326099 0 EOF cdf5ff20
e8adbca0 edb35798 e9b58d54 e8adbca8
1162: -> POSIX ADVISORY WRITE 26200 08:31:145326099 0 EOF e8adbc9c
cd5a7798 cdf5ff24 e9b58d54 cd5a77a0
1162: -> POSIX ADVISORY WRITE 26969 08:31:145326099 0 EOF cd5a7794
d1695628 e8adbca0 e9b58d54 d1695630
1162: -> POSIX ADVISORY WRITE 13275 08:31:145326099 0 EOF d1695624
d1b45e6c cd5a7798 e9b58d54 d1b45e74
1162: -> POSIX ADVISORY WRITE 17679 08:31:145326099 0 EOF d1b45e68
cdf5f5cc d1695628 e9b58d54 e9b58d60
The PIDs waiting for locks correspond to the PIDs of the stuck lmtpd
processes. At that point I restarted Cyrus, because I wanted to get the
system going again. Only later did I read that the actual problem is more
likely caused by a locked quota file. Unfortunately I did not further
examine PID 1974, which held the lock.
Anyway, this past Saturday it happened again when I was not around.
Ultimately the system shut itself down, because the Red Hat cluster
software wasn't able to update its raw device any longer. So when I
examined the case, all I could go on were log files. Still I was able to
piece it together, I think.
It started here:
May 14 09:04:59 lvr1 imapd[14301]: login:
vpn82-4.vpn.Uni-Koeln.DE[134.95.82.4] s0283928 plaintext+TLS
May 14 09:04:59 lvr1 imapd[14301]: seen_db: user s0283928 opened
/var/lib/imap/user/B/s0283928.seen
May 14 09:04:59 lvr1 imapd[14301]: open: user s0283928 opened Drafts
As you can see a VPN dial-up connection was used. I think it likely that
the notorious APPEND problem happened. i.e. the connection was dropped by
the client while the server was doing an APPEND. This is bug 1270
<https://bugzilla.andrew.cmu.edu/show_bug.cgi?id=1270> and is marked FIXED.
For completeness' sake: after the IMAP session above, hte lmtpd processes
started getting stuck when delivering to this user:
May 14 15:59:44 lvr1 lmtpd[3879]: duplicate_check:
<F7152D0EDA04D3119A210008C789EB1113302803 at pcaple01.vw.com.mx> user.s0283928
0
May 14 16:42:57 lvr1 lmtpd[26559]: duplicate_check:
<F7152D0EDA04D3119A210008C789EB1113302803 at pcaple01.vw.com.mx> user.s0283928
0
May 14 17:26:45 lvr1 lmtpd[28635]: duplicate_check:
<F7152D0EDA04D3119A210008C789EB1113302803 at pcaple01.vw.com.mx> user.s0283928
0
May 14 18:08:01 lvr1 lmtpd[32017]: duplicate_check:
<F7152D0EDA04D3119A210008C789EB1113302803 at pcaple01.vw.com.mx> user.s0283928
0
May 14 18:43:25 lvr1 lmtpd[6860]: duplicate_check:
<F7152D0EDA04D3119A210008C789EB1113302803 at pcaple01.vw.com.mx> user.s0283928
0
May 14 19:34:23 lvr1 lmtpd[28641]: duplicate_check:
<F7152D0EDA04D3119A210008C789EB1113302803 at pcaple01.vw.com.mx> user.s0283928
0
May 14 19:49:37 lvr1 lmtpd[26587]: duplicate_check:
<F7152D0EDA04D3119A210008C789EB1113302803 at pcaple01.vw.com.mx> user.s0283928
0
May 14 20:44:45 lvr1 lmtpd[2960]: duplicate_check:
<F7152D0EDA04D3119A210008C789EB1113302803 at pcaple01.vw.com.mx> user.s0283928
0
May 14 21:24:31 lvr1 lmtpd[16921]: duplicate_check:
<F7152D0EDA04D3119A210008C789EB1113302803 at pcaple01.vw.com.mx> user.s0283928
0
May 14 21:45:35 lvr1 lmtpd[3571]: duplicate_check:
<F7152D0EDA04D3119A210008C789EB1113302803 at pcaple01.vw.com.mx> user.s0283928
0
May 14 22:25:25 lvr1 lmtpd[31517]: duplicate_check:
<F7152D0EDA04D3119A210008C789EB1113302803 at pcaple01.vw.com.mx> user.s0283928
0
May 14 22:59:53 lvr1 lmtpd[26060]: duplicate_check:
<F7152D0EDA04D3119A210008C789EB1113302803 at pcaple01.vw.com.mx> user.s0283928
0
May 14 23:37:39 lvr1 lmtpd[4317]: duplicate_check:
<F7152D0EDA04D3119A210008C789EB1113302803 at pcaple01.vw.com.mx> user.s0283928
0
May 15 00:08:15 lvr1 lmtpd[930]: duplicate_check:
<F7152D0EDA04D3119A210008C789EB1113302803 at pcaple01.vw.com.mx> user.s0283928
So, is this really fixed in 2.2.3? Has anybody seen something like this
since? And would it be OK to apply the patch to 2.1.16? Or does it depend
on other stuff that happened in CVS between 2.1.16 and the time the patch
was created?
Thanks, Sebastian Hagedorn
--
Sebastian Hagedorn M.A. - RZKR-R1 (Gebäude 52), Zimmer 18
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 194 bytes
Desc: not available
Url : https://lists.andrew.cmu.edu/mailman/private/info-cyrus/attachments/20040517/87cf76ee/attachment.bin
More information about the Info-cyrus
mailing list