Random deadlocks with cyrus-imapd-2.1.14
Gary Mills
mills at cc.umanitoba.ca
Tue Sep 27 15:30:07 EDT 2005
On Wed, Sep 21, 2005 at 05:58:58PM -0500, Gary Mills wrote:
> We are running cyrus-imapd-2.1.14, using db-3.1.17 for the
> deliver.db and tls_sessions.db. Lately, we've been getting
> deadlocks that stall mail delivery. When they happen, all of
> the lmtpd processes are blocked waiting to acquire a lock.
> If I try to run the db_stat utility, it also blocks. These
> deadlocks happen quite randomly, sometimes months apart, but
> sometimes only days. The server is a 4-CPU Sun 480 running
> Solaris 9. I haven't seen reports of anyone else having this
> problem.
Well, the patch from:
http://email.uoa.gr/download/cyrus/cyrus-imapd-2.1.18/cyrus-imapd-2.1.18.quotalock.diff
certainly looked like the solution to this problem. But two days
after I put the patched Cyrus server into production, the deadlock
reappeared. A stack trace of one of the stuck lmtpd processes
looks like this:
# pstack 18401
18401: lmtpd
fef9f950 lwp_mutex_lock (fefd3e68)
000b86d0 __db_pthread_mutex_lock (0, fefd3e68, 0, 5, ffbfb2e8, 0) + 64
000db1cc lock_get (0, c856, 0, 1b6504, 1, c856) + ec
000cca08 __db_lget (1b64b0, 0, 1, 1, 0, ffbfafa8) + 13c
000ef350 __bam_search (0, ffbfb1f4, 1d2918, 1, 1b5cf0, ffbfb08c) + c0
000e69d4 __bam_c_search (0, 1b5e28, 1b5cf0, 1d2918, 1d, 1b64b0) + 5c4
000e497c __bam_c_get (1b64b0, 1, ffbfb1d8, 0, ffbfb0f4, 1d2918) + 434
000c8f50 __db_c_get (1d2838, 1b6590, ffbfb1d8, 1b5cf0, ffbfb1f4, 1d) + 250
000c5be8 __db_get (0, 0, ffbfb1f4, ffbfb1d8, ffbfb1f4, 1d) + 118
0008c9b0 myfetch (1b5cf0, ffbfb2f4, 2f, ffbfb2ec, ffbfb2e8, 0) + 148
0008cb84 fetch (1b5cf0, ffbfb2f4, 2f, ffbfb2ec, ffbfb2e8, 0) + 4c
0006286c duplicate_check (195a18, 21, ffbfb773, c, 81010100, ff00) + 10c
0002f8f4 deliver_mailbox (1d14e0, 1b6aa8, 3e5, 0, 0, 0) + 12c
00030488 deliver (1e7d18, 1d1538, ffbfccf3, 0, 81010100, ff00) + 838
00035d1c lmtpmode (195868, 1d2720, 1d2640, 0, 0, 0) + 12ec
0002cc34 service_main (1, 1b1fe0, ffbffaa4, 1a9e0, 2c5b8, 1) + 144
0002c7bc main (1, ffbffa9c, ffbffaa4, 195800, 0, 0) + d54
0002b058 _start (0, 0, 0, 0, 0, 0) + 108
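To check many stuck processes at once rather than reading each trace
by eye, something like the following rough Python sketch could be fed
the output of pstack for each lmtpd pid. The helper names
(frame_names, blocked_in_db_lock) are made up for illustration, and
treating the frame names from the trace above as the signature of a
lock wait is an assumption, not anything documented by Cyrus or
Berkeley DB.

```python
import re

# Matches a pstack frame line such as:
#   000db1cc lock_get (0, c856, 0, 1b6504, 1, c856) + ec
# The "18401: lmtpd" header line does not match and is skipped.
FRAME_RE = re.compile(r"^\s*[0-9a-fA-F]{8,16}\s+(\w+)\s*\(")

# Function names copied from the trace above. Using them to recognise
# a Berkeley DB lock wait is an assumption of this sketch.
LOCK_FRAMES = {"lock_get", "__db_lget", "__db_pthread_mutex_lock"}


def frame_names(pstack_text):
    """Return the function names in a pstack dump, innermost first."""
    names = []
    for line in pstack_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def blocked_in_db_lock(pstack_text):
    """True if one of the four innermost frames is a known lock frame."""
    return any(name in LOCK_FRAMES for name in frame_names(pstack_text)[:4])
```

For the trace above, blocked_in_db_lock would report the process as
waiting in the database lock code, which is the condition to count
across all of the lmtpd processes.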
> I notice that there is a db_deadlock utility that is able to
> break deadlocks. Should I be running this, or does Cyrus already
> do deadlock detection? I haven't seen any mention of this utility
> in the Cyrus documents.
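The trace above shows lmtpd blocked inside the Berkeley DB lock_get
call, reached from duplicate_check, so it does seem as if a database
deadlock is the problem. Should I be running this db_deadlock
utility? I suppose I could try it the next time the problem occurs.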
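For anyone unfamiliar with what a deadlock detector does in general,
here is a minimal Python sketch of the usual idea: build a "waits-for"
graph of lockers and look for a cycle. This is only an illustration of
the concept. It is not Berkeley DB's code, the function name
find_cycle is made up, and it says nothing about whether db_deadlock
would actually clear the hang described here.

```python
def find_cycle(waits_for):
    """Return one cycle in a waits-for graph as a list, or None.

    waits_for maps each locker to the collection of lockers it is
    waiting on. A cycle means no locker in it can ever proceed.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, done
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: cycle found.
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in list(waits_for):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None
```

A detector that finds such a cycle typically picks one locker in it
and aborts its request so that the others can continue.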
--
-Gary Mills- -Unix Support- -U of M Academic Computing and Networking-