Random deadlocks with cyrus-imapd-2.1.14

Gary Mills mills at cc.umanitoba.ca
Tue Sep 27 15:30:07 EDT 2005


On Wed, Sep 21, 2005 at 05:58:58PM -0500, Gary Mills wrote:
> We are running cyrus-imapd-2.1.14, using db-3.1.17 for the
> deliver.db and tls_sessions.db.  Lately, we've been getting
> deadlocks that stall mail delivery.  When they happen, all of
> the lmtpd processes are blocked waiting to acquire a lock.
> If I try to run the db_stat utility, it also blocks.  These
> deadlocks happen quite randomly, sometimes months apart, but
> sometimes only days.  The server is a 4-CPU Sun 480 running
> Solaris 9.  I haven't seen reports of anyone else having this
> problem.

Well, the patch from:

	http://email.uoa.gr/download/cyrus/cyrus-imapd-2.1.18/cyrus-imapd-2.1.18.quotalock.diff

certainly looked like the solution to this problem.  However, two days
after putting the patched Cyrus server into production, the deadlock
reappeared.  A stack trace of one of the stuck lmtpd processes
looks like this:

# pstack 18401
18401:  lmtpd
 fef9f950 lwp_mutex_lock (fefd3e68)
 000b86d0 __db_pthread_mutex_lock (0, fefd3e68, 0, 5, ffbfb2e8, 0) + 64
 000db1cc lock_get (0, c856, 0, 1b6504, 1, c856) + ec
 000cca08 __db_lget (1b64b0, 0, 1, 1, 0, ffbfafa8) + 13c
 000ef350 __bam_search (0, ffbfb1f4, 1d2918, 1, 1b5cf0, ffbfb08c) + c0
 000e69d4 __bam_c_search (0, 1b5e28, 1b5cf0, 1d2918, 1d, 1b64b0) + 5c4
 000e497c __bam_c_get (1b64b0, 1, ffbfb1d8, 0, ffbfb0f4, 1d2918) + 434
 000c8f50 __db_c_get (1d2838, 1b6590, ffbfb1d8, 1b5cf0, ffbfb1f4, 1d) + 250
 000c5be8 __db_get (0, 0, ffbfb1f4, ffbfb1d8, ffbfb1f4, 1d) + 118
 0008c9b0 myfetch  (1b5cf0, ffbfb2f4, 2f, ffbfb2ec, ffbfb2e8, 0) + 148
 0008cb84 fetch    (1b5cf0, ffbfb2f4, 2f, ffbfb2ec, ffbfb2e8, 0) + 4c
 0006286c duplicate_check (195a18, 21, ffbfb773, c, 81010100, ff00) + 10c
 0002f8f4 deliver_mailbox (1d14e0, 1b6aa8, 3e5, 0, 0, 0) + 12c
 00030488 deliver  (1e7d18, 1d1538, ffbfccf3, 0, 81010100, ff00) + 838
 00035d1c lmtpmode (195868, 1d2720, 1d2640, 0, 0, 0) + 12ec
 0002cc34 service_main (1, 1b1fe0, ffbffaa4, 1a9e0, 2c5b8, 1) + 144
 0002c7bc main     (1, ffbffa9c, ffbffaa4, 195800, 0, 0) + d54
 0002b058 _start   (0, 0, 0, 0, 0, 0) + 108
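
When many lmtpd processes are stuck, the same check can be scripted
instead of reading each trace by eye.  The following is a minimal
sketch, not part of Cyrus or Berkeley DB.  It reads text in the pstack
format shown above and reports whether the innermost frames are in the
Berkeley DB lock path.  The function names and the set of frame names
are assumptions drawn only from this one trace.

```python
import re

# Frame names copied from the trace above. Other Berkeley DB versions
# or platforms may use different symbols (assumption).
LOCK_FRAMES = {
    "lwp_mutex_lock",
    "__db_pthread_mutex_lock",
    "lock_get",
    "__db_lget",
}

# A pstack frame line looks like: " 000db1cc lock_get (0, c856, ...) + ec"
FRAME_RE = re.compile(r"^\s*[0-9a-f]{8,16}\s+(\S+)\s*\(")


def frames(pstack_text):
    """Return the function names in pstack output, innermost frame first."""
    names = []
    for line in pstack_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def blocked_in_db_lock(pstack_text, depth=4):
    """True if the innermost `depth` frames are all lock-path frames
    and lock_get is among them."""
    top = frames(pstack_text)[:depth]
    return "lock_get" in top and all(name in LOCK_FRAMES for name in top)
```

Feeding the saved output of pstack for each lmtpd pid to
blocked_in_db_lock() would show how many of them are waiting at the
same point.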


> I notice that there is a db_deadlock utility that is able to
> break deadlocks.  Should I be running this, or does Cyrus already
> do deadlock detection?  I haven't seen any mention of this utility
> in the Cyrus documents.

The trace shows lmtpd blocked in lock_get underneath duplicate_check,
so it does seem as if a database deadlock is the problem.  Should I be
running the db_deadlock utility?  I could try it the next time the
problem occurs.
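
For context, a deadlock detector such as db_deadlock conceptually
looks for a cycle among lockers that are waiting on each other, then
aborts one lock request so the rest can proceed.  The sketch below
illustrates only that idea on a plain dictionary.  It does not use the
Berkeley DB API, and find_deadlock and the locker names are
hypothetical.

```python
def find_deadlock(waits_for):
    """Return a list of lockers that form a wait cycle, or None.

    waits_for maps each locker to the lockers that hold locks it needs.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for holder in waits_for.get(node, ()):
            state = color.get(holder, WHITE)
            if state == GREY:
                # holder is already on the current path: a cycle.
                return path[path.index(holder):]
            if state == WHITE:
                cycle = visit(holder)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for locker in waits_for:
        if color.get(locker, WHITE) == WHITE:
            cycle = visit(locker)
            if cycle:
                return cycle
    return None
```

A detector would then choose one member of the returned cycle as the
victim and make its lock request fail.  A process that is simply
waiting on a holder that never releases, with no cycle, would not be
found this way.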

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-


