possible self-deadlock in idle signal handler
Michael Bacon
baconm at email.unc.edu
Sat Mar 28 09:37:56 EDT 2009
We're experiencing some problems, particularly with a small number of
users, which manifest themselves in the dreaded "one deadlocked,
hundreds waiting" process logjam. The keystone process appears to be
an imapd deadlocked on itself in this manner (this is Solaris 9):
-> pstack 19090
19090: imapd
febc5994 lwp_park (0, 0, 0)
febc206c slow_lock (fecc05a8, feba0000, 0, fecbc000, 14, 0) + 58
fec46e70 malloc (c, 0, 13d668, 13d66c, 28cc, 13d790) + 18
00078ac0 xmalloc (c, 13d790, 0, 0, 0, 0) + 4
00074a64 lock_or_refresh (13d660, 1364b4, 107400, 0, 0, 0) + 10c
00074d50 myfetch (13d660, 1bbe58, 10, ffbfb25c, ffbfb254, 1364b4) + 44
00060d74 seen_readit (1364a0, ffbfb2ec, ffbfb2e8, 1252bc, ffbfb2e4, 1) + 60
0003d0c4 index_checkseen (123a00, 0, 0, 603, 1e5a4c, 87fd0) + 4c
0003e298 index_check (123a00, 0, 1, 125000, ffbfc370, 125000) + 234
0002c574 idle_update (3, 0, 0, 0, 0, 0) + 24
0005f7cc idle_handler (e, 0, ffbfcb20, 0, 0, 0) + 5c
febc5bac __sighndlr (e, 0, ffbfcb20, 5f770, 0, 0) + c
febbf804 call_user_handler (e, 0, ffbfcb20, 0, 0, 0) + 234
febbf9b4 sigacthandler (e, 0, ffbfcb20, 8, 1bd7c0, 0) + 64
--- called from signal handler with signal 14 (SIGALRM) ---
fec470d4 _malloc_unlocked (64, 0, 0, fecbc000, 0, 0) + 240
fec46e78 malloc (64, ff0a07d0, a3, 1c4d0d, db, 6d) + 20
fefc5820 default_malloc_ex (64, ff0b17b0, ca, ca, 0, ffe43088) + 20
fefc61e4 CRYPTO_malloc (0, ff0b17b0, ca, 1bcff0, 1bcf78, 1bcf78) + 84
ff036efc EVP_DigestInit_ex (ffbfd150, ff0dfbb0, 0, fffffff8, 0, ffbfd1fd) + 13c
fefdabec HMAC_Init_ex (ffbfd13c, ffbfd150, ffbfd048, ff0dfbb0, 0, 0) + cc
ff160b70 tls1_mac (1bea88, ffbfd288, 0, 20, 0, 1) + 90
ff15cfa4 ssl3_read_bytes (1bea88, 17, ffbfd288, 8c, 1c4d03, 0) + 524
ff15a9c4 ssl3_read (1bea88, 13aef0, 1000, 0, 378, 0) + 44
ff16a30c SSL_read (0, 13aef0, 1000, 0, ffbfd5bc, ffbfd5b1) + 6c
0006bd5c prot_fill (13ae78, 0, 0, 0, ffbfd5bc, ffbfd428) + ec
0005e564 getword (13ae78, 125108, 1, 1a9e0, 2c8dc, 125000) + ac
0002c8f0 cmd_idle (13d358, 7dc00, 0, 0, 730061, 0) + 2e8
0002ea6c cmdloop (0, 1360d8, 8bc60, 8bc60, 123c00, 125000) + df0
00030d34 service_main (123c00, 132080, ffbffc2c, 0, 1aa50, 11a800) + 180
0001aaf8 main (ffbff2b4, 7c000, fa, 27667, 2602e4, 49c71400) + 640
0001a2ec _start (0, 0, 0, 0, 0, 0) + 5c
From what I've found online, the problem appears to be that the SSL
stack was in the middle of a malloc() call when the SIGALRM went off.
The signal handler then tried to read the seen file, which resulted in
another malloc(). The second malloc() requests the process's malloc
mutex (part of Solaris's thread internals), but that mutex is already
held by the interrupted first call, so the lock request never returns
and the process hangs permanently, still holding the lock on the
mailbox.
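For illustration, here is a minimal sketch of the usual way to avoid
this kind of re-entrancy: the handler only sets a flag, and the work
that may call malloc() runs later from the main loop. This is not the
Cyrus code. Every name in it (idle_pending, idle_alarm_handler,
do_idle_update, poll_idle, install_idle_handler) is hypothetical, and
do_idle_update is just a stand-in for whatever idle_update does in the
trace above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Set by the signal handler, consumed by the main loop (hypothetical name). */
static volatile sig_atomic_t idle_pending = 0;

/* Hypothetical counter, present only so the sketch can be observed. */
static int updates_done = 0;

/* Async-signal-safe: does nothing except store to a sig_atomic_t flag. */
static void idle_alarm_handler(int sig)
{
    (void)sig;
    idle_pending = 1;
}

/* Stand-in for the real update work. It calls malloc(), so it must
 * only ever run outside signal-handler context. */
static void do_idle_update(void)
{
    char *scratch = malloc(64);
    assert(scratch != NULL);
    memset(scratch, 0, 64);
    free(scratch);
    updates_done++;
}

/* Called from the main loop at a point where no library call is in
 * progress. Returns 1 if an update was run, 0 otherwise. */
static int poll_idle(void)
{
    if (!idle_pending)
        return 0;
    idle_pending = 0;
    do_idle_update();
    return 1;
}

/* Installs the flag-only handler for SIGALRM. Returns 0 on success. */
static int install_idle_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = idle_alarm_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGALRM, &sa, NULL);
}
```

In a loop like the one in cmd_idle, the idea would be to call
poll_idle() after each read returns (including when it returns early
with EINTR), so that the update never runs on top of an interrupted
malloc(). Whether that fits the existing idle code is an open question
on my part.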
Would anyone happen to have any tips on getting out from under this?
Thanks,
Michael Bacon
ITS Messaging
UNC Chapel Hill