cyrus 2.4 deadlock identified: SIGALRM race

Thomas Jarosch thomas.jarosch at intra2net.com
Tue Sep 1 08:31:23 EDT 2015


Hey there,

Thanks to the recent lock debugging tool[1] and a good deal of luck,
I was able to track down the mysterious Cyrus 2.4 (and earlier) deadlock.

Here's the output from the lock debugger:
/usr/cyrus/bin/imapd (pid 3301) holding WRITE lock for /datastore/imap-mails/user/projects/cyrus.index
  /usr/cyrus/bin/imapd (pid 21130) ++WAITING++ for WRITE lock on /datastore/imap-mails/user/projects/cyrus.index
  /usr/cyrus/bin/imapd (pid 20536) ++WAITING++ for WRITE lock on /datastore/imap-mails/user/projects/cyrus.index
  ..

Backtrace of process 3301:
 #0 0xb77c9428 in __kernel_vsyscall ()
 #1 0xb735af91 in __lll_lock_wait_private () from /lib/libc.so.6
 #2 0xb72c88fe in _L_lock_9705 () from /lib/libc.so.6
 #3 0xb72c66f0 in malloc () from /lib/libc.so.6
 #4 0x080b7557 in xzmalloc (size=32) at xmalloc.c:68
 #5 0x080a27b6 in seqset_init (maxval=0, flags=1) at sequence.c:59
 #6 0x0806d152 in index_tellexpunge (state=0x9421ca8) at index.c:2319
 #7 index_tellchanges (state=0x9421ca8, canexpunge=1, printuid=0) at index.c:2370
 #8 0x08071041 in index_check (state=0x9421ca8, usinguid=1, printuid=0) at index.c:682
 #9 0x080515ae in idle_update (flags=(IDLE_MAILBOX | IDLE_ALERT)) at imapd.c:2833
 #10 0x0809abc5 in idle_handler (sig=14) at idle.c:197
 #11 <signal handler called>
 #12 0xb72c52d4 in _int_malloc () from /lib/libc.so.6
 #13 0xb72c66fa in malloc () from /lib/libc.so.6
 #14 0xb74bb21c in ?? () from /usr/lib/libcrypto.so.1.0.0
 Backtrace stopped: previous frame inner to this frame (corrupt stack?)


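Tadaaa! We are in the middle of a malloc() call when SIGALRM (sig=14) fires
for IMAP IDLE. The signal handler (idle_handler) then calls malloc() again,
which blocks forever on the allocator lock that the interrupted malloc() call
already holds: malloc() is not async-signal-safe.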

-> Never, ever put complex code in signal handlers.
Only set a volatile sig_atomic_t flag and be done with it.
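Here is a minimal sketch of that flag pattern in C, assuming a POSIX system. The names on_sigalrm(), poll_alarm() and periodic_work() are hypothetical and do not come from the Cyrus source. periodic_work() only stands in for the kind of work idle_update() does. The handler records that the alarm fired, and the malloc()-using work runs later from the main loop, in normal context.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Written by the handler, consumed by the main loop. volatile sig_atomic_t
 * is the object type the C standard allows a handler to write safely. */
static volatile sig_atomic_t alarm_pending = 0;

/* Async-signal-safe handler: no malloc(), no stdio, no locks. */
static void on_sigalrm(int sig)
{
    (void)sig;
    alarm_pending = 1;
}

/* Install the handler. SA_RESTART is deliberately not set, so a blocking
 * read() in the main loop returns EINTR and the flag is checked promptly. */
static int install_alarm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigalrm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGALRM, &sa, NULL);
}

/* Hypothetical stand-in for the periodic IDLE work. It runs in normal
 * context, so calling malloc() here is safe. */
static int periodic_work(void)
{
    char *buf = malloc(32);

    if (buf == NULL)
        return -1;
    free(buf);
    return 0;
}

/* One pass of the main loop: run the deferred work only if the handler
 * asked for it. Returns 1 if work ran, 0 if nothing was pending, -1 on error. */
static int poll_alarm(void)
{
    if (!alarm_pending)
        return 0;
    alarm_pending = 0;
    return periodic_work() == 0 ? 1 : -1;
}
```

In a real server, the main loop would re-arm alarm() and then block in select() or read(). When SIGALRM arrives, the blocking call returns with EINTR and the loop calls poll_alarm().

A small window remains: the signal can land between the flag check and the blocking call. pselect() or the self-pipe trick closes that window. Avoiding signals altogether, as the 2.5 refactoring did, removes it entirely.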

After I killed process 3301, all the other processes resumed operation as normal.


The good news: This specific deadlock shouldn't happen anymore in 2.5+
as the idle code was refactored a few years ago:

------------------------------
commit 17eb391b918c394319e4d1fe5985de10128f34d7
Author: Greg Banks <gnb at fastmail.fm>
Date:   Fri Mar 23 17:27:32 2012 +1100

    idle: don't use signals, use AF_UNIX dgrams
    
    Communications back from idled to imapds are via a message sent on the
    AF_UNIX socket.  The IDLE command is implemented as a select() loop, and
    there's absolutely nothing that needs to be done in signal handler
    context.  Best of all, no more unexpected delivery of SIGUSR1 or
    SIGUSER2, assassinating innocent bystander processes.
------------------------------
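The commit message describes the new design only in outline. Below is a minimal sketch of that shape, assuming a POSIX system. The imapd side waits in select() on both the client connection and an AF_UNIX datagram socket, and anything the notifier sends is handled after select() returns, never in signal context. The message layout (struct idle_msg), the enum and idle_wait() are hypothetical. They are not Cyrus's actual idled protocol or code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* What woke the IDLE loop up (hypothetical names). */
enum idle_event { IDLE_ERROR = -1, IDLE_TIMEOUT = 0, IDLE_NOTIFY = 1, IDLE_CLIENT = 2 };

/* Hypothetical notification datagram; not the real idled message format. */
struct idle_msg {
    int which;            /* e.g. 1 = mailbox changed, 2 = alert */
    char mboxname[64];
};

/* Wait for a datagram from the notifier or input from the client.
 * All handling happens here, in normal context, after select() returns. */
static enum idle_event idle_wait(int notify_fd, int client_fd,
                                 int timeout_sec, struct idle_msg *out)
{
    int maxfd = notify_fd > client_fd ? notify_fd : client_fd;
    fd_set rfds;
    struct timeval tv;
    int r;

    memset(out, 0, sizeof *out);
    for (;;) {
        FD_ZERO(&rfds);
        FD_SET(notify_fd, &rfds);
        FD_SET(client_fd, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        r = select(maxfd + 1, &rfds, NULL, NULL, &tv);
        if (r < 0 && errno == EINTR)
            continue;     /* retry restarts the full timeout; fine for a sketch */
        break;
    }
    if (r < 0)
        return IDLE_ERROR;
    if (r == 0)
        return IDLE_TIMEOUT;
    if (FD_ISSET(notify_fd, &rfds)) {
        /* Datagram sockets keep message boundaries: one recv() = one message. */
        ssize_t n = recv(notify_fd, out, sizeof *out, 0);
        return n == (ssize_t)sizeof *out ? IDLE_NOTIFY : IDLE_ERROR;
    }
    return IDLE_CLIENT;
}
```

For a quick local experiment, you can stand in for the idled socket with socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) and for the client connection with a pipe(). If both descriptors are ready at once, this sketch reports the notification first.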


@Ken: The keep_alive() function in httpd.c (CalDAV)
probably suffers from the same signal handler issue.

Cheers,
Thomas

[1] http://lists.andrew.cmu.edu/pipermail/cyrus-devel/2015-July/003378.html


