Graceful restart also for imapd.conf

Thu Oct 6 10:14:03 EDT 2011

As proposed by Olivier, a 'sighup' branch is available at
git://github.com/worldline-messaging/cyrus-imapd.git. It is based on the
current official cyrus 'master' branch.

Details about the commits:

20a7e4b32c01e7c8dd7aeb8eae41d35e4c630369 - Recycle running services upon 
SIGHUP on master.
Currently, when SIGHUP is sent to the master process, only new and legacy
services are handled specifically:
   - new ones are started
   - legacy ones are stopped by forwarding SIGHUP to the relevant
     children (so that they stop once the client, if any, disconnects)
We propose extending the use of SIGHUP to also recycle the remaining
services, so that any change to imapd.conf is taken into account as soon
as possible.
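A minimal sketch of the proposed behavior, assuming a simplified, hypothetical service table: `struct service`, `install_sighup_handler` and `process_sighup` are illustrative names, not cyrus' actual master.c code. The handler only records that SIGHUP arrived; the main loop then re-reads the configuration (omitted here) and marks every running service for recycling, not only new or legacy ones.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a service entry. */
struct service {
    const char *name;
    int recycle;        /* 1: children should exit once idle, then be respawned */
};

static volatile sig_atomic_t got_sighup = 0;

/* Signal context: only record the event. */
static void sighup_handler(int sig)
{
    (void)sig;
    got_sighup = 1;
}

int install_sighup_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sighup_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGHUP, &sa, NULL);
}

/* Main-loop context. Returns the number of services marked for recycling. */
int process_sighup(struct service *services, size_t n)
{
    assert(services != NULL || n == 0);
    if (!got_sighup)
        return 0;
    got_sighup = 0;

    /* A real master would re-read its configuration here, start new
     * services and stop legacy ones (omitted). The proposal adds: */
    for (size_t i = 0; i < n; i++)
        services[i].recycle = 1;
    return (int)n;
}
```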

5b55aab917d6686c3ba74dc8abad1a3eb906cd12 - Send message to master when 
service is exiting.
This commit makes recycling smoother. Commit log:

   The master process is notified about exiting children by SIGCHLD
signals, which are recorded but processed separately (not right when the
signal is received).
   The master process uses a 'select' call to wait for messages from its
children and to manage other events.
   Signal handlers in the master process are installed with the SA_RESTART
flag, but POSIX leaves it to the implementation whether 'select' is
restarted or returns EINTR in this case. Thus it may take some time (up to
the 'select' timeout) before the master process actually reaps children
(and forks new ones when necessary). This can also happen because of race
conditions, e.g. when SIGCHLD arrives just before 'select' is entered.

   Provided that having children send back a message when they exit is
not too time-consuming (for both child and master), removing the static
MESSAGE_MASTER_ON_EXIT configuration variable (and thus always sending
the message) is useful for smoother janitoring and faster service
recycling when SIGHUP is sent to the master process.
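To illustrate the problem described in this commit log, here is a minimal sketch of one iteration of a master-like loop (hypothetical names, not cyrus code) that copes with both behaviors POSIX allows. If `select` fails with EINTR on SIGCHLD, children are reaped right away; if the implementation restarts `select`, or SIGCHLD arrives just before `select` is entered, reaping waits for the timeout, which is the latency mentioned above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigchld = 0;

static void sigchld_handler(int sig)
{
    (void)sig;
    got_sigchld = 1;
}

/* Install the SIGCHLD handler with SA_RESTART, as the master does. */
int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Wait up to timeout_sec for fd to become readable, then reap exited
 * children. Returns the number of children reaped, or -1 on error. */
int wait_and_reap(int fd, int timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    /* With SA_RESTART, POSIX lets select() either fail with EINTR or be
     * restarted (then it returns only when fd is ready or on timeout). */
    if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0 && errno != EINTR)
        return -1;

    int reaped = 0;
    if (got_sigchld) {
        got_sigchld = 0;
        while (waitpid(-1, NULL, WNOHANG) > 0)   /* never blocks */
            reaped++;
    }
    return reaped;
}
```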

I wonder why 'MESSAGE_MASTER_ON_EXIT' was used, and set to 0, until
now. Compared to what services and master already do, I believe sending
this 'exit' message should not consume a significant amount of
resources. But maybe someone can prove me wrong here?
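A minimal sketch of the idea, with hypothetical names (`MSG_SERVICE_EXITING`, `child_exit_with_notice`, `wait_for_exit_notice`) standing in for cyrus' actual status protocol: right before exiting, the child writes one byte to a status pipe the master watches with `select`, so the master wakes up and reaps the child without depending on SIGCHLD interrupting `select`. On the child side the cost is a single one-byte write(2).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical message code; not cyrus' actual status values. */
#define MSG_SERVICE_EXITING 0x7f

/* Child side: one small write to the status pipe, then exit. */
void child_exit_with_notice(int status_fd, int code)
{
    unsigned char msg = MSG_SERVICE_EXITING;
    ssize_t n = write(status_fd, &msg, 1);
    (void)n;            /* exiting anyway; nothing useful to do on error */
    _exit(code);
}

/* Master side: wait up to timeout_sec for a status message. On an exit
 * notice, reap the child immediately instead of waiting for SIGCHLD to
 * interrupt select(). Returns the reaped pid, 0 on timeout, -1 on error. */
pid_t wait_for_exit_notice(int status_fd, int timeout_sec)
{
    assert(status_fd >= 0);
    for (;;) {
        fd_set rfds;
        struct timeval tv;
        tv.tv_sec = timeout_sec;   /* simplification: restarts on EINTR */
        tv.tv_usec = 0;
        FD_ZERO(&rfds);
        FD_SET(status_fd, &rfds);

        int r = select(status_fd + 1, &rfds, NULL, NULL, &tv);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            return 0;

        unsigned char msg;
        if (read(status_fd, &msg, 1) != 1)
            return -1;
        if (msg == MSG_SERVICE_EXITING)
            /* The child calls _exit() right after writing, so this
             * blocking wait is short. */
            return waitpid(-1, NULL, 0);
    }
}
```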

4578a0268a1e1c2c1f0b33d68e547fc864dcca4b - Remain root on master process.
Currently the master process is usually started as root and later becomes cyrus.
With SIGHUP it reloads its configuration and can start newly added
services. However, this is sometimes problematic because unprivileged
users (such as cyrus) are not allowed to bind ports below 1024: as a
result, the added service fails to start.
To be fair, I don't know whether there are ways to circumvent this
security restriction on all platforms targeted by cyrus. I did try
setcap'ing the executable on Ubuntu, but it did not work as expected...
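The failure mode can be reproduced with a small probe; `try_bind_port` is a hypothetical helper, not cyrus code. Run as root, binding port 143 normally succeeds; run as an unprivileged user such as cyrus, bind(2) typically fails with EACCES (on Linux, unless the process has CAP_NET_BIND_SERVICE), which is what a service added before a SIGHUP runs into.

```c
#define _XOPEN_SOURCE 700
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to bind a TCP socket to the given port on all IPv4 addresses.
 * Returns 0 on success, or the errno value from socket()/bind(). */
int try_bind_port(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);               /* port 0: any free port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    int err = 0;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        err = errno;                           /* e.g. EACCES below 1024 */
    close(fd);
    return err;
}
```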
Now, as far as security is concerned, it is best not to be root. But
after thinking about it, two things make us think that maybe the master
process could stay root:
  - it does not talk to external clients as IMAP/POP/etc. services do;
so no remote exploits, right?
  - many daemon services do run as root by default (ok, ok, that does
not mean it's the right thing to do :p)
Or maybe it could be root by default, with a configuration option to run
as a given user/group, as some other daemon services do.
What do you think?
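If the master stayed root, the privilege drop would move into each child. A minimal sketch, assuming the target user and group come from a hypothetical configuration option: `drop_privileges` is an illustrative helper, called in the freshly forked child before exec'ing the service binary, so only the master keeps root.

```c
#define _DEFAULT_SOURCE     /* glibc/musl: expose setgroups() */
#include <assert.h>
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/* Meant to run in a freshly forked child, before exec'ing the service,
 * while the master itself keeps running as root. Returns 0 on success,
 * -1 on failure (the caller should then _exit()). */
int drop_privileges(uid_t uid, gid_t gid)
{
    /* Groups first: once setuid() has dropped root, they can no longer
     * be changed. setgroups() itself requires privilege. */
    if (geteuid() == 0 && setgroups(1, &gid) != 0)
        return -1;
    if (setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;
    /* Sanity check: a dropped root must not be able to get it back. */
    if (uid != 0 && setuid(0) == 0)
        return -1;
    return 0;
}
```

Setting a single supplementary group keeps the sketch short; initgroups(3) would be the usual choice if the configured user's own supplementary groups are wanted.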

As usual, comments are welcome :)

Now, a side note about something I observed while messing with signals
in cyrus: from time to time I got deadlocked processes upon SIGTERM;
for example, when sending it to the master process right after starting
it (yeah I know, who in their right mind would want to do that? Except
me).

Stacktraces of deadlocked processes:
#0  0xf7777430 in __kernel_vsyscall ()
#1  0xf7388753 in __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:95
#2  0xf731bbac in _L_lock_10489 () from /lib/i386-linux-gnu/libc.so.6
#3  0xf731a553 in __libc_realloc (oldmem=0x97929e0, bytes=62) at malloc.c:3813
#4  0xf7309cea in _IO_mem_finish (fp=0x9792868, dummy=0) at memstream.c:132
#5  0xf7305949 in _IO_new_fclose (fp=0x9792868) at iofclose.c:66
#6  0xf73764bc in __vsyslog_chk (pri=<value optimized out>, flag=1, fmt=0x805482e "exiting on SIGTERM/SIGINT", ap=0xffb7f43c " o+\367\023\t")
     at ../misc/syslog.c:228
#7  0xf7376896 in __syslog_chk (pri=6, flag=1, fmt=0x805482e "exiting on SIGTERM/SIGINT") at ../misc/syslog.c:131
#8  0x08049b7f in syslog (sig=15) at /usr/include/bits/syslog.h:32
#9  sigterm_handler (sig=15) at master.c:1067
#10 <signal handler called>
#11 0xf7777430 in __kernel_vsyscall ()
#12 0xf73430d7 in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/i386/../fork.c:130
#13 0x0804afb5 in spawn_service (si=2) at master.c:633
#14 0x0804cce4 in main (argc=10, argv=0xffb80db4) at master.c:2069

#0  0xf7777430 in __kernel_vsyscall ()
#1  0xf7388753 in __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:95
... (same backtrace)
#13 0x0804afb5 in spawn_service (si=3) at master.c:633
#14 0x0804cce4 in main (argc=10, argv=0xffb80db4) at master.c:2069

strace of a deadlocked master daemon:
... (startup, children forking, etc.)
12:07:33.299382 --- SIGTERM (Terminated) @ 0 (0) ---
12:07:33.299466 --- SIGCHLD (Child exited) @ 0 (0) ---
12:07:33.299503 sigreturn()             = ? (mask now [TERM]) <0.000183>
12:07:33.299799 --- SIGCHLD (Child exited) @ 0 (0) ---
12:07:33.299818 sigreturn()             = ? (mask now [TERM]) <0.000005>
12:07:33.299861 rt_sigaction(SIGTERM, {SIG_IGN, [], 0}, NULL, 8) = 0 <0.003123>
12:07:33.303060 kill(0, SIGTERM)        = 0 <0.000013>
12:07:33.303124 time(NULL)              = 1317816453 <0.000007>
12:07:33.303163 getpid()                = 21549 <0.000005>
12:07:33.303197 futex(0xf73ad3c0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
13:48:01.724288 +++ killed by SIGKILL +++

strace of a deadlocked child:
12:07:33.298775 --- SIGTERM (Terminated) @ 0 (0) ---
12:07:33.298871 rt_sigaction(SIGTERM, {SIG_IGN, [], 0}, NULL, 8) = 0 <0.000018>
12:07:33.298986 kill(0, SIGTERM)        = 0 <0.000013>
12:07:33.299067 time(NULL)              = 1317816453 <0.000044>
12:07:33.299340 getpid()                = 21557 <0.000016>
12:07:33.299423 futex(0xf73ad3c0, FUTEX_WAIT_PRIVATE, 2, NULL
13:49:12.294353 +++ killed by SIGKILL +++

So, judging from these traces, here is what may have happened:
For a short lapse of time, the child has just been forked and has not
yet 'exec'ed the service binary. Per POSIX, the signal handlers set by
the parent are inherited by the child at this point; caught signals are
only reset to their default action by 'exec'.
Now the master process receives SIGTERM: it enters the signal handler
'sigterm_handler', which propagates the signal to its process group
before exiting. If one of the forked children has not yet reached
'exec', it will also run 'sigterm_handler' upon SIGTERM.
But right before leaving this function, 'syslog' is called (to log that
the signal was received and the process is exiting). According to the
gdb backtraces, this is where it gets stuck: the handler interrupted
fork() itself (frame #12), and syslog then blocks on a libc-internal
lock inside realloc() (frames #1 to #3).
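One common way to close the window between fork and exec, sketched with a hypothetical `fork_with_default_handlers` helper (not cyrus' actual spawn_service): block the signals whose handlers must not run in the child across fork(), restore their default disposition in the child, and only then unblock them. A SIGTERM sent to the process group in that window then simply terminates the child instead of running 'sigterm_handler'. This only addresses the child side; the master's own handler still calls syslog.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Signals whose parent handlers must never run in a not-yet-exec'ed child. */
static const int child_reset_signals[] = { SIGTERM, SIGINT };
#define N_RESET (sizeof child_reset_signals / sizeof child_reset_signals[0])

/* Like fork(), but the child starts with SIGTERM/SIGINT at their default
 * disposition. Returns fork()'s result (-1 on error). */
pid_t fork_with_default_handlers(void)
{
    sigset_t block, old;
    sigemptyset(&block);
    for (size_t i = 0; i < N_RESET; i++)
        sigaddset(&block, child_reset_signals[i]);

    /* Block first, so no handler can run while we fork. */
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: reset dispositions while the signals are still blocked. */
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = SIG_DFL;
        sigemptyset(&sa.sa_mask);
        for (size_t i = 0; i < N_RESET; i++)
            sigaction(child_reset_signals[i], &sa, NULL);
    }

    /* Parent and child: restore the previous mask. A SIGTERM that arrived
     * meanwhile is delivered now (handler in the parent, default action,
     * i.e. termination, in the child). */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return pid;
}
```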

Actually, POSIX defines a list of functions that are safe to call while
handling a signal (so-called async-signal-safe functions), and syslog is
not part of it.
This would mean that something syslog does, at least in the
implementation I have here, is not safe as far as signal handling is
concerned. But it would be a pity if one could not syslog anything in
that case :(
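A common way out, sketched with hypothetical names (not the actual master.c code): keep the handler async-signal-safe by only setting a `volatile sig_atomic_t` flag, and do the syslog call (and the propagation to the process group, which the strace above shows as `kill(0, SIGTERM)`) from the main loop. Installing the handler without SA_RESTART makes a blocked `select` fail with EINTR, so the flag is noticed promptly; a signal arriving just before `select` is entered is still only seen at the next wakeup.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <syslog.h>

static volatile sig_atomic_t got_sigterm = 0;

/* Signal context: async-signal-safe work only (no syslog, no malloc). */
static void sigterm_handler(int sig)
{
    (void)sig;
    got_sigterm = 1;
}

int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigterm_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_RESTART: let select() fail with EINTR */
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGINT, &sa, NULL);
}

/* Main-loop context. Returns 1 if shutdown was requested, 0 otherwise.
 * The caller would then ignore SIGTERM, kill(0, SIGTERM) and exit. */
int check_shutdown(void)
{
    if (!got_sigterm)
        return 0;
    /* Safe here: we are no longer running inside the signal handler. */
    syslog(LOG_INFO, "exiting on SIGTERM/SIGINT");
    return 1;
}
```

If a message really must be emitted from inside the handler, write(2) is on the async-signal-safe list and can print a fixed string to stderr.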

Note that I observed the deadlock upon startup, but I guess that under
heavy load (platforms with many service instances) it may also happen
when SIGTERM is sent later.

Regards
Julien