Severe issue with high-volume IMAP server stalling, possible patch

Fri Apr 15 05:42:12 EDT 2016

Hi Ellie,

> It is failing our Cassandane tests, because the test suite is unable to
> cleanly shutdown the master process after each test finishes.
> 
> Are you able to cleanly shut down the master process with a SIGTERM in
> your environment?  Can you cleanly shut one down that has very little
> activity on it?

Indeed. We also had the idled-not-terminating-issue and still applied a
kill -9 after some time, so did not observe the issue yet.

> I'm having a look at the way tvptr is set.  If the process is
> in_shutdown, then tvptr remains NULL (which tells myselect() to block
> indefinitely and never time out).  I'm not sure if this is going to turn
> out to be significant here.  I'm also not sure why we do this
> differently for this case.
>
> [...]
>
> What I think is happening is that, once shutdown is initiated, the only
> way myselect() can return is if one of the rfds becomes active, or if a
> signal arrives.  It will no longer time out.  But, once shutdown is
> initiated, at a certain point rfds are no longer becoming active
> (because all the children have themselves shut down), and signals are no
> longer occurring, so myselect() ends up blocked forever.

I don't think we should block at all in this case. The master loop
should keep running and clean up. This is definitely a bug in my patch.

> Maybe the answer is to break out of the myselect loop after one
> iteration if we're in_shutdown, so that we don't block indefinitely
> waiting for iterations that will never occur.  I think just changing the
> loop condition from 
>     while (r == -1)
> to
>     while (!in_shutdown && r == -1)
> might do the trick.  

I'd instead set a non-null, but zero timeout, so select will still clean
up the signal mask and query for FDs. I pushed an alternative fixup on
GitHub:

https://github.com/JensErat/cyrus-imapd/commit/bc3bacf70261711a9c810107cddb081df65a552c

Looking at the code, I'm wondering whether we need the test for
scheduled events at all. As I understand, the child janitor is always
registered as an event running all 10 seconds, so there should be no
branch for schedule == NULL. Anyway, testing probably won't hurt, either.

Because the timeout is not only NULLed, but also set to zero as default,
we will not block any more iff

- we're in shutdown or
- no schedule is set up (which should not happen, anyway, but we have a
safe fallback if the schedule is not completely set up/already torn down
on shutdown)

If I'm not mistaken, this actually also might have been an issue before
-- but don't ask me why it wasn't triggered. Probably because of good
luck and some interrupts (eg. the shutdown signal) being sufficient up
to now.

> This behaviour also suggests that a reasonable limit for interruptions
> might be one that scales with the number of child processes (rather than
> the arbitrary 5), but I don't think this alone would be enough to avoid
> the problem I just encountered.  If the remaining children all exit at
> the same time, and get processed by a single myselect(), then there's
> still n-1 loop iterations before shutdown would occur -- same problem.

I'm not sure about this. This is the maximum number of SIGCHLDs, but we
also might fetch a SIGTERM on the go. Anyway: It is _very_ unlikely that
we get just another signal in-between the select calls. At least on our
setup, we never observed it at all, and this is a rather high-volume
service! Usually, the signals will arrive (but be blocked) somewhere
else in the master loop, and multiple signals are cleaned up with a
single (first) select statement.

Anyway, nothing bad happens if we _sometimes_ pass over the message
handling, and at least we're now logging a warning after some retries.
If somebody has severe issues with this, he'll get aware of the reasons
by tweaking his configuration (for example by setting a more reasonable
-T parameter) or digging deeper.

Regards,
Jens

-- 
Jens Erat
Universität Konstanz
Kommunikations-, Infomations-, Medienzentrum (KIM)
Abteilung Basisdienste
D-78457 Konstanz
Mail: jens.erat at uni-konstanz.de

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4913 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20160415/e0e80f0e/attachment-0001.p7s>