Severe issue with high-volume IMAP server stalling, possible patch

Fri May 13 01:52:13 EDT 2016

Hi Jens,

Sorry about the late response, I've been unwell.

> > Alternative patch implementing this approach is here:
> > https://github.com/elliefm/cyrus-imapd/tree/v25/alt-master-eintr
> 
> I vote against changing the logic for the 2.5 branch. This does not fix
> any issues, but unnecessarily changes parts of the most central logic
> for a minor performance gain. If this is rolled out for v3, I'm fine,
> but for a stable release this just introduces unnecessary risk without a
> major improvement.

My main concern with my own implementation is that I don't really have
any way to verify whether or not it actually fixes the problem you're
reporting (but see below -- I've changed my mind).

> > Right now I'm not sure if we should try to keep the old behaviour (retry
> > from top on EAGAIN), try to keep it the same as EINTR (so give it a
> > counter and a limit), or do something new (like just pretend it was a
> > timeout, like we now do after too many interrupts, so that the end of
> > loop processing can occur).  Any thoughts?
> 
> I don't think retrying from top is _ever_ a good solution. Consider some
> system throwing EAGAIN in some weird load behavior (in the beginning,
> this was one of the things we suspected, but the issue proved to be
> EINTR instead). We'd be stuck in the exactly same issue we had with
> interrupts.

I'm with you here.

Given that we don't really know when EAGAIN might occur; given that
Linux documentation recommends treating it similarly to EINTR for
portability; and given that the number 5 was utterly arbitrary to begin
with, I propose that we just treat it *exactly* the same as EINTR:

https://github.com/elliefm/cyrus-imapd/commit/fa414b9030cbf48678dc2928ebbd59d804a591fd

And if a problem with EAGAIN ever turns up on some odd platform, then we
can debug and fix it when there's actually something to debug and fix.
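
To make "exactly the same" concrete, here is a rough sketch of the idea.
It is not the code from the commit above: select_with_retries(),
stub_select() and MAX_ATTEMPTS are names made up for illustration, and
the real loop in master.c also processes pending signals between
attempts. The only point is that EINTR and EAGAIN share one counter and
one limit, and that running out of attempts is reported like a timeout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

#define MAX_ATTEMPTS 5 /* the same arbitrary limit discussed in this thread */

/* Signature shared by a real select wrapper and the stub below (hypothetical). */
typedef int (*select_fn)(int nfds, fd_set *rfds, struct timeval *tv);

/*
 * Returns the number of ready fds, 0 on timeout or once the attempts are
 * used up (i.e. pretend it was a timeout), or -1 on any other error.
 */
static int select_with_retries(select_fn do_select, int nfds,
                               const fd_set *want, fd_set *rfds,
                               struct timeval *tv)
{
    int attempt;

    assert(do_select != NULL && want != NULL && rfds != NULL);

    for (attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        int r;

        *rfds = *want; /* start every attempt from the full set */
        r = do_select(nfds, rfds, tv);
        if (r >= 0)
            return r;
        if (errno != EINTR && errno != EAGAIN)
            return -1; /* a real error: leave it to the caller */
        /* EINTR and EAGAIN fall through to the same counter */
    }

    FD_ZERO(rfds); /* nothing is known to be ready */
    return 0;
}

/* Stub standing in for select(): fails a set number of times with the
 * chosen errno, then reports 2 ready fds. */
static int stub_failures_left, stub_errno, stub_calls;

static int stub_select(int nfds, fd_set *rfds, struct timeval *tv)
{
    (void)nfds; (void)rfds; (void)tv;
    stub_calls++;
    if (stub_failures_left > 0) {
        stub_failures_left--;
        errno = stub_errno;
        return -1;
    }
    return 2;
}
```

With the stub set to fail three times with EAGAIN, the helper keeps
retrying and returns the ready count on the fourth call. If the stub
fails indefinitely with EINTR, the helper gives up after MAX_ATTEMPTS
calls and returns 0.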

> >    * no need for an inner loop around myselect(), so no contortions to
> >    make that safe, and signals don't get stuck
> 
> I don't think our issue was the child janitor, but message handling. As
> far as we observed the issue, the interrupt storms mostly occurred on one
> service (e.g., IMAP on port 143), but stalled the other services (e.g.,
> IMAPs on port 993). In the beginning of the loop, new 143-processes have
> been forked, which have been used (and terminated) quickly, resulting in
> the interrupt storm. On the other hand, port 993 processes sent in
> messages "I'm busy" so the master process should have forked new ones,
> but never did; because message handling stalled. With your patch,
> exactly this could still happen, as an interrupted select returns an
> empty FDSET. You need the inner loop to catch the interrupts and get a
> valid FDSET, or you might get stuck without message handling again.

I was about to reply with a "but...", but while reading over the code
again to confirm my understanding, I think I've changed my mind.

I think the problem you describe can only happen on platforms with
pselect.  On platforms without it, the interrupts will arrive whenever
they arrive, and if this happens to be while myselect() is blocking,
then the resulting EINTR means no fds were ready yet anyway -- and so
there's no problem.

But on platforms with pselect, where these signals are blocked everywhere
except inside pselect, any signals that arrived outside the myselect() call
are delivered as soon as pselect unblocks them -- seemingly before it bothers
to check for ready fds.  The larger the loop surrounding myselect(), the
more time there is for another interrupt to be queued before we get back
to it, and the less often pselect will actually get as far as looking
for ready fds.  So there might actually be lots of ready fds, but we
don't know, because pselect keeps reacting to queued interrupts instead
of checking for/reporting them.

[I realise you understand this, I'm kind of explaining it to myself
here.]
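
For the archives, the delivery timing can be seen in isolation with a
small test program. This is only a sketch and has nothing to do with
master.c: pending_signal_demo() and on_signal() are made-up names, and
SIGUSR1 stands in for SIGCHLD. The signal is raised while blocked, so
the handler cannot run until pselect() swaps in a mask that lets it
through.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig)
{
    (void)sig;
    got_signal = 1;
}

/*
 * Returns the handler flag as seen just before pselect(); *after receives
 * the flag as seen just after pselect() returns.
 */
static int pending_signal_demo(int *after)
{
    struct sigaction sa;
    sigset_t block, during;
    struct timespec zero = { 0, 0 }; /* poll only, never sleep */
    int before;

    assert(after != NULL);

    got_signal = 0;
    sa.sa_handler = on_signal;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    /* Block the signal, as master does outside of myselect(). */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &during);

    raise(SIGUSR1);     /* stays pending while blocked */
    before = got_signal;

    /* The mask used during pselect() must not block SIGUSR1. */
    sigdelset(&during, SIGUSR1);
    pselect(0, NULL, NULL, NULL, &zero, &during);
    *after = got_signal;

    return before;
}
```

On a POSIX system I would expect the flag to be 0 before the pselect()
call and 1 after it, which is the "signals only arrive inside
myselect()" behaviour described above. Whether pselect() also gets to
report ready fds on that same call seems to vary by platform, so the
sketch deliberately passes no fds at all.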

So what we want is as small as possible a loop around the myselect()
call, so that if it's interrupted we can process the interrupt and then
call myselect() again ASAP -- hopefully before another signal arrives,
so that pselect gets an opportunity to actually look for ready file
descriptors.  Which is what your patch does, and mine doesn't.

There is something to be said for handling the return from myselect() as
my patch does (i.e. it's not an "r" value, but rather a number of ready
fds, and even if we don't do anything new with it, at least naming it
correctly enhances readability).  But even so, we'd still need your
tight inner loop around it to stop pselect choking out.
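
Roughly what I mean by using the count, as a sketch only: struct svc and
dispatch_ready() are invented for this example and are much simpler than
the real Services[] handling (which has more than one fd per service).
The caller passes in whatever myselect() returned as nready.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical stand-in for a service entry: one fd and a wakeup counter. */
struct svc {
    int fd;
    int wakeups;
};

/*
 * Walk the services only when select reported ready fds, and stop as soon
 * as all of them have been accounted for. Returns the number handled.
 */
static int dispatch_ready(struct svc *svcs, size_t n, fd_set *rfds, int nready)
{
    size_t i;
    int handled = 0;

    assert(svcs != NULL && rfds != NULL);

    if (nready <= 0)
        return 0; /* timeout, interrupt or error: nothing to look at */

    for (i = 0; i < n && handled < nready; i++) {
        if (FD_ISSET(svcs[i].fd, rfds)) {
            svcs[i].wakeups++; /* real code would read the message here */
            handled++;
        }
    }
    return handled;
}
```

Naming the value nready makes it obvious at the call site that 0 and -1
mean "skip the service loop", which is the readability gain I was after.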

So my plan now is:

2.5: your patch + shutdown fix + EAGAIN fix

3.0: as per 2.5 + (maybe) tidying up the myselect variable naming and
making the processing after myselect() depend on the number of ready fds

How's that sound?

ellie

On Wed, Apr 27, 2016, at 06:03 PM, Jens Erat wrote:
> Hi Ellie,
> 
> > Though: I've just noticed another possible issue.  We might get stuck in
> > that inner loop if select keeps raising EAGAIN (it has no counter, and
> > the "break" statement breaks out of the switch, not the inner loop).
> 
> > I guess EAGAIN didn't occur in your profiling, but maybe your system
> > can't raise EAGAIN from select either?
> 
> We've never received a single EAGAIN on Solaris. I've read somewhere it
> might happen, but that's because EAGAIN shares the magic return code
> with EWOULDBLOCK, indicating you can't just test the socket/FD (I don't
> think this might happen for us).
> 
> If I got this right, EAGAIN might even show an issue. But then, you
> never know what all those *nix systems are doing differently.
> 
> > Right now I'm not sure if we should try to keep the old behaviour (retry
> > from top on EAGAIN), try to keep it the same as EINTR (so give it a
> > counter and a limit), or do something new (like just pretend it was a
> > timeout, like we now do after too many interrupts, so that the end of
> > loop processing can occur).  Any thoughts?
> 
> I don't think retrying from top is _ever_ a good solution. Consider some
> system throwing EAGAIN in some weird load behavior (in the beginning,
> this was one of the things we suspected, but the issue proved to be
> EINTR instead). We'd be stuck in the exactly same issue we had with
> interrupts.
> 
> >    * acknowledge that, with pselect() available, SIGCHLD, SIGQUIT,
> >    SIGINT and SIGTERM signals will -only- arrive during a call to
> >    myselect(), because they are blocked at all other times -- so we must
> >    expect a server with lots of short-lived processes to frequently
> >    interrupt out of myselect()
> >    * handle signals as soon as possible after they arrive
> >    * allow child_janitor() an opportunity to run at least sometimes
> >    during interrupt-heavy periods
> 
> I don't think I have to add anything so far.
> 
> >    * minimise useless work
> 
> I'd consider some wasted master cycles acceptable. IMAP servers have a
> "weird" load profile anyway (at least for larger setups): the CPU is
> mostly not doing anything at all, but you have vast IO/memory
> requirements. I don't know Cyrus IMAP's behavior for small systems, but
> at least on our test setup not a lot of "useless" work is performed
> (master loop running every ten seconds), and I'd guess this scales rather
> linearly throughout small systems to large ones. I expect we're running
> one of the largest single machine setups, and looking at the traces, the
> master loop is so fast, it barely ever handles more than one signal or
> interrupt each run. I don't think we have to care about this at all.
> 
> > Alternative patch implementing this approach is here:
> > https://github.com/elliefm/cyrus-imapd/tree/v25/alt-master-eintr
> 
> I vote against changing the logic for the 2.5 branch. This does not fix
> any issues, but unnecessarily changes parts of the most central logic
> for a minor performance gain. If this is rolled out for v3, I'm fine,
> but for a stable release this just introduces unnecessary risk without a
> major improvement.
> 
> I'm not sure if this is important, anyway:
> 
> >    * no more jump-back-to-top-on-signal, so we never skip the end of the
> >    loop entirely
> 
> That's solved with my patch (only addressing this issue), anyway.
> 
> >    * we only need to loop through the services if we know there are
> >    going to be ready file descriptors for at least some of them
> 
> This is a loop over a constant number of services, something <<20
> usually. I don't consider this a problem, and if I'm not mistaken, no
> locks or any other potential issues are involved. If you want to keep
> work minimal: what about looping through the FDSET instead of looping
> through the services and testing the FDSETs? On the other hand, you'd
> need to resolve the service from the FDSET now.
> 
> >    * no need for an inner loop around myselect(), so no contortions to
> >    make that safe, and signals don't get stuck
> 
> I don't think our issue was the child janitor, but message handling. As
> far as we observed the issue, the interrupt storms mostly occurred on one
> service (e.g., IMAP on port 143), but stalled the other services (e.g.,
> IMAPs on port 993). In the beginning of the loop, new 143-processes have
> been forked, which have been used (and terminated) quickly, resulting in
> the interrupt storm. On the other hand, port 993 processes sent in
> messages "I'm busy" so the master process should have forked new ones,
> but never did; because message handling stalled. With your patch,
> exactly this could still happen, as an interrupted select returns an
> empty FDSET. You need the inner loop to catch the interrupts and get a
> valid FDSET, or you might get stuck without message handling again.
> 
> Regards,
> Jens
> 
> 
> On 27.04.2016 at 06:30, ellie timoney via Cyrus-devel wrote:
> > Hi again,
> > 
> >> I'll read through the thread again and see if it becomes clear, and
> >> won't push the patch to the 2.5 branch just yet in case this changes it.
> > 
> > I've read back through the thread, and it looks like our goals here are
> > to:
> > 
> >    * acknowledge that, with pselect() available, SIGCHLD, SIGQUIT,
> >    SIGINT and SIGTERM signals will -only- arrive during a call to
> >    myselect(), because they are blocked at all other times -- so we must
> >    expect a server with lots of short-lived processes to frequently
> >    interrupt out of myselect()
> >    * handle signals as soon as possible after they arrive
> >    * allow child_janitor() an opportunity to run at least sometimes
> >    during interrupt-heavy periods
> >    * minimise useless work
> > 
> > In reading about how select, pselect, signals and EAGAIN interact, I've
> > noticed something: we currently treat the return value from myselect()
> > as either success or failure.  We handle a few classes of failure but
> > treat all successes as identical.
> > 
> > But the return value is actually a number of ready descriptors (with
> > negative -- specifically, -1 -- indicating failure).  We can use this to
> > apply the post-select processing sensibly based on whether it's relevant
> > or not.  This simplifies a few things:
> > 
> >    * no more jump-back-to-top-on-signal, so we never skip the end of the
> >    loop entirely
> >    * we only need to loop through the services if we know there are
> >    going to be ready file descriptors for at least some of them
> >    * child_janitor() gets to run even during interrupt storms
> >    * SNMP handling can be selective (based on examples from SNMP docs,
> >    but untested)
> >    * no need for an inner loop around myselect(), so no contortions to
> >    make that safe, and signals don't get stuck
> > 
> > Alternative patch implementing this approach is here:
> > https://github.com/elliefm/cyrus-imapd/tree/v25/alt-master-eintr
> > 
> > It seems fine to run child_janitor() at high frequency -- it already
> > contains calculations to limit its own processing to one round per
> > second, based on time elapsed since the last run.
> > 
> > Thoughts?
> > 
> > ellie
> > 
> > On Wed, Apr 27, 2016, at 12:20 PM, ellie timoney via Cyrus-devel wrote:
> >> Hi Jens,
> >>
> >>> your argumentation seems legit.
> >>
> >> Thanks :)
> >>
> >>> I'm not sure, though whether the master
> >>> should _ever_ wait forever (NULL-pointer). Maybe this should never
> >>> happen, but I'd propose _not_ waiting would be safer than waiting
> >>> forever in some weird cases. And it should never happen but during
> >>> startup and shutdown. Not waiting might result in some CPU cycles
> >>> wasted, but the general code might prove more robust to other minor
> >>> issues.
> >>
> >> I can see a plausible argument here both ways: on the one hand, if we
> >> never use a NULL timeout, then we'll never hang like we did during
> >> shutdown in this particular case.  On the other hand, waking up master
> >> periodically when it has no work to do will potentially degrade
> >> performance for the actual services that are trying to run.  
> >>
> >> Also, hanging at shutdown in the way we did helped us identify the
> >> problem really quickly and fix it before it made it into a release.
> >>
> >> I'm not really sure how frequently master is likely to wake up during
> >> "normal" operation, but I expect it varies a lot by environment and
> >> configuration.  It's hard to estimate where a reasonable compromise
> >> between "don't block indefinitely" and "don't block at all" might lie.
> >>
> >> I could see the addition of some kind of hang-notification wakeup,
> >> whereby if master has been asleep for longer than some configurable
> >> duration, it wakes up and logs a warning.  The natural spot to do this,
> >> though, would be as another scheduled event, which wouldn't do anything
> >> in the particular case of "no schedule" that I think you're interested
> >> in.  So to be useful at all it would require bigger changes to the tvptr
> >> calculation.
> >>
> >> I've had a look through the history of master.c, and the timeout has
> >> been NULL since the master process was conceived.  Only the addition of
> >> the event scheduler introduced the optional timeout.  Given it's been
> >> working well enough like this basically forever, I don't want to change
> >> it on our stable branch.  I would consider a feature like the above for
> >> inclusion in 3.0, though, if anyone was interested enough to contribute
> >> a patch?
> >>
> >>> You have the better understanding of the overall master loop logic and
> >>> what to do best
> >>
> >> You flatter me, but I'm figuring this out as I go too :)
> >>
> >>> I'm fine with either solution (as long as it works).
> >>
> >> I'll push what we've got (your patch plus my shutdown fix) to the 2.5
> >> branch for inclusion in the next release, and also forward port it to
> >> the master branch...
> >>
> >> Though: I've just noticed another possible issue.  We might get stuck in
> >> that inner loop if select keeps raising EAGAIN (it has no counter, and
> >> the "break" statement breaks out of the switch, not the inner loop).
> >>
> >> My system's documentation for select claims that "some other UNIX
> >> systems" can raise EAGAIN (despite POSIX not specifying this error for
> >> select), and suggests portable programs should retry as if for EINTR in
> >> this case.
> >>
> >> I guess EAGAIN didn't occur in your profiling, but maybe your system
> >> can't raise EAGAIN from select either?
> >>
> >> The old (prior to patch) behaviour for EAGAIN was to resume the outer
> >> loop from the top, same as EINTR.
> >>
> >> Right now I'm not sure if we should try to keep the old behaviour (retry
> >> from top on EAGAIN), try to keep it the same as EINTR (so give it a
> >> counter and a limit), or do something new (like just pretend it was a
> >> timeout, like we now do after too many interrupts, so that the end of
> >> loop processing can occur).  Any thoughts?
> >>
> >> I'll read through the thread again and see if it becomes clear, and
> >> won't push the patch to the 2.5 branch just yet in case this changes it.
> >>
> >> Cheers,
> >>
> >> ellie
> >>
> >> On Tue, Apr 26, 2016, at 08:08 PM, Jens Erat wrote:
> >>> Hi Ellie,
> >>>
> >>> your argumentation seems legit. I'm not sure, though whether the master
> >>> should _ever_ wait forever (NULL-pointer). Maybe this should never
> >>> happen, but I'd propose _not_ waiting would be safer than waiting
> >>> forever in some weird cases. And it should never happen but during
> >>> startup and shutdown. Not waiting might result in some CPU cycles
> >>> wasted, but the general code might prove more robust to other minor
> >>> issues.
> >>>
> >>> You have the better understanding of the overall master loop logic and
> >>> what to do best, I'm fine with either solution (as long as it works).
> >>>
> >>> Regards,
> >>> Jens
> >>>
> >>> On 26.04.2016 at 07:18, ellie timoney wrote:
> >>>> Hi Jens,
> >>>>
> >>>>>> I'm having a look at the way tvptr is set.  If the process is
> >>>>>> in_shutdown, then tvptr remains NULL (which tells myselect() to block
> >>>>>> indefinitely and never time out).  I'm not sure if this is going to turn
> >>>>>> out to be significant here.  I'm also not sure why we do this
> >>>>>> differently for this case.
> >>>>>>
> >>>>>> [...]
> >>>>>>
> >>>>>> What I think is happening is that, once shutdown is initiated, the only
> >>>>>> way myselect() can return is if one of the rfds becomes active, or if a
> >>>>>> signal arrives.  It will no longer time out.  But, once shutdown is
> >>>>>> initiated, at a certain point rfds are no longer becoming active
> >>>>>> (because all the children have themselves shut down), and signals are no
> >>>>>> longer occurring, so myselect() ends up blocked forever.
> >>>>>
> >>>>> I don't think we should block at all in this case. The master loop
> >>>>> should keep running and clean up. This is definitely a bug in my patch.
> >>>>
> >>>> I've read through the old/new code some more, and I don't think this is
> >>>> a bug in your patch.
> >>>>
> >>>> The logic prior to your patch was that myselect would get a NULL timeout
> >>>> by default (and thus block indefinitely waiting for a signal or ready
> >>>> file descriptors).  This makes sense: we want to block until there is
> >>>> something to do.
> >>>>
> >>>> If there were a schedule (and we were not shutting down), then we would
> >>>> inspect it and set up a timeout, so that we would also unblock when it
> >>>> was time to process that.  The timeout might be 0,0 if the next event
> >>>> were already overdue, so that it could be processed immediately.  
> >>>>
> >>>> If we were shutting down, we'd no longer be processing scheduled events,
> >>>> so a scheduled event being due was not cause for a wakeup in this case.
> >>>>
> >>>> Your patch preserves this behaviour, correctly, I think.
> >>>>
> >>>> Where it causes problems is that it would now potentially call (and
> >>>> block in) myselect multiple times in a row without doing any other
> >>>> processing in between.  Which seems fine and good for the case we're
> >>>> trying to fix, but turns out to be bad during shutdown.
> >>>>
> >>>> During shutdown, we expect to get a bunch of SIGCHLDs for the exiting
> >>>> children, which are reaped in the outer loop, before the myselect() call
> >>>> (and after reaping, if there are no more children, we exit). But with
> >>>> pselect, SIGCHLD is blocked until pselect temporarily unblocks it, so
> >>>> these signals will always arrive during myselect().  This means that
> >>>> during shutdown, myselect() will legitimately return -1/EINTR a number
> >>>> of times, as the child processes exit, and we need to reap them and
> >>>> decrement counters in order to exit cleanly.
> >>>>
> >>>> With your patch, we no longer reap/decrement/exit on a signal unless
> >>>> we've also coincidentally exceeded 5 consecutive EINTRs, and have thus
> >>>> broken out of the myselect() loop -- which doesn't necessarily happen.
> >>>>
> >>>> Understanding the cause, I'm even more sure that my fix for the shutdown
> >>>> problem is correct: when we're in_shutdown, we /want/ any single EINTR
> >>>> to send us back around the outer loop so we can reap children and maybe
> >>>> exit.
> >>>>
> >>>>>> Maybe the answer is to break out of the myselect loop after one
> >>>>>> iteration if we're in_shutdown, so that we don't block indefinitely
> >>>>>> waiting for iterations that will never occur.  I think just changing the
> >>>>>> loop condition from 
> >>>>>>     while (r == -1)
> >>>>>> to
> >>>>>>     while (!in_shutdown && r == -1)
> >>>>>> might do the trick.  
> >>>>>
> >>>>> I'd instead set a non-null, but zero timeout, so select will still clean
> >>>>> up the signal mask and query for FDs. I pushed an alternative fixup on
> >>>>> GitHub:
> >>>>>
> >>>>> https://github.com/JensErat/cyrus-imapd/commit/bc3bacf70261711a9c810107cddb081df65a552c
> >>>>
> >>>> This is a much bigger change.  It inverts the logic ("wait indefinitely
> >>>> for something to do, unless schedule says to wake earlier" => "don't
> >>>> wait, unless schedule says how long to wait").  It also depends on there
> >>>> always being an upcoming scheduled event, otherwise the process will
> >>>> spin on the Services[] array and burn CPU for no reason.
> >>>>
> >>>>> Looking at the code, I'm wondering whether we need the test for
> >>>>> scheduled events at all. As I understand, the child janitor is always
> >>>>> registered as an event running every 10 seconds, so there should be no
> >>>>> branch for schedule == NULL. Anyway, testing probably won't hurt, either.
> >>>>
> >>>> The test for scheduled events is the difference between "block until a
> >>>> signal occurs or an fd becomes ready" and "block until a signal occurs,
> >>>> an fd becomes ready, or it is time to process a scheduled event".  If
> >>>> there's no scheduled events, we don't need to wake up in time for one,
> >>>> so we can just block until other work becomes available.
> >>>>
> >>>> The only time we want to not block is if we know there's a scheduled
> >>>> event needing immediate processing (due now, or overdue).
> >>>>
> >>>>> If I'm not mistaken, this actually also might have been an issue before
> >>>>> -- but don't ask me why it wasn't triggered. Probably because of good
> >>>>> luck and some interrupts (e.g. the shutdown signal) being sufficient up
> >>>>> to now.
> >>>>
> >>>> This wasn't a problem before, because the exiting children would wake up
> >>>> myselect(), be processed and cleaned up at the top of the next iteration
> >>>> of the outer loop, and we would only call myselect() again if there were
> >>>> still children that we were waiting for (who would wake us up again when
> >>>> they finished).
> >>>>
> >>>> It's a problem with your patch because when we get a SIGCHLD we
> >>>> immediately myselect() again without cleaning up or checking whether
> >>>> there's even any more children to wait for.  (It can be moved along by
> >>>> the operator sending a few more signals, to manually exceed the 5 limit,
> >>>> but a clean shutdown shouldn't require this.)
> >>>>
> >>>> I think the original timeout behaviour is correct. I think the patch for
> >>>> dealing with the repeated interrupts looks good, except for the shutdown
> >>>> problem; and I think the right way to deal with the shutdown problem is
> >>>> by explicitly testing for that, and not looping myselect() in that case.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> ellie
> >>>>
> >>>> On Fri, Apr 15, 2016, at 07:42 PM, Jens Erat wrote:
> >>>>> Hi Ellie,
> >>>>>
> >>>>>> It is failing our Cassandane tests, because the test suite is unable to
> >>>>>> cleanly shut down the master process after each test finishes.
> >>>>>>
> >>>>>> Are you able to cleanly shut down the master process with a SIGTERM in
> >>>>>> your environment?  Can you cleanly shut one down that has very little
> >>>>>> activity on it?
> >>>>>
> >>>>> Indeed. We also had the idled-not-terminating-issue and still applied a
> >>>>> kill -9 after some time, so did not observe the issue yet.
> >>>>>
> >>>>>> I'm having a look at the way tvptr is set.  If the process is
> >>>>>> in_shutdown, then tvptr remains NULL (which tells myselect() to block
> >>>>>> indefinitely and never time out).  I'm not sure if this is going to turn
> >>>>>> out to be significant here.  I'm also not sure why we do this
> >>>>>> differently for this case.
> >>>>>>
> >>>>>> [...]
> >>>>>>
> >>>>>> What I think is happening is that, once shutdown is initiated, the only
> >>>>>> way myselect() can return is if one of the rfds becomes active, or if a
> >>>>>> signal arrives.  It will no longer time out.  But, once shutdown is
> >>>>>> initiated, at a certain point rfds are no longer becoming active
> >>>>>> (because all the children have themselves shut down), and signals are no
> >>>>>> longer occurring, so myselect() ends up blocked forever.
> >>>>>
> >>>>> I don't think we should block at all in this case. The master loop
> >>>>> should keep running and clean up. This is definitely a bug in my patch.
> >>>>>
> >>>>>
> >>>>>> Maybe the answer is to break out of the myselect loop after one
> >>>>>> iteration if we're in_shutdown, so that we don't block indefinitely
> >>>>>> waiting for iterations that will never occur.  I think just changing the
> >>>>>> loop condition from 
> >>>>>>     while (r == -1)
> >>>>>> to
> >>>>>>     while (!in_shutdown && r == -1)
> >>>>>> might do the trick.  
> >>>>>
> >>>>> I'd instead set a non-null, but zero timeout, so select will still clean
> >>>>> up the signal mask and query for FDs. I pushed an alternative fixup on
> >>>>> GitHub:
> >>>>>
> >>>>> https://github.com/JensErat/cyrus-imapd/commit/bc3bacf70261711a9c810107cddb081df65a552c
> >>>>>
> >>>>> Looking at the code, I'm wondering whether we need the test for
> >>>>> scheduled events at all. As I understand, the child janitor is always
> >>>>> registered as an event running every 10 seconds, so there should be no
> >>>>> branch for schedule == NULL. Anyway, testing probably won't hurt, either.
> >>>>>
> >>>>> Because the timeout is not only NULLed, but also set to zero as default,
> >>>>> we will not block any more iff
> >>>>>
> >>>>> - we're in shutdown or
> >>>>> - no schedule is set up (which should not happen, anyway, but we have a
> >>>>> safe fallback if the schedule is not completely set up/already torn down
> >>>>> on shutdown)
> >>>>>
> >>>>> If I'm not mistaken, this actually also might have been an issue before
> >>>>> -- but don't ask me why it wasn't triggered. Probably because of good
> >>>>> luck and some interrupts (e.g. the shutdown signal) being sufficient up
> >>>>> to now.
> >>>>>
> >>>>>> This behaviour also suggests that a reasonable limit for interruptions
> >>>>>> might be one that scales with the number of child processes (rather than
> >>>>>> the arbitrary 5), but I don't think this alone would be enough to avoid
> >>>>>> the problem I just encountered.  If the remaining children all exit at
> >>>>>> the same time, and get processed by a single myselect(), then there's
> >>>>>> still n-1 loop iterations before shutdown would occur -- same problem.
> >>>>>
> >>>>> I'm not sure about this. This is the maximum number of SIGCHLDs, but we
> >>>>> also might fetch a SIGTERM on the go. Anyway: It is _very_ unlikely that
> >>>>> we get just another signal in-between the select calls. At least on our
> >>>>> setup, we never observed it at all, and this is a rather high-volume
> >>>>> service! Usually, the signals will arrive (but be blocked) somewhere
> >>>>> else in the master loop, and multiple signals are cleaned up with a
> >>>>> single (first) select statement.
> >>>>>
> >>>>> Anyway, nothing bad happens if we _sometimes_ pass over the message
> >>>>> handling, and at least we're now logging a warning after some retries.
> >>>>> If somebody has severe issues with this, they'll become aware of the reasons
> >>>>> by tweaking his configuration (for example by setting a more reasonable
> >>>>> -T parameter) or digging deeper.
> >>>>>
> >>>>> Regards,
> >>>>> Jens
> >>>>>
> >>>>> -- 
> >>>>> Jens Erat
> >>>>> Universität Konstanz
> >>>>> Kommunikations-, Informations-, Medienzentrum (KIM)
> >>>>> Abteilung Basisdienste
> >>>>> D-78457 Konstanz
> >>>>> Mail: jens.erat at uni-konstanz.de