Severe issue with high-volume IMAP server stalling, possible patch

Wed Feb 10 07:39:21 EST 2016

Hi Bron,

digging deeper, we realized there are still issues with the loop you
proposed. `select` will only decrement tvptr on Linux (and maybe _some_
other systems, definitely not all of them), pselect will never do -- you
might end up waiting forever here if nothing happens.

On the other hand, we partially got a grasp of understanding what was
going on. Because when `pselect` is available, signalling was disabled
but during `pselect`, thus with some weird combination of load on the
different services, each `pselect` returned with `EINTR` and the message
handling code was never reached

We ended up putting the whole deadline calculation in a queue, and have
this code live at the moment:

        int interrupted = 0;
        int again = 0;
        do {
            /* how long to wait? - do now so that any scheduled wakeup
             * calls get accounted for*/
            gettimeofday(&now, 0);
            tvptr = NULL;
            if (schedule && !in_shutdown) {
                double delay = timesub(&now, &schedule->mark);
                if (!interrupted && delay > 0.0) {
                    timeval_set_double(&tv, delay);
                }
                else {
                    tv.tv_sec = 0;
                    tv.tv_usec = 0;
                }
                tvptr = &tv;
            }

#if defined(HAVE_UCDSNMP) || defined(HAVE_NETSNMP)
            if (tvptr == NULL) blockp = 1;
            snmp_select_info(&maxfd, &rfds, tvptr, &blockp);
#endif
            errno = 0;
            r = myselect(maxfd, &rfds, NULL, NULL, tvptr);

            if (r == -1) switch(errno) {
            /* Try again to get valid rfds, this time without blocking so we
             * will definitely process messages without getting interrupted
             * again. */
            case EINTR:
                interrupted++;
                if (interrupted > 100) {
                    syslog(LOG_WARNING, "Repeatedly interrupted, too
many signals?");
                    /* Fake a timeout */
                    r = 0;
                    FD_ZERO(&rfds);
                }
                break;
            /* Try again. */
            case EAGAIN: again++; break;
            /* uh oh */
            default: fatalf(1, "select failed: %m");
            }
        } while (r == -1);

We also dump interrupted and again together with some other variables at
the end of each loop. The server is running fine for two days now, and
the dumped information looks fine (we never experienced any delays >4
seconds since patching the server, we often found the master loop not
completely running for minutes). I will report again if we didn't suffer
from any interruptions for the next one or two weeks.

Some remarks to the code:

- We need to update the current timestamp `now` for decrementing the
passed time.
- `snmp_select_info` or's its FDs to rfds; thus it may be run repeatedly
to update `tvptr` and not run over any SNMP events.
- We are compiling without SNMP, thus this code is untested.
- If interrupted, we try again to receive FDs, but without waiting
(tvptr set to zero). Because of this, no large interruptions should
occur -- but we also won't skip over message handling when flooded with
signals.
- After 100 retries after being interrupted, we empty rfds and continue
to work, but print a warning. This is probably subject of discussion and
_much_ too large: we're currenlty dumping the number of interruptions,
and never had more than one at all before catching file descriptors (or
none).
- `int again` can and should be removed for live code, does not seem
relevant at all (but for our own debug purpose).

Kind regards from Lake Constance, Germany,
Jens

Am 06.02.2016 um 13:00 schrieb Bron Gondwana:
> I could buy repeating the whole thing for EINTR but not for EAGAIN.  Obviously everything is fine under non-heavy load no matter what.
> 
> I'm pretty sure your patch is wrong, because it will process the fdset that the select was supposed to fill even if the select returned an error.
> 
> You may have convinced me that my patch is wrong as well, and that we have to always check for the in_shutdown and sigquit magic in that tight loop in case it was a signal that needs to be responded to.
> 
> Bron.
> 
> On Sat, Feb 6, 2016, at 20:52, Jens Erat wrote:
>> Hi Bron,
>>
>> thank you for the fast reply.
>>
>>> One thing I would suggest is running something like nginx in front of
>> Cyrus, even if you only need it for authentication checks.
>>
>> We'll discuss this internally.
>>
>>> I believe this is the correct patch. If the select was interrupted
>> then it needs to be re-run, but there is no need to repeat all the other
>> work, so:
>>>
>>> - r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
>>> - if (r == -1 && errno == EAGAIN) continue;
>>> - if (r == -1 && errno == EINTR) continue;
>>> + do {
>>> + r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
>>> + } while (r == -1 && (errno == EAGAIN || errno == EINTR));

-- 
Jens Erat
Universität Konstanz
Kommunikations-, Infomations-, Medienzentrum (KIM)
Abteilung Basisdienste
D-78457 Konstanz
Mail: jens.erat at uni-konstanz.de

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4913 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20160210/00192ea9/attachment.p7s>