Severe issue with high-volume IMAP server stalling, possible patch

Mon Apr 11 03:52:54 EDT 2016

Hi all,

after some weeks of running a stable service with this patch, I'd like
to propose merging this (we'd really like to switch back to an unpatched
Cyrus release in future). This is not only an issue to us, but we know
some other universities running similar Solaris setups and waiting to
upgrade to Cyrus 2.5 because of our issues.

This is only an issue with Cyrus IMAP 2.5 and probably 3.0; Cyrus 2.4
also does not properly repeat select if it failed, but it's not an issue
here as interrupts can occur anywhere.

We didn't have those master loop hick ups any more. In fact, the while
loop even seems superfluous; just running myselect again if `r==0` would
be fine, but this might be different with other combinations of
operating system kernels, C libraries and compilers, and the loop seems
the safer thing to do. The limit of 100 retries seems a little bit
arbitrary, but I can't really argument for any number, something around
five should probably also be fine to cover all scenarios that are not
completely broken, anyway.

We dumped the interrupted and again counters each time the master loop
ran (and several other metrics), and never had more than one EINTR or
EAGAIN in a row (so the next `myselect` immediately returned the FDSET).

Kind regards from Lake Constance, Germany,
Jens

Am 10.02.2016 um 13:39 schrieb Jens Erat via Cyrus-devel:
> Hi Bron,
> 
> digging deeper, we realized there are still issues with the loop you
> proposed. `select` will only decrement tvptr on Linux (and maybe _some_
> other systems, definitely not all of them), pselect will never do -- you
> might end up waiting forever here if nothing happens.
> 
> On the other hand, we partially got a grasp of understanding what was
> going on. Because when `pselect` is available, signalling was disabled
> but during `pselect`, thus with some weird combination of load on the
> different services, each `pselect` returned with `EINTR` and the message
> handling code was never reached
> 
> We ended up putting the whole deadline calculation in a queue, and have
> this code live at the moment:
> 
>         int interrupted = 0;
>         int again = 0;
>         do {
>             /* how long to wait? - do now so that any scheduled wakeup
>              * calls get accounted for*/
>             gettimeofday(&now, 0);
>             tvptr = NULL;
>             if (schedule && !in_shutdown) {
>                 double delay = timesub(&now, &schedule->mark);
>                 if (!interrupted && delay > 0.0) {
>                     timeval_set_double(&tv, delay);
>                 }
>                 else {
>                     tv.tv_sec = 0;
>                     tv.tv_usec = 0;
>                 }
>                 tvptr = &tv;
>             }
> 
> #if defined(HAVE_UCDSNMP) || defined(HAVE_NETSNMP)
>             if (tvptr == NULL) blockp = 1;
>             snmp_select_info(&maxfd, &rfds, tvptr, &blockp);
> #endif
>             errno = 0;
>             r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
> 
>             if (r == -1) switch(errno) {
>             /* Try again to get valid rfds, this time without blocking so we
>              * will definitely process messages without getting interrupted
>              * again. */
>             case EINTR:
>                 interrupted++;
>                 if (interrupted > 100) {
>                     syslog(LOG_WARNING, "Repeatedly interrupted, too
> many signals?");
>                     /* Fake a timeout */
>                     r = 0;
>                     FD_ZERO(&rfds);
>                 }
>                 break;
>             /* Try again. */
>             case EAGAIN: again++; break;
>             /* uh oh */
>             default: fatalf(1, "select failed: %m");
>             }
>         } while (r == -1);
> 
> We also dump interrupted and again together with some other variables at
> the end of each loop. The server is running fine for two days now, and
> the dumped information looks fine (we never experienced any delays >4
> seconds since patching the server, we often found the master loop not
> completely running for minutes). I will report again if we didn't suffer
> from any interruptions for the next one or two weeks.
> 
> Some remarks to the code:
> 
> - We need to update the current timestamp `now` for decrementing the
> passed time.
> - `snmp_select_info` or's its FDs to rfds; thus it may be run repeatedly
> to update `tvptr` and not run over any SNMP events.
> - We are compiling without SNMP, thus this code is untested.
> - If interrupted, we try again to receive FDs, but without waiting
> (tvptr set to zero). Because of this, no large interruptions should
> occur -- but we also won't skip over message handling when flooded with
> signals.
> - After 100 retries after being interrupted, we empty rfds and continue
> to work, but print a warning. This is probably subject of discussion and
> _much_ too large: we're currenlty dumping the number of interruptions,
> and never had more than one at all before catching file descriptors (or
> none).
> - `int again` can and should be removed for live code, does not seem
> relevant at all (but for our own debug purpose).
> 
> Kind regards from Lake Constance, Germany,
> Jens
> 
> 
> Am 06.02.2016 um 13:00 schrieb Bron Gondwana:
>> I could buy repeating the whole thing for EINTR but not for EAGAIN.  Obviously everything is fine under non-heavy load no matter what.
>>
>> I'm pretty sure your patch is wrong, because it will process the fdset that the select was supposed to fill even if the select returned an error.
>>
>> You may have convinced me that my patch is wrong as well, and that we have to always check for the in_shutdown and sigquit magic in that tight loop in case it was a signal that needs to be responded to.
>>
>> Bron.
>>
>> On Sat, Feb 6, 2016, at 20:52, Jens Erat wrote:
>>> Hi Bron,
>>>
>>> thank you for the fast reply.
>>>
>>>> One thing I would suggest is running something like nginx in front of
>>> Cyrus, even if you only need it for authentication checks.
>>>
>>> We'll discuss this internally.
>>>
>>>> I believe this is the correct patch. If the select was interrupted
>>> then it needs to be re-run, but there is no need to repeat all the other
>>> work, so:
>>>>
>>>> - r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
>>>> - if (r == -1 && errno == EAGAIN) continue;
>>>> - if (r == -1 && errno == EINTR) continue;
>>>> + do {
>>>> + r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
>>>> + } while (r == -1 && (errno == EAGAIN || errno == EINTR));
> 

-- 
Jens Erat
Universität Konstanz
Kommunikations-, Infomations-, Medienzentrum (KIM)
Abteilung Basisdienste
D-78457 Konstanz
Mail: jens.erat at uni-konstanz.de

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4913 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20160411/5a2fb0d5/attachment.p7s>