Severe issue with high-volume IMAP server stalling, possible patch

Mon Apr 11 19:32:29 EDT 2016

Hi Jens,

> after some weeks of running a stable service with this patch, I'd like
> to propose merging this (we'd really like to switch back to an unpatched
> Cyrus release in future).

This would be good.  Can you attach it as a patch file please?  CC me
and I'll get it merged.

Thanks for your hard work figuring this out. :)

Cheers,

ellie

On Mon, Apr 11, 2016, at 05:52 PM, Jens Erat via Cyrus-devel wrote:
> Hi all,
> 
> after some weeks of running a stable service with this patch, I'd like
> to propose merging this (we'd really like to switch back to an unpatched
> Cyrus release in future). This is not only an issue to us, but we know
> some other universities running similar Solaris setups and waiting to
> upgrade to Cyrus 2.5 because of our issues.
> 
> This is only an issue with Cyrus IMAP 2.5 and probably 3.0; Cyrus 2.4
> also does not properly repeat select if it failed, but it's not an issue
> here as interrupts can occur anywhere.
> 
> We didn't have those master loop hick ups any more. In fact, the while
> loop even seems superfluous; just running myselect again if `r==0` would
> be fine, but this might be different with other combinations of
> operating system kernels, C libraries and compilers, and the loop seems
> the safer thing to do. The limit of 100 retries seems a little bit
> arbitrary, but I can't really argument for any number, something around
> five should probably also be fine to cover all scenarios that are not
> completely broken, anyway.
> 
> We dumped the interrupted and again counters each time the master loop
> ran (and several other metrics), and never had more than one EINTR or
> EAGAIN in a row (so the next `myselect` immediately returned the FDSET).
> 
> Kind regards from Lake Constance, Germany,
> Jens
> 
> Am 10.02.2016 um 13:39 schrieb Jens Erat via Cyrus-devel:
> > Hi Bron,
> > 
> > digging deeper, we realized there are still issues with the loop you
> > proposed. `select` will only decrement tvptr on Linux (and maybe _some_
> > other systems, definitely not all of them), pselect will never do -- you
> > might end up waiting forever here if nothing happens.
> > 
> > On the other hand, we partially got a grasp of understanding what was
> > going on. Because when `pselect` is available, signalling was disabled
> > but during `pselect`, thus with some weird combination of load on the
> > different services, each `pselect` returned with `EINTR` and the message
> > handling code was never reached
> > 
> > We ended up putting the whole deadline calculation in a queue, and have
> > this code live at the moment:
> > 
> >         int interrupted = 0;
> >         int again = 0;
> >         do {
> >             /* how long to wait? - do now so that any scheduled wakeup
> >              * calls get accounted for*/
> >             gettimeofday(&now, 0);
> >             tvptr = NULL;
> >             if (schedule && !in_shutdown) {
> >                 double delay = timesub(&now, &schedule->mark);
> >                 if (!interrupted && delay > 0.0) {
> >                     timeval_set_double(&tv, delay);
> >                 }
> >                 else {
> >                     tv.tv_sec = 0;
> >                     tv.tv_usec = 0;
> >                 }
> >                 tvptr = &tv;
> >             }
> > 
> > #if defined(HAVE_UCDSNMP) || defined(HAVE_NETSNMP)
> >             if (tvptr == NULL) blockp = 1;
> >             snmp_select_info(&maxfd, &rfds, tvptr, &blockp);
> > #endif
> >             errno = 0;
> >             r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
> > 
> >             if (r == -1) switch(errno) {
> >             /* Try again to get valid rfds, this time without blocking so we
> >              * will definitely process messages without getting interrupted
> >              * again. */
> >             case EINTR:
> >                 interrupted++;
> >                 if (interrupted > 100) {
> >                     syslog(LOG_WARNING, "Repeatedly interrupted, too
> > many signals?");
> >                     /* Fake a timeout */
> >                     r = 0;
> >                     FD_ZERO(&rfds);
> >                 }
> >                 break;
> >             /* Try again. */
> >             case EAGAIN: again++; break;
> >             /* uh oh */
> >             default: fatalf(1, "select failed: %m");
> >             }
> >         } while (r == -1);
> > 
> > We also dump interrupted and again together with some other variables at
> > the end of each loop. The server is running fine for two days now, and
> > the dumped information looks fine (we never experienced any delays >4
> > seconds since patching the server, we often found the master loop not
> > completely running for minutes). I will report again if we didn't suffer
> > from any interruptions for the next one or two weeks.
> > 
> > Some remarks to the code:
> > 
> > - We need to update the current timestamp `now` for decrementing the
> > passed time.
> > - `snmp_select_info` or's its FDs to rfds; thus it may be run repeatedly
> > to update `tvptr` and not run over any SNMP events.
> > - We are compiling without SNMP, thus this code is untested.
> > - If interrupted, we try again to receive FDs, but without waiting
> > (tvptr set to zero). Because of this, no large interruptions should
> > occur -- but we also won't skip over message handling when flooded with
> > signals.
> > - After 100 retries after being interrupted, we empty rfds and continue
> > to work, but print a warning. This is probably subject of discussion and
> > _much_ too large: we're currenlty dumping the number of interruptions,
> > and never had more than one at all before catching file descriptors (or
> > none).
> > - `int again` can and should be removed for live code, does not seem
> > relevant at all (but for our own debug purpose).
> > 
> > Kind regards from Lake Constance, Germany,
> > Jens
> > 
> > 
> > Am 06.02.2016 um 13:00 schrieb Bron Gondwana:
> >> I could buy repeating the whole thing for EINTR but not for EAGAIN.  Obviously everything is fine under non-heavy load no matter what.
> >>
> >> I'm pretty sure your patch is wrong, because it will process the fdset that the select was supposed to fill even if the select returned an error.
> >>
> >> You may have convinced me that my patch is wrong as well, and that we have to always check for the in_shutdown and sigquit magic in that tight loop in case it was a signal that needs to be responded to.
> >>
> >> Bron.
> >>
> >> On Sat, Feb 6, 2016, at 20:52, Jens Erat wrote:
> >>> Hi Bron,
> >>>
> >>> thank you for the fast reply.
> >>>
> >>>> One thing I would suggest is running something like nginx in front of
> >>> Cyrus, even if you only need it for authentication checks.
> >>>
> >>> We'll discuss this internally.
> >>>
> >>>> I believe this is the correct patch. If the select was interrupted
> >>> then it needs to be re-run, but there is no need to repeat all the other
> >>> work, so:
> >>>>
> >>>> - r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
> >>>> - if (r == -1 && errno == EAGAIN) continue;
> >>>> - if (r == -1 && errno == EINTR) continue;
> >>>> + do {
> >>>> + r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
> >>>> + } while (r == -1 && (errno == EAGAIN || errno == EINTR));
> > 
> 
> -- 
> Jens Erat
> Universität Konstanz
> Kommunikations-, Infomations-, Medienzentrum (KIM)
> Abteilung Basisdienste
> D-78457 Konstanz
> Mail: jens.erat at uni-konstanz.de
> 
> Email had 1 attachment:
> + smime.p7s
>   7k (application/pkcs7-signature)