Severe issue with high-volume IMAP server stalling, possible patch

Tue Apr 12 05:43:53 EDT 2016

Hi Ellie,

find the patch attached (developed against Cyrus IMAP 2.5.7 tarball
release).

I've additionally removed the unnecessary bookkeeping for our debug
output and lowered the maximum retries to a less arbitrary 5 (which
seems still pretty arbitrary). Only retrying once might at least reduce
response times for some weird setups, we never experienced more than a
single retry. This might still be subject of discussion and seems
something to be moved in a constant and/or header file?

Regards,
Jens

Am 12.04.2016 um 01:32 schrieb ellie timoney via Cyrus-devel:
> Hi Jens,
> 
>> after some weeks of running a stable service with this patch, I'd like
>> to propose merging this (we'd really like to switch back to an unpatched
>> Cyrus release in future).
> 
> This would be good.  Can you attach it as a patch file please?  CC me
> and I'll get it merged.
> 
> Thanks for your hard work figuring this out. :)
> 
> Cheers,
> 
> ellie
> 
> On Mon, Apr 11, 2016, at 05:52 PM, Jens Erat via Cyrus-devel wrote:
>> Hi all,
>>
>> after some weeks of running a stable service with this patch, I'd like
>> to propose merging this (we'd really like to switch back to an unpatched
>> Cyrus release in future). This is not only an issue to us, but we know
>> some other universities running similar Solaris setups and waiting to
>> upgrade to Cyrus 2.5 because of our issues.
>>
>> This is only an issue with Cyrus IMAP 2.5 and probably 3.0; Cyrus 2.4
>> also does not properly repeat select if it failed, but it's not an issue
>> here as interrupts can occur anywhere.
>>
>> We didn't have those master loop hick ups any more. In fact, the while
>> loop even seems superfluous; just running myselect again if `r==0` would
>> be fine, but this might be different with other combinations of
>> operating system kernels, C libraries and compilers, and the loop seems
>> the safer thing to do. The limit of 100 retries seems a little bit
>> arbitrary, but I can't really argument for any number, something around
>> five should probably also be fine to cover all scenarios that are not
>> completely broken, anyway.
>>
>> We dumped the interrupted and again counters each time the master loop
>> ran (and several other metrics), and never had more than one EINTR or
>> EAGAIN in a row (so the next `myselect` immediately returned the FDSET).
>>
>> Kind regards from Lake Constance, Germany,
>> Jens
>>
>> Am 10.02.2016 um 13:39 schrieb Jens Erat via Cyrus-devel:
>>> Hi Bron,
>>>
>>> digging deeper, we realized there are still issues with the loop you
>>> proposed. `select` will only decrement tvptr on Linux (and maybe _some_
>>> other systems, definitely not all of them), pselect will never do -- you
>>> might end up waiting forever here if nothing happens.
>>>
>>> On the other hand, we partially got a grasp of understanding what was
>>> going on. Because when `pselect` is available, signalling was disabled
>>> but during `pselect`, thus with some weird combination of load on the
>>> different services, each `pselect` returned with `EINTR` and the message
>>> handling code was never reached
>>>
>>> We ended up putting the whole deadline calculation in a queue, and have
>>> this code live at the moment:
>>>
>>>         int interrupted = 0;
>>>         int again = 0;
>>>         do {
>>>             /* how long to wait? - do now so that any scheduled wakeup
>>>              * calls get accounted for*/
>>>             gettimeofday(&now, 0);
>>>             tvptr = NULL;
>>>             if (schedule && !in_shutdown) {
>>>                 double delay = timesub(&now, &schedule->mark);
>>>                 if (!interrupted && delay > 0.0) {
>>>                     timeval_set_double(&tv, delay);
>>>                 }
>>>                 else {
>>>                     tv.tv_sec = 0;
>>>                     tv.tv_usec = 0;
>>>                 }
>>>                 tvptr = &tv;
>>>             }
>>>
>>> #if defined(HAVE_UCDSNMP) || defined(HAVE_NETSNMP)
>>>             if (tvptr == NULL) blockp = 1;
>>>             snmp_select_info(&maxfd, &rfds, tvptr, &blockp);
>>> #endif
>>>             errno = 0;
>>>             r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
>>>
>>>             if (r == -1) switch(errno) {
>>>             /* Try again to get valid rfds, this time without blocking so we
>>>              * will definitely process messages without getting interrupted
>>>              * again. */
>>>             case EINTR:
>>>                 interrupted++;
>>>                 if (interrupted > 100) {
>>>                     syslog(LOG_WARNING, "Repeatedly interrupted, too
>>> many signals?");
>>>                     /* Fake a timeout */
>>>                     r = 0;
>>>                     FD_ZERO(&rfds);
>>>                 }
>>>                 break;
>>>             /* Try again. */
>>>             case EAGAIN: again++; break;
>>>             /* uh oh */
>>>             default: fatalf(1, "select failed: %m");
>>>             }
>>>         } while (r == -1);
>>>
>>> We also dump interrupted and again together with some other variables at
>>> the end of each loop. The server is running fine for two days now, and
>>> the dumped information looks fine (we never experienced any delays >4
>>> seconds since patching the server, we often found the master loop not
>>> completely running for minutes). I will report again if we didn't suffer
>>> from any interruptions for the next one or two weeks.
>>>
>>> Some remarks to the code:
>>>
>>> - We need to update the current timestamp `now` for decrementing the
>>> passed time.
>>> - `snmp_select_info` or's its FDs to rfds; thus it may be run repeatedly
>>> to update `tvptr` and not run over any SNMP events.
>>> - We are compiling without SNMP, thus this code is untested.
>>> - If interrupted, we try again to receive FDs, but without waiting
>>> (tvptr set to zero). Because of this, no large interruptions should
>>> occur -- but we also won't skip over message handling when flooded with
>>> signals.
>>> - After 100 retries after being interrupted, we empty rfds and continue
>>> to work, but print a warning. This is probably subject of discussion and
>>> _much_ too large: we're currenlty dumping the number of interruptions,
>>> and never had more than one at all before catching file descriptors (or
>>> none).
>>> - `int again` can and should be removed for live code, does not seem
>>> relevant at all (but for our own debug purpose).
>>>
>>> Kind regards from Lake Constance, Germany,
>>> Jens
>>>
>>>
>>> Am 06.02.2016 um 13:00 schrieb Bron Gondwana:
>>>> I could buy repeating the whole thing for EINTR but not for EAGAIN.  Obviously everything is fine under non-heavy load no matter what.
>>>>
>>>> I'm pretty sure your patch is wrong, because it will process the fdset that the select was supposed to fill even if the select returned an error.
>>>>
>>>> You may have convinced me that my patch is wrong as well, and that we have to always check for the in_shutdown and sigquit magic in that tight loop in case it was a signal that needs to be responded to.
>>>>
>>>> Bron.
>>>>
>>>> On Sat, Feb 6, 2016, at 20:52, Jens Erat wrote:
>>>>> Hi Bron,
>>>>>
>>>>> thank you for the fast reply.
>>>>>
>>>>>> One thing I would suggest is running something like nginx in front of
>>>>> Cyrus, even if you only need it for authentication checks.
>>>>>
>>>>> We'll discuss this internally.
>>>>>
>>>>>> I believe this is the correct patch. If the select was interrupted
>>>>> then it needs to be re-run, but there is no need to repeat all the other
>>>>> work, so:
>>>>>>
>>>>>> - r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
>>>>>> - if (r == -1 && errno == EAGAIN) continue;
>>>>>> - if (r == -1 && errno == EINTR) continue;
>>>>>> + do {
>>>>>> + r = myselect(maxfd, &rfds, NULL, NULL, tvptr);
>>>>>> + } while (r == -1 && (errno == EAGAIN || errno == EINTR));
>>>
>>
>> -- 
>> Jens Erat
>> Universität Konstanz
>> Kommunikations-, Infomations-, Medienzentrum (KIM)
>> Abteilung Basisdienste
>> D-78457 Konstanz
>> Mail: jens.erat at uni-konstanz.de
>>
>> Email had 1 attachment:
>> + smime.p7s
>>   7k (application/pkcs7-signature)

-- 
Jens Erat
Universität Konstanz
Kommunikations-, Infomations-, Medienzentrum (KIM)
Abteilung Basisdienste
D-78457 Konstanz
Mail: jens.erat at uni-konstanz.de
-------------- next part --------------
A non-text attachment was scrubbed...
Name: master.c-EINTR.patch
Type: text/x-patch
Size: 1980 bytes
Desc: not available
URL: <http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20160412/ec433929/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4913 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20160412/ec433929/attachment-0001.p7s>