cyrus master fails with status 71
Eric Cunningham
eric at whoi.edu
Mon Nov 7 13:01:29 EST 2016
Hi Ellie, we've been running with your patch since Oct 25 and so far
haven't encountered any issues with imapd exiting. However, now that
imapd has had a chance to run uninterrupted for almost 2 weeks, the
number of imapd processes/connections has climbed steadily every day.
This morning it was near 16,000, on a system with a total of 1400 accounts.
To try and control this growth, per
https://cyrusimap.org/imap/faqs/o-toomanyprocesses.html I've set the
following:
To cyrus.conf, added the "-U 50" option to the imapd entries in the SERVICES section:
imap cmd="imapd -U 50" listen="imap" prefork=60
imaps cmd="imapd -s -U 50" listen="imaps" prefork=150
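The -U option to imapd sets how many times a single process is reused for new connections before it shuts down, after which master starts a replacement as needed; it recycles processes rather than capping how many run at once. A minimal Python sketch of that reuse pattern, assuming a simplified single-threaded worker (serve_until_retired and the simulated connection queue are hypothetical, not Cyrus code):

```python
from typing import Callable, Optional


def serve_until_retired(
    accept_conn: Callable[[], Optional[int]],
    handle_conn: Callable[[int], None],
    max_uses: int = 50,  # plays the role of "imapd -U 50"
) -> int:
    """Handle connections one at a time and stop after max_uses of them.

    accept_conn() returns the next connection, or None when the simulated
    queue is empty. Returns how many connections this worker handled.
    """
    uses = 0
    while uses < max_uses:
        conn = accept_conn()
        if conn is None:  # nothing left to serve in this simulation
            break
        handle_conn(conn)
        uses += 1
    # A real worker process would exit here; master would fork a new one.
    return uses


# Simulate 120 queued connections served by successive short-lived workers.
queue = iter(range(120))
per_worker = []
while True:
    served = serve_until_retired(lambda: next(queue, None), lambda conn: None)
    if served == 0:
        break
    per_worker.append(served)
print(per_worker)  # [50, 50, 20]
```

Each simulated worker retires after 50 connections, so the 120 connections are spread over three short-lived workers; the total amount of work is unchanged.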
To imapd.conf, added the following tcp_keepalive options:
tcp_keepalive: 1
tcp_keepalive_cnt: 1
tcp_keepalive_idle: 30
tcp_keepalive_intvl: 900
After restarting imapd, the following are now being logged repeatedly:
Nov 7 10:18:19 imap1 lmtpunix[58768]: unable to setsocketopt(TCP_KEEPCNT): Invalid argument
Nov 7 10:18:19 imap1 lmtpunix[58768]: unable to setsocketopt(TCP_KEEPIDLE): Invalid argument
Nov 7 10:18:19 imap1 lmtpunix[58768]: unable to setsocketopt(TCP_KEEPINTVL): Invalid argument
So, a couple of questions for the list:
Are such numbers of imapd processes to be expected?
Why is lmtpunix complaining about options I set for imapd?
Thank you.
-Eric
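The four tcp_keepalive settings correspond to standard socket options: SO_KEEPALIVE at the socket level, plus TCP_KEEPCNT, TCP_KEEPIDLE and TCP_KEEPINTVL at the TCP level, which are exactly the names in the lmtpunix errors. Those TCP-level options only apply to TCP sockets, while lmtpunix typically listens on a UNIX-domain socket, which would presumably explain the "Invalid argument" errors. The Python sketch below, assuming a platform such as Linux where the socket module exposes these constants, applies the options only to TCP sockets and estimates how long a dead peer can hold a connection with these values: roughly idle + cnt * intvl = 30 + 1 * 900 = 930 seconds. apply_tcp_keepalive and dead_peer_timeout are hypothetical helpers, not Cyrus functions.

```python
import socket


def dead_peer_timeout(cnt: int, idle: int, intvl: int) -> int:
    """Approximate seconds before an unresponsive peer is dropped: `idle`
    seconds of silence, then `cnt` unanswered probes sent `intvl` apart."""
    return idle + cnt * intvl


def apply_tcp_keepalive(sock: socket.socket, cnt: int, idle: int, intvl: int) -> bool:
    """Enable keepalive on a TCP socket; return False (and do nothing) otherwise.

    TCP-level options are meaningless on an AF_UNIX socket and setsockopt()
    rejects them there, so check the address family and type first.
    """
    if sock.family not in (socket.AF_INET, socket.AF_INET6) or sock.type != socket.SOCK_STREAM:
        return False
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Assumes a platform (e.g. Linux) where Python exposes these constants.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, cnt)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, intvl)
    return True


# The imapd.conf values from this message: cnt=1, idle=30, intvl=900.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
unix_a, unix_b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
print(apply_tcp_keepalive(tcp, 1, 30, 900))     # True
print(apply_tcp_keepalive(unix_a, 1, 30, 900))  # False: skipped, no error
print(dead_peer_timeout(1, 30, 900))            # 930
for s in (tcp, unix_a, unix_b):
    s.close()
```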
On 10/25/16 8:23 PM, ellie timoney via Info-cyrus wrote:
> Hi Eric,
>
> Patch attached. I'd appreciate it if you could let me know whether this helps.
> Though I guess you won't be able to tell for a couple of weeks.
>
> If it doesn't cause any new problems (I don't expect it to), then it
> will be included in 2.5.11 (whenever that comes out).
>
> Cheers,
>
> ellie
>
> On Wed, Oct 26, 2016, at 10:04 AM, ellie timoney via Info-cyrus wrote:
>>> accept failed: Software caused connection abort
>>
>> Some sleuthing suggests that "Software caused connection abort"
>> corresponds with "ECONNABORTED".
>>
>> The man page on my system for accept(2) unhelpfully defines this as:
>>
>>> ECONNABORTED
>>> A connection has been aborted.
>>
>> But some digging around online suggests that this situation occurs when
>> a client connects, but subsequently disconnects (RST) before the server
>> gets around to accept()ing the connection. When the server does
>> eventually accept(), the accept() fails with this error.
>>
>> Which sounds to me like we want to treat ECONNABORTED similarly to
>> EAGAIN, not as a fatal OS error. I'll have a patch up for this shortly.
>>
>> Cheers,
>>
>> ellie
>>
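The change ellie describes, treating ECONNABORTED like EAGAIN rather than as a fatal OS error, might look like this in a Python accept loop. It is an illustration of the idea, not the actual Cyrus patch; accept_one and TRANSIENT_ACCEPT_ERRNOS are hypothetical names. Python's socket.accept() raises an OSError subclass carrying the errno (ConnectionAbortedError for ECONNABORTED, BlockingIOError for EAGAIN), and anything outside the transient set exits with os.EX_OSERR, which is 71, the status reported by master in this thread.

```python
import errno
import os
import socket
import sys
from typing import Optional, Tuple

# errno values after which accept() should simply be retried.
# ECONNABORTED: the client sent RST before we got around to accept()ing it.
# EINTR is listed for completeness; Python already retries it (PEP 475).
TRANSIENT_ACCEPT_ERRNOS = frozenset(
    {errno.EAGAIN, errno.EWOULDBLOCK, errno.EINTR, errno.ECONNABORTED}
)


def accept_one(listener: socket.socket) -> Optional[Tuple[socket.socket, object]]:
    """Return (conn, addr), or None if accept() should be retried later."""
    try:
        return listener.accept()
    except OSError as exc:
        if exc.errno in TRANSIENT_ACCEPT_ERRNOS:
            return None  # not fatal: go back to waiting for connections
        print(f"accept failed: {exc.strerror}", file=sys.stderr)
        sys.exit(os.EX_OSERR)  # 71, the status master logged in this thread


# Usage: a non-blocking listener with nothing queued is a retry, not a crash.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(8)
srv.setblocking(False)
print(accept_one(srv))  # None
srv.close()
```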
>> On Wed, Oct 26, 2016, at 09:27 AM, Eric Cunningham via Info-cyrus wrote:
>>> Having repeatedly experienced the "status 71" issue, I've been
>>> incrementally raising kern.ipc.soacceptqueue. It's currently set to 32768 (!)
>>> and that value was in place when it most recently failed.
>>>
>>>
>>> On 10/25/16 4:21 PM, Shawn Bakhtiar via Info-cyrus wrote:
>>>> Hmmmm... if that’s the case, could you be hitting the maximum number
>>>> of accepts?
>>>>
>>>> Check the 11.11.1.2. kern.ipc.soacceptqueue section of the FreeBSD handbook
>>>>
>>>> https://www.freebsd.org/doc/handbook/configtuning-kernel-limits.html
>>>>
>>>> Given the load you described perhaps 128 is just not enough?
>>>>
>>>>
>>>>
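On FreeBSD (and Linux) the backlog passed to listen(2) is silently capped at the kernel limit, kern.ipc.soacceptqueue on FreeBSD and net.core.somaxconn on Linux, so raising master's -l value only helps if the sysctl is raised as well. A small Python sketch that reports the effective backlog; read_accept_queue_limit and effective_backlog are hypothetical helpers, and the sysctl and /proc lookups are assumptions about where each OS exposes the limit.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def read_accept_queue_limit() -> Optional[int]:
    """Return the kernel's listen-backlog cap, or None if it can't be read."""
    if sys.platform.startswith("freebsd"):
        out = subprocess.run(["sysctl", "-n", "kern.ipc.soacceptqueue"],
                             capture_output=True, text=True, check=False)
        return int(out.stdout) if out.returncode == 0 else None
    proc = Path("/proc/sys/net/core/somaxconn")  # Linux equivalent
    return int(proc.read_text()) if proc.exists() else None


def effective_backlog(requested: int, kernel_limit: Optional[int]) -> int:
    """Backlog listen(2) actually uses: out-of-range values become the cap."""
    if kernel_limit is None:
        return requested
    if requested < 0 or requested > kernel_limit:
        return kernel_limit
    return requested


# Usage: what would "-l 32768" really get on this machine?
print(effective_backlog(32768, read_accept_queue_limit()))
```

With the FreeBSD default of 128 that Shawn mentions, a requested backlog of 32768 would still yield 128.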
>>>>> On Oct 24, 2016, at 1:22 PM, Eric Cunningham via Info-cyrus
>>>>> <info-cyrus at lists.andrew.cmu.edu> wrote:
>>>>>
>>>>>
>>>>>
>>>>> =============================================================
>>>>> Eric Cunningham
>>>>> Information Services - http://whoi-it.whoi.edu
>>>>> Woods Hole Oceanographic Institution - http://www.whoi.edu
>>>>> Woods Hole, MA 02543-1541 phone: (508) 289-2224
>>>>> fax: (508) 457-2174 e-mail: ecunningham at whoi.edu
>>>>> =============================================================
>>>>>
>>>>> On 10/24/2016 03:45 PM, Bron Gondwana via Info-cyrus wrote:
>>>>>> On Tue, 25 Oct 2016, at 02:45, Eric Cunningham via Info-cyrus wrote:
>>>>>>> Hi list, we're running Cyrus IMAP 2.5.9 built from the FreeBSD 10.2
>>>>>>> (release-p7) ports tree.
>>>>>>>
>>>>>>> The cyrus master process is failing periodically (every 1-2 weeks) as
>>>>>>> follows:
>>>>>>>
>>>>>>> Oct 22 07:38:48 imap1 master[7767]: process type:SERVICE name:imaps path:/usr/local/cyrus/bin/imapd age:305.215s pid:32760 exited, status 71
>>>>>>> Oct 22 07:38:48 imap1 master[7767]: service imaps/ipv4 pid 32760 in READY state: terminated abnormally
>>>>>>> Oct 22 07:38:48 imap1 master[7767]: too many failures for service imaps/ipv4, disabling until next SIGHUP
>>>>>>>
>>>>>>> This prevents clients from making new connections until Cyrus is
>>>>>>> restarted. I've looked around the web but have not seen this issue reported.
>>>>>>>
>>>>>>> A little background:
>>>>>>>
>>>>>>> Our initial thought was that we were running out of listen queue
>>>>>>> space, so we've raised the listen queue incrementally from the default
>>>>>>> of 32 to a current setting of 32768 via the -l option in
>>>>>>> /usr/local/etc/rc.d/imapd, and raised kern.ipc.soacceptqueue to 32768
>>>>>>> as well, but that hasn't helped. Sometimes the "status 71" occurs even
>>>>>>> during periods of light use in off hours, like on Saturday mornings.
>>>>>>>
>>>>>>> We have ~1400 IMAP accounts, though the number of imapd processes hovers
>>>>>>> around 3,000-4,000. We have observed spikes as high as 12,000 imapd
>>>>>>> processes; in that particular case, 1 user with 2 IMAP clients
>>>>>>> accounted for nearly 6,000 of those connections. We've attempted to
>>>>>>> limit these high numbers using the following imapd.conf values:
>>>>>>>
>>>>>>> maxlogins_per_host: 50
>>>>>>> maxlogins_per_user: 30
>>>>>>> tcp_keepalive: 1
>>>>>>> tcp_keepalive_cnt: 1
>>>>>>> tcp_keepalive_idle: 30
>>>>>>> tcp_keepalive_intvl: 900
>>>>>>>
>>>>>>> However, it seems that once these limits were reached, no new connections
>>>>>>> were permitted, which resulted in all manner of user complaints about
>>>>>>> not being able to get at their email.
>>>>>>>
>>>>>>> Any ideas on this "status 71" issue? Could an upgrade to 2.5.10
>>>>>>> possibly address this? Thanks!
>>>>>>
>>>>>> https://www.freebsd.org/cgi/man.cgi?query=sysexits
>>>>>>
>>>>>> EX_OSERR (71)  An operating system error has been detected. This is
>>>>>>                intended to be used for such things as ``cannot fork'',
>>>>>>                ``cannot create pipe'', or the like. It includes things
>>>>>>                like getuid returning a user that does not exist in the
>>>>>>                passwd file.
>>>>>>
>>>>>> So the question is: what failed? Is there anything earlier in the log
>>>>>> to suggest what the imapd was doing when it died?
>>>>>>
>>>>>> Bron.
>>>>>>
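Python's os module exposes the same sysexits(3) constants on Unix, so an exit status from a master log line can be mapped back to its symbolic name. A minimal sketch; sysexit_name is a hypothetical helper, and the exact set of EX_* names varies slightly by platform.

```python
import os


def sysexit_name(status: int) -> str:
    """Map an exit status to its sysexits(3) name, e.g. 71 -> 'EX_OSERR'."""
    for name in dir(os):
        if name.startswith("EX_") and getattr(os, name) == status:
            return name
    return f"unknown ({status})"


print(sysexit_name(71))  # EX_OSERR
```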
>>>>>
>>>>> Using the example I posted, I traced back imaps process id 32760 and
>>>>> found only this:
>>>>>
>>>>> Oct 22 07:38:48 imap1 imaps[32760]: accept failed: Software caused connection abort
>>>>>
>>>>> -Eric
>>>>>
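The trace-back described here (find the master line reporting a non-zero exit, then the lines logged by that same pid) can be scripted. A minimal Python sketch, assuming syslog lines shaped like the ones quoted in this thread; the regular expressions and the trace_failures helper are illustrative, not a Cyrus tool.

```python
import re
from typing import Dict, List

# "... master[7767]: process type:SERVICE name:imaps ... pid:32760 exited, status 71"
EXIT_RE = re.compile(r"master\[\d+\]: process type:\S+ name:(\S+) .*pid:(\d+) exited, status (\d+)")
# "... imaps[32760]: accept failed: ..."
PROC_RE = re.compile(r"\s(\w+)\[(\d+)\]: (.*)$")


def trace_failures(lines: List[str]) -> Dict[int, List[str]]:
    """Map each pid that master saw exit non-zero to that pid's own log messages."""
    by_pid: Dict[int, List[str]] = {}
    failed: List[int] = []
    for line in lines:
        m = EXIT_RE.search(line)
        if m and m.group(3) != "0":
            failed.append(int(m.group(2)))
            continue
        p = PROC_RE.search(line)
        if p and p.group(1) != "master":  # skip master's own messages
            by_pid.setdefault(int(p.group(2)), []).append(p.group(3))
    return {pid: by_pid.get(pid, []) for pid in failed}


sample = [
    "Oct 22 07:38:48 imap1 imaps[32760]: accept failed: Software caused connection abort",
    "Oct 22 07:38:48 imap1 master[7767]: process type:SERVICE name:imaps path:/usr/local/cyrus/bin/imapd age:305.215s pid:32760 exited, status 71",
]
print(trace_failures(sample))
# {32760: ['accept failed: Software caused connection abort']}
```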
>>>>> ----
>>>>> Cyrus Home Page: http://www.cyrusimap.org/
>>>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
>>>>> To Unsubscribe:
>>>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus