[RFC PATCH v2] imapd.c: imapoptions: implement idle timeout

Andy Dorman adorman at ironicdesign.com
Tue Sep 20 14:20:29 EDT 2016

On 09/20/2016 08:14 AM, Andy Dorman wrote:
> On 09/19/2016 09:02 PM, ellie timoney via Cyrus-devel wrote:
>> I've been looking at tcp_keepalive a bit lately and I'm wondering how it
>> interacts with this?
>> It's my understanding that, in most cases, tcp_keepalive will do the job
>> of detecting clients that have dropped out, and allow us to close the
>> connection on our end.  Since we're generally either waiting for a
>> command from the client, or producing and about to send output to the
>> client, this works -- because if tcp_keepalive detects that the client
>> isn't there, reads and writes to the socket will start failing.
>> But during the IDLE state, we only read from the client socket if select
>> reports it as having data ready for reading (presumably containing
>> "DONE"), and we only write to the client socket if there is activity on
>> the selected mailbox.
>> If the client's connection has dropped out, no data will ever appear on
>> the socket, so select will never flag it as readable, so we will never
>> try to read from it, so we will never receive the read error even though
>> tcp_keepalive detected the dropout.  And if this client was idling with
>> a low-activity mailbox selected (such as Drafts or Sent), it might be a
>> very long time before any activity prompts us to write to the socket, so
>> we also don't receive the write error.  And so even though the socket
>> itself knows there's no connection anymore thanks to tcp_keepalive, we
>> don't know that, because we haven't tried to interact with it.  And so
>> the connection/process doesn't get cleaned up.
>> And so I think this patch is meant to provide an extra protection from
>> this case.  tcp_keepalive is fine generally, but idling clients can slip
>> through the cracks in certain circumstances, so let's fill those cracks.
>> Does that sound right?
>> In writing this, I wonder what happens if a client initiates IDLE
>> without having first selected a mailbox.  To my reading, RFC 2177
>> implies that this is sort of pointless, but doesn't make an explicit
>> statement about it one way or another.  I don't know what Cyrus actually
>> does in this case -- there's something to investigate -- but I guess if
>> there's a crack there, the imapidletimeout patch will fill that too.
>> Any thoughts?
>> Cheers,
>> ellie
>> On Wed, Sep 14, 2016, at 05:11 PM, Thomas Jarosch wrote:
>>> Hi Ellie,
>>> On Monday, 12. September 2016 11:35:45 ellie timoney wrote:
>>> [clock jumps]
>>>> Or does it?  The man page says it's "not affected by discontinuous
>>>> jumps in the system time (e.g., if the system administrator manually
>>>> changes the clock)" -- great -- "but is affected by the incremental
>>>> adjustments performed by adjtime(3) and NTP".  Which sounds to me like
>>>> NTP might still be an issue?  (But: I have no real world experience of
>>>> this, I'm just reading man pages here.)
>>> Good point. Not sure here, we didn't encounter an
>>> issue for a long time. The event itself is rather rare these days.
>>>>> Would it make sense to enable the timeout by default?
>>>>> In the current version of the patch it's disabled (value 0).
>>>> I'm interested in hearing thoughts on this, particularly with regard to
>>>> what a reasonable default timeout might be.  Though I like the "no
>>>> behaviour change unless you change configuration" aspect of defaulting
>>>> to 0.
>>> We'll push out the three days default value next week.
>>> I can report back in a month how good or bad the results are.
>>> Cheers,
>>> Thomas
> Ellie, I agree a "crack" exists that idled processes may be slipping
> through but so far I have little data to prove it.
> Empirically I have one server with two clients (I have moved everyone
> else to other servers to decrease the number of variables), and the
> process count in the IDLED state for those two clients grows apparently
> without bound (at least I haven't found an upper limit yet).  I have
> been increasing the point at which I am alerted for "excess imapd
> processes" and it is up to 100 processes now.  After about 24 hours
> these two clients reach that point and every process in
> /var/run/cyrus/proc/ is attributed to them like this.
> imap  hermione.ironicdesign.com []  b2b at cogift.co
> cogift.co!user.b2b  Idle
> hermione is our nginx load balancer on an internal network.
> As far as we have been able to tell, no other client has this problem.
> Another data point...these are very low traffic accounts (6 emails for
> one and 0 for the other in the last week).
> I am going to contact the owner of these two accounts today and ask her
> what client she is using and how often she has it set up to check email.
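
To make the gap Ellie describes concrete, here is a minimal Python sketch
of a select-based wait with a deadline.  It is not Cyrus code: the
function name idle_wait, its return values, and the poll interval are all
made up for illustration, and it only models a peer that goes silent, not
the real imapd IDLE loop or the imapidletimeout patch.  It uses a
monotonic clock for the deadline, in the spirit of the clock-jump
discussion quoted above.

```python
import select
import socket
import time


def idle_wait(sock, timeout_seconds, poll_interval=0.05):
    """Wait for the peer to end IDLE.

    Returns "done" if the peer sent DONE, "closed" if the peer closed the
    connection cleanly, or "timeout" if nothing arrived before the deadline.
    (Hypothetical helper, for illustration only.)
    """
    # A monotonic clock is not moved by manual changes to the wall clock.
    deadline = time.monotonic() + timeout_seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "timeout"
        readable, _, _ = select.select(
            [sock], [], [], min(poll_interval, remaining)
        )
        if not readable:
            # A silent peer never makes the socket readable, so without
            # the deadline check above this loop would spin forever.
            continue
        data = sock.recv(4096)
        if not data:
            return "closed"
        if data.strip().upper() == b"DONE":
            return "done"


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    print(idle_wait(server_side, 0.2))   # peer stays silent
    client_side.sendall(b"DONE\r\n")
    print(idle_wait(server_side, 0.2))   # peer ends IDLE normally
    client_side.close()
    print(idle_wait(server_side, 0.2))   # peer closes cleanly
    server_side.close()
```

The first call is the interesting one: the peer neither sends DONE nor
closes the connection, so only the deadline gets the loop to return.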

OK, the client that appears to be behind our apparently abandoned imapd
idle processes is an old BlackBerry (possibly more than 5 years old).
The account owner has it set to check all her mailboxes every 10-15
minutes.  I have no idea how reliable the BB's connectivity is, but
given how active she is I would not be surprised to hear that it
regularly loses its connection while she is driving (which she does a
lot).

Unless anyone has another suggestion, I plan to use Wireshark to capture
the port 143 traffic for these two addresses.  (Given the age of the BB,
I bet it doesn't speak the versions of TLS we will accept, and we no
longer accept SSL, so port 993 is a no-go for it.)  Since their email
traffic is so light (less than one message per day), I should be able to
capture lots of connections with no mail.
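
One thing that could be checked in each captured connection is whether
the client starts IDLE and then never sends DONE.  Below is a rough
Python sketch of that check.  It assumes the client-to-server lines of a
single TCP stream have already been exported as text (for example from
Wireshark's stream view); the function name open_idle_tag and the input
format are my own invention, not anything Cyrus or Wireshark provides.

```python
import re

# A client IDLE command is a tag followed by the word IDLE (RFC 2177).
TAGGED_IDLE = re.compile(r"^(\S+)\s+IDLE\s*$", re.IGNORECASE)


def open_idle_tag(client_lines):
    """Return the tag of an IDLE that was never ended with DONE, else None.

    client_lines: the client-to-server lines of ONE connection, in order.
    (Hypothetical helper, for illustration only.)
    """
    open_tag = None
    for raw in client_lines:
        line = raw.strip()
        if open_tag is None:
            match = TAGGED_IDLE.match(line)
            if match:
                open_tag = match.group(1)
        elif line.upper() == "DONE":
            # The client ended the IDLE command normally.
            open_tag = None
    return open_tag


if __name__ == "__main__":
    abandoned = ["a1 LOGIN user secret", "a2 SELECT INBOX", "a3 IDLE"]
    clean = abandoned + ["DONE", "a4 LOGOUT"]
    print(open_idle_tag(abandoned))  # a3
    print(open_idle_tag(clean))      # None
```

A stream that ends with an open IDLE and no FIN or RST from the client
side would be a candidate for one of the leftover processes.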

FWIW, I have captured and analyzed lots of packets before, but never 
IMAP...so if anyone has any hints about what to look for, feel free to 
speak up.  ;-)

Andy Dorman
