[RFC PATCH v2] imapd.c: imapoptions: implement idle timeout

ellie timoney ellie at fastmail.com
Mon Sep 19 22:02:01 EDT 2016


I've been looking at tcp_keepalive a bit lately and I'm wondering how it
interacts with this?

It's my understanding that, in most cases, tcp_keepalive will do the job
of detecting clients that have dropped out, and allow us to close the
connection on our end.  Since we're generally either waiting for a
command from the client, or producing and about to send output to the
client, this works -- because if tcp_keepalive detects that the client
isn't there, reads and writes to the socket will start failing.

But during the IDLE state, we only read from the client socket if select
reports it as having data ready for reading (presumably containing
"DONE"), and we only write to the client socket if there is activity on
the selected mailbox.

If the client's connection has dropped out, no data will ever appear on
the socket, so select will never flag it as readable, so we will never
try to read from it, so we will never receive the read error even though
tcp_keepalive detected the dropout.  And if this client was idling with
a low-activity mailbox selected (such as Drafts or Sent), it might be a
very long time before any activity prompts us to write to the socket, so
we also don't receive the write error.  And so even though the socket
itself knows there's no connection anymore thanks to tcp_keepalive, we
don't know that, because we haven't tried to interact with it.  And so
the connection/process doesn't get cleaned up.

And so I think this patch is meant to provide an extra protection from
this case.  tcp_keepalive is fine generally, but idling clients can slip
through the cracks in certain circumstances, so let's fill those cracks.
 Does that sound right?

In writing this, I wonder what happens if a client initiates IDLE
without having first selected a mailbox.  To my reading, RFC 2177
implies that this is sort of pointless, but doesn't make an explicit
statement about it one way or another.  I don't know what Cyrus actually
does in this case -- there's something to investigate -- but I guess if
there's a crack there, the imapidletimeout patch will fill that too.

Any thoughts?

Cheers,

ellie

On Wed, Sep 14, 2016, at 05:11 PM, Thomas Jarosch wrote:
> Hi Ellie,
> 
> On Monday, 12. September 2016 11:35:45 ellie timoney wrote:
> [clock jumps]
> > Or does it?  The man page says it's "not  affected by discontinuous
> > jumps in the system time (e.g., if the system administrator manually
> > changes the clock)" -- great -- "but is affected by the incremental
> > adjustments performed by adjtime(3) and NTP".  Which sounds to me like
> > NTP might still be an issue?   (But: I have no real world experience of
> > this, I'm just reading man pages here.)
> 
> Good point. Not sure here, we didn't encounter an
> issue for a long time. The event itself is rather rare these days.
> 
> > > Would it make sense to enable the timeout by default?
> > > In the current version of the patch it's disabled (value 0).
> > 
> > I'm interested in hearing thoughts on this, particularly with regard to
> > what a reasonable default timeout might be.  Though I like the "no
> > behaviour change unless you change configuration" aspect of defaulting
> > to 0.
> 
> We'll push out the three days default value next week.
> I can report back in a month how good or bad the results are.
> 
> Cheers,
> Thomas
> 


More information about the Cyrus-devel mailing list