Experiment to test TCP keepalive for pop3d proxies

Gary Mills mills at cc.umanitoba.ca
Thu May 27 17:52:46 EDT 2010


For as long as I can remember, our Cyrus installation has had a problem
with pop3d processes accumulating on the murder front end server.  This
didn't happen with imapd processes or with pop3d on the back end.  A
couple of weeks ago, I counted 423 pop3d processes on the front end
but only 37 on the back end.  Some of them were months old.  All had
an established TCP connection from a client.  Here's a typical stack
trace:

    # pstack 12708
    12708:  pop3d -s
     feb1a5c5 read     (0, 817faf0, b)
     fec2dfaf sock_read () + 3f

POP3 timeouts were enabled on both front and back ends, but they seemed
not to work on the front end.  We're still running cyrus-imapd-2.3.8.
It's possible that this problem is fixed in the current version,
cyrus-imapd-2.3.16.

In any case, I wanted to try enabling TCP keepalive to see if it had
any effect on the problem.  This only required a few lines of code:

    --- pop3d.c-nokeep      Wed Apr 11 10:49:59 2007
    +++ pop3d.c     Mon May 17 18:17:22 2010
    @@ -494,6 +494,12 @@
            if (getsockname(0, (struct sockaddr *)&popd_localaddr, &salen) == 0) {
                popd_haveaddr = 1;
            }
    +       /* Set keepalive option */
    +       {
    +         int oval = 1;
    +         (void)setsockopt(0, SOL_SOCKET, SO_KEEPALIVE, (const void *)&oval,
    +                    sizeof(oval));
    +       }
         }
    
         /* other params should be filled in */

A complete installation would include a configuration setting to
enable or disable TCP keepalive, along with ways to set the keepalive
timing parameters that many operating systems expose.  This was just
a test, but the result was quite impressive.  `pop3d' processes no
longer accumulated on the front end, and were similar in number to the
ones on the back end.  The cause must have been clients that
disappeared without closing their TCP connections.  The TCP keepalive
mechanism now closes those connections for them, after about half an
hour of idleness.
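
As a rough sketch of what per-socket tuning might look like, the
helper below enables keepalive and, where the platform defines
TCP_KEEPIDLE, also sets the idle time before the first probe.  The
name enable_keepalive and its idle_secs parameter are made up for
illustration and are not part of Cyrus.  TCP_KEEPIDLE is not
available everywhere, so it is guarded with #ifdef; on systems
without it, only the system-wide default interval applies.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Hypothetical helper (not part of Cyrus): enable TCP keepalive on fd.
 * If idle_secs > 0 and the platform defines TCP_KEEPIDLE, also set the
 * idle time before the first keepalive probe is sent.
 * Returns 0 on success, -1 if a setsockopt() call fails.
 */
static int enable_keepalive(int fd, int idle_secs)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE,
                   (const void *)&on, sizeof(on)) < 0)
        return -1;

#ifdef TCP_KEEPIDLE
    /* Assumption: a per-socket idle time is wanted; otherwise pass 0. */
    if (idle_secs > 0 &&
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   (const void *)&idle_secs, sizeof(idle_secs)) < 0)
        return -1;
#else
    (void)idle_secs;    /* no per-socket tuning on this platform */
#endif

    return 0;
}
```

In the patch above, the call would be something like
enable_keepalive(0, 600), with the value coming from a configuration
option rather than being hard-coded.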

Does anyone know if this problem has been solved by a timeout in
later Cyrus versions?  That would actually be a better solution.  The
problem only seems to happen when pop3d runs on a murder front end,
relaying connections to a back end.  If it hasn't been solved, I'll
proceed with the keepalive solution.  Otherwise, I'll plan for an
upgrade.
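
For comparison, an application-level idle timeout can be sketched with
poll() in front of read().  This is only a generic illustration of the
idea, not how Cyrus implements its POP3 timeout; the helper name
read_with_timeout and its return convention are invented here.

```c
#include <sys/types.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable,
 * then read from it.  Returns the read() result (bytes read, or 0 on
 * EOF), -1 on error, or -2 if the connection stayed idle for the whole
 * timeout.  The wait is restarted in full if a signal interrupts it.
 */
static ssize_t read_with_timeout(int fd, void *buf, size_t len,
                                 int timeout_ms)
{
    struct pollfd pfd;
    int r;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        r = poll(&pfd, 1, timeout_ms);
    } while (r < 0 && errno == EINTR);

    if (r == 0)
        return -2;      /* idle timeout: caller can close the connection */
    if (r < 0)
        return -1;      /* poll() failed */

    return read(fd, buf, len);
}
```

A caller that gets -2 back would log the timeout and close the
connection, so the process exits instead of waiting in read()
indefinitely.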

-- 
-Gary Mills-        -Unix Group-        -Computer and Network Services-

