One more attempt: stuck processes

Sebastian Hagedorn Hagedorn at uni-koeln.de
Fri Nov 16 08:48:39 EST 2007


--On 16. November 2007 13:54:24 +0100 Alain Spineux <aspineux at gmail.com> 
wrote:

> On Nov 16, 2007 12:36 PM, Sebastian Hagedorn <Hagedorn at uni-koeln.de>
> wrote:
>> I just had a discussion with a colleague regarding this. He made two
>> observations:
>>
>> 1. In the absence of the SO_KEEPALIVE option it is entirely possible
>> that a TCP connection remains ESTABLISHED even when the other side has
>> gone.
>
> I said that the socket should time out, but this is true only when the
> protocol (TCP here)
> requires a response (usually an ACK here) or at connection establishment.

Right.

> On the contrary,
> it should stay open indefinitely until something happens. A router doing
> NAT can drop
> a connection that is too old, because it has to maintain a NAT table and
> clean it up from time to time; this is where "KEEPALIVE" becomes useful.

Not only there, but I think also in the case of unilaterally dropped 
connections.
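
To put a number on how long a dead peer could go unnoticed even with
SO_KEEPALIVE set, here is a rough sketch. It assumes Linux, where the
system-wide keepalive timers are exposed under /proc/sys/net/ipv4/. The
function names are made up for illustration, and the timers only apply to
sockets that actually have SO_KEEPALIVE enabled.

```sh
# Rough number of seconds before the kernel gives up on an idle, dead peer:
# idle time before the first probe, plus one interval per unanswered probe.
# Arguments: <idle-seconds> <probe-interval-seconds> <probe-count>
keepalive_detect_seconds() {
    echo $(( $1 + $2 * $3 ))
}

# Feed the function with the current system-wide settings (Linux only).
show_keepalive_settings() {
    d=/proc/sys/net/ipv4
    keepalive_detect_seconds \
        "$(cat "$d/tcp_keepalive_time")" \
        "$(cat "$d/tcp_keepalive_intvl")" \
        "$(cat "$d/tcp_keepalive_probes")"
}
```

With the usual Linux defaults (7200, 75 and 9) that comes to 7875 seconds,
so a bit over two hours before such a connection would be torn down.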

>> This may not be a solution to this particular problem, but it made me
>> wonder why Cyrus does *not* use SO_KEEPALIVE. Is there a downside to it?
>
> Cyrus already has a built-in timeout; it seems a little conflicting to
> actively maintain the connection until it drops it itself!

I'm not sure I understand that sentence.

> It is the client's job to actively maintain the connection,
> if it wants it!

Yes, but what if the client is gone? I realise that *normally* the server 
keeps a built-in timeout, but I'm guessing that sometimes it doesn't work, 
perhaps because something (in prot_fill() perhaps?) blocks.

>> I think I will try one more approach: I reverted cyrus.conf to not use
>> "-U 1" anymore, so that processes should be reused. I will strace one of
>> the pop3d processes in the hope that it gets stuck. That way I should be
>> able to see where things go wrong. If the process terminates normally I
>> will try with another one. If that doesn't go anywhere, I guess I'll
>> drop this
>
> You could try replacing imapd with a home-made script, something like:
>
> mv imapd imapd_
> printf '#!/bin/sh\nexec strace -o /tmp/imapd.$$ "${0}_" "$@"\n' > imapd
> chmod a+x imapd

Thanks for the suggestion. I'll think about it, although I'm wary of doing 
that on a production server.

>> investigation. We will upgrade to RHEL 5 some time next year, so
>> hopefully that will bring new bugs :-)
>
> Sorry, but I don't understand what you are complaining about!

I'm not complaining ...

> Is it because the IMAP or POP client is losing its connection and
> this disturbs the user

No.

> or just because you are getting some sleeping processes ?

If it were "some" I wouldn't worry. I'm talking hundreds of processes! I 
know I can kill them, in fact for the pop3d processes we run this command 
once a month:

ps -C pop3d -o pid,start|grep '[a-z]'|awk '{print $1}'|xargs kill

(It kills pop3d processes that have the month in their start time, i.e. are 
more than a day old)
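
A variant that selects by elapsed seconds instead of matching the month
name could look like the sketch below. It assumes a procps version of ps
that supports the "etimes" output field and GNU xargs for "-r"; the function
names are only illustrative.

```sh
# Read "<pid> <elapsed-seconds>" lines on stdin and print the PIDs whose
# elapsed time exceeds the threshold given as $1 (in seconds).
select_older() {
    awk -v min="$1" '$2 + 0 > min + 0 { print $1 }'
}

# Print the PIDs of processes named $1 that are older than $2 seconds.
# "pid=" and "etimes=" suppress the header line.
old_pids() {
    ps -C "$1" -o pid= -o etimes= | select_older "$2"
}

# Dry run first: list pop3d processes older than one day.
#   old_pids pop3d 86400
# Then kill them (-r: do nothing if the list is empty).
#   old_pids pop3d 86400 | xargs -r kill
```

Keeping the filter separate makes it easy to try on canned input before
pointing it at real processes.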

But for imapd processes it's not as easy to tell if they are just 
long-living or stuck.

> Do you have a "timeout" option in your imapd.conf to force the
> imap/pop server to autologout ?

No. But both POP and IMAP have default timeouts. They just don't work in my 
case.
-- 
     .:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
                   .:.:.:.Skype: shagedorn.:.:.:.