One more attempt: stuck processes

Sebastian Hagedorn Hagedorn at uni-koeln.de
Fri Nov 16 06:36:49 EST 2007


--On 16. November 2007 11:27:09 +0100 Sebastian Hagedorn 
<hagedorn at uni-koeln.de> wrote:

>> 1) Since it only happens on dial-up connections, could it be that the
>> dial-in router at the provider's end sends TCP/RST when a client hangs up
>> and those packets are filtered somewhere, maybe on your firewall?
>
> OK, let's run with that one.
>
> a) We don't really have a firewall, we only use ACLs on the Cisco
> routers. You can't even filter TCP/RST there.
>
> b) Even *if* a TCP/RST had been dropped, lost or whatever, the server
> *still* should time out eventually!

I just had a discussion with a colleague regarding this. He made two 
observations:

1. In the absence of the SO_KEEPALIVE option it is entirely possible for a 
TCP connection to remain ESTABLISHED even after the other side has gone 
away: an idle connection exchanges no packets, so the server never notices.

This may not be a solution to this particular problem, but it made me 
wonder why Cyrus does *not* use SO_KEEPALIVE. Is there a downside to it?
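
For reference, here is a minimal sketch of what enabling keepalive on a 
socket looks like. It is written in Python for brevity and is not Cyrus 
code. The helper name enable_keepalive and the idle/interval/count values 
are assumptions made for illustration, and the three TCP_KEEP* options are 
platform-specific, so they are only set where the platform exposes them.

```python
import socket


def enable_keepalive(sock, idle=600, interval=60, count=5):
    """Turn on TCP keepalive probes for an existing socket.

    idle, interval and count are illustrative values, not recommendations.
    """
    # Portable part: ask the kernel to probe an idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Optional tuning, only available on some platforms (e.g. Linux).
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before the connection is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With probes enabled, a blocked read on a dead connection eventually fails 
with an error instead of waiting forever. Without the tuning options the 
system-wide defaults apply, and those can be long (two hours of idleness 
before the first probe on a stock Linux system).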

2. The stack trace looks garbled:

(gdb) bt
#0  0x0079f41e in __read_nocancel () from /lib/tls/libc.so.6
#1  0x00d0b2f7 in BIO_new_socket () from /lib/libcrypto.so.4
#2  0x00d092b2 in BIO_read () from /lib/libcrypto.so.4
#3  0x005dae13 in ssl23_read_bytes () from /lib/libssl.so.4
#4  0x005d9c51 in ssl23_get_client_hello () from /lib/libssl.so.4
#5  0x005d9712 in ssl23_accept () from /lib/libssl.so.4
#6  0x005ddc9a in SSL_accept () from /lib/libssl.so.4
#7  0x08052cb3 in shut_down ()
#8  0x0804e513 in shut_down ()
#9  0x0804d58c in ?? ()
#10 0x00000001 in ?? ()
#11 0x082ee848 in ?? ()
#12 0x00000000 in ?? ()

He suggested that the trace is unreliable. Perhaps a bug in RHEL 3's 
version of OpenSSL messes up the stack. That would also explain why nobody 
else seems to have this problem.
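
If the trace is taken at face value, the process is blocked in read() while 
waiting for the client hello. The following is only a sketch of the kind of 
read deadline that would prevent such a wait from lasting forever. It is 
written in Python, it is not how Cyrus or OpenSSL handle this, and the 
helper name read_with_deadline is an assumption made for illustration.

```python
import socket


def read_with_deadline(sock, nbytes, seconds):
    """Return data read from sock, or None if nothing arrives within `seconds`."""
    sock.settimeout(seconds)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # The peer stayed silent; the caller can now close the connection.
        return None
    finally:
        # Restore blocking mode for whatever uses the socket afterwards.
        sock.settimeout(None)


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    # The "client" never sends anything, like a dial-up peer that vanished.
    if read_with_deadline(server_side, 16, 0.5) is None:
        print("no data before the deadline, giving up on this client")
    server_side.close()
    client_side.close()
```

A deadline like this bounds the wait even when no RST or FIN ever reaches 
the server, which is the case point b) above is concerned with.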

I think I will try one more approach: I reverted cyrus.conf to not use 
"-U 1" anymore, so that processes should be reused. I will strace one of 
the pop3d processes in the hope that it gets stuck. That way I should be 
able to see where things go wrong. If the process terminates normally, I 
will try with another one. If that doesn't go anywhere, I guess I'll drop 
this investigation. We will upgrade to RHEL 5 some time next year, so 
hopefully that will bring new bugs :-)
-- 
     .:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
                   .:.:.:.Skype: shagedorn.:.:.:.