Still an issue: stuck processes
Ken Murchison
ken at oceana.com
Tue Jul 5 10:48:21 EDT 2005
Sebastian Hagedorn wrote:
> Hi,
>
> we are running the following setup under Red Hat Linux Advanced Server 3:
>
> name : Cyrus IMAPD
> version : v2.2.12-Invoca-RPM-2.2.12-1.ZAIK 2005/02/14 16:43:51
> vendor : Project Cyrus
> support-url: http://asg.web.cmu.edu/cyrus
> os : Linux
> os-version : 2.4.21-27.0.2.ELsmp
> environment: Built w/Cyrus SASL 2.1.20
> Running w/Cyrus SASL 2.1.20
> Built w/Sleepycat Software: Berkeley DB 4.1.25: (August 21, 2003)
> Running w/Sleepycat Software: Berkeley DB 4.1.25: (August 21, 2003)
> Built w/OpenSSL 0.9.7a Feb 19 2003
> Running w/OpenSSL 0.9.7a Feb 19 2003
> CMU Sieve 2.2
> TCP Wrappers
> mmap = shared
> lock = fcntl
> nonblock = fcntl
> auth = unix
> idle = idled
>
> In earlier versions of Cyrus we experienced problems where processes got
> stuck and caused subsequent connections to mailboxes to fail due to lock
> contention. Some work was done to solve this, but I wonder if the
> success is only cosmetic. It seems to me that processes still get
> stuck; they just no longer keep new connections from working.
>
> I noticed that our server has an ever increasing number of processes.
> I'm attaching a screenshot of the relevant Ganglia graph for the last
> month. I see that there are many imapd and pop3d processes that have
> been running for a long time, i.e. since the middle of May:
>
> [root@lvr13 root]# ps -aef|grep pop3
> cyrus 1588 22788 0 May13 ? 00:00:03 pop3d -s
> cyrus 2810 22788 0 May13 ? 00:00:01 pop3d -s
> cyrus 32464 22788 0 May13 ? 00:00:02 pop3d -s
> cyrus 7941 22788 0 May13 ? 00:00:00 pop3d -s
> cyrus 5331 22788 0 May14 ? 00:00:02 pop3d -s
> cyrus 4319 22788 0 May14 ? 00:00:02 pop3d -s
> cyrus 9054 22788 0 May14 ? 00:00:00 pop3d -s
> cyrus 25309 22788 0 May14 ? 00:00:00 pop3d -s
> cyrus 8176 22788 0 May14 ? 00:00:02 pop3d -s
> cyrus 21482 22788 0 May14 ? 00:00:00 pop3d
> ...
>
> All of them seem to be stuck somewhere in SSL, but ultimately in
> __read_nocancel (). I'll give two examples.
>
> PID 1588:
> (gdb) where
> #0 0x006d1f0e in __read_nocancel () from /lib/tls/libc.so.6
> #1 0x00c16427 in BIO_new_socket () from /lib/libcrypto.so.4
> #2 0x00c143e2 in BIO_read () from /lib/libcrypto.so.4
> #3 0x007b4c30 in ssl3_alert_code () from /lib/libssl.so.4
> #4 0x007b4dcc in ssl3_alert_code () from /lib/libssl.so.4
> #5 0x007b60cf in ssl3_read_bytes () from /lib/libssl.so.4
> #6 0x007b6ffc in ssl3_get_message () from /lib/libssl.so.4
> #7 0x007accab in ssl3_accept () from /lib/libssl.so.4
> #8 0x007ac944 in ssl3_accept () from /lib/libssl.so.4
> #9 0x007bbcaa in SSL_accept () from /lib/libssl.so.4
> #10 0x007b780d in ssl23_get_client_hello () from /lib/libssl.so.4
> #11 0x007b7712 in ssl23_accept () from /lib/libssl.so.4
> #12 0x007bbcaa in SSL_accept () from /lib/libssl.so.4
> #13 0x08051bc3 in shut_down ()
> #14 0x0804dda3 in shut_down ()
> #15 0x0804ce9d in ?? ()
> #16 0x00000001 in ?? ()
> #17 0x098eab90 in ?? ()
> #18 0x00000000 in ?? ()
> (gdb)
>
>
> PID 21482:
> (gdb) where
> #0 0x006f4f0e in __read_nocancel () from /lib/tls/libc.so.6
> #1 0x00355427 in BIO_new_socket () from /lib/libcrypto.so.4
> #2 0x003533e2 in BIO_read () from /lib/libcrypto.so.4
> #3 0x0047ae23 in ssl23_read_bytes () from /lib/libssl.so.4
> #4 0x00479c61 in ssl23_get_client_hello () from /lib/libssl.so.4
> #5 0x00479712 in ssl23_accept () from /lib/libssl.so.4
> #6 0x0047dcaa in SSL_accept () from /lib/libssl.so.4
> #7 0x08051bc3 in shut_down ()
> #8 0x0804dda3 in shut_down ()
> #9 0x0804dba8 in shut_down ()
> #10 0x0804cde9 in ?? ()
> #11 0x095f74d0 in ?? ()
> #12 0x0807e79c in config_need_data ()
> #13 0x095a5978 in ?? ()
> #14 0x0807fff6 in config_need_data ()
> #15 0x0807e778 in config_need_data ()
> #16 0x08101c40 in ?? ()
> #17 0x00000000 in ?? ()
> (gdb)
>
> Fortunately these stuck processes don't hold any locks anymore! I
> understand that I can probably just kill them, but I wonder what the
> underlying cause of this problem is. Is it likely something in Cyrus or
> something in the libraries?

Is this only a problem with pop3d or with imapd as well? I can't
reproduce your problem here. Is there some kind of proxy or webmail
process which might be unfriendly?
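
One way to answer the pop3d-versus-imapd question is to tally the long-lived processes per service from the `ps -aef` output. The sketch below is a hypothetical helper, not part of Cyrus and not something used in this thread; the names parse_ps_line, long_running and tally are made up for illustration. It assumes the procps convention that the STIME column shows a clock time (HH:MM) for processes started today and a date such as May13 for older ones. The sample rows are copied from the listing quoted above.

```python
from collections import Counter

# Sample rows copied from the `ps -aef` listing quoted in this message.
PS_OUTPUT = """\
cyrus 1588 22788 0 May13 ? 00:00:03 pop3d -s
cyrus 5331 22788 0 May14 ? 00:00:02 pop3d -s
cyrus 21482 22788 0 May14 ? 00:00:00 pop3d
"""

# Daemons named in the thread.
SERVICES = ("pop3d", "imapd")


def parse_ps_line(line):
    """Split one `ps -ef` row into (pid, stime, command); None if malformed."""
    # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
    fields = line.split(None, 7)
    if len(fields) < 8 or not fields[1].isdigit():
        return None  # header row or truncated line
    return int(fields[1]), fields[4], fields[7]


def long_running(ps_text, services=SERVICES):
    """Return (service, pid, stime) for service rows not started today.

    Assumption: an STIME without ':' is a date, i.e. the process is older
    than the current day.
    """
    hits = []
    for line in ps_text.splitlines():
        parsed = parse_ps_line(line)
        if parsed is None:
            continue
        pid, stime, command = parsed
        # First word of the command, with any leading path stripped.
        service = command.split()[0].rsplit("/", 1)[-1]
        if service in services and ":" not in stime:
            hits.append((service, pid, stime))
    return hits


def tally(hits):
    """Count long-running processes per service name."""
    return Counter(service for service, _pid, _stime in hits)
```

Calling tally(long_running(text)) on the full `ps -aef` output, rather than the grep'd excerpt, would show whether imapd accumulates old processes at a similar rate to pop3d.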
--
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 21 Princeton Place
716-662-8973 x26 Orchard Park, NY 14127
--PGP Public Key-- http://www.oceana.com/~ken/ksm.pgp