Still an issue: stuck processes

Ken Murchison ken at oceana.com
Tue Jul 5 10:48:21 EDT 2005


Sebastian Hagedorn wrote:

> Hi,
> 
> we are running the following setup under Red Hat Linux Advanced Server 3:
> 
> name       : Cyrus IMAPD
> version    : v2.2.12-Invoca-RPM-2.2.12-1.ZAIK 2005/02/14 16:43:51
> vendor     : Project Cyrus
> support-url: http://asg.web.cmu.edu/cyrus
> os         : Linux
> os-version : 2.4.21-27.0.2.ELsmp
> environment: Built w/Cyrus SASL 2.1.20
>             Running w/Cyrus SASL 2.1.20
>             Built w/Sleepycat Software: Berkeley DB 4.1.25: (August 21, 2003)
>             Running w/Sleepycat Software: Berkeley DB 4.1.25: (August 21, 2003)
>             Built w/OpenSSL 0.9.7a Feb 19 2003
>             Running w/OpenSSL 0.9.7a Feb 19 2003
>             CMU Sieve 2.2
>             TCP Wrappers
>             mmap = shared
>             lock = fcntl
>             nonblock = fcntl
>             auth = unix
>             idle = idled
> 
> In earlier versions of Cyrus we experienced problems where processes got 
> stuck and caused subsequent connections to mailboxes to fail due to lock 
> contention. Some work was done to solve this, but I wonder whether the 
> fix is only cosmetic. It seems to me that processes still get stuck; 
> they just no longer keep new connections from working.
> 
> I noticed that our server has an ever-increasing number of processes. 
> I'm attaching a screenshot of the relevant Ganglia graph for the last 
> month. I see that there are many imapd and pop3d processes that have 
> been running for a long time, i.e. since the middle of May:
> 
> [root@lvr13 root]# ps -aef | grep pop3
> cyrus     1588 22788  0 May13 ?        00:00:03 pop3d -s
> cyrus     2810 22788  0 May13 ?        00:00:01 pop3d -s
> cyrus    32464 22788  0 May13 ?        00:00:02 pop3d -s
> cyrus     7941 22788  0 May13 ?        00:00:00 pop3d -s
> cyrus     5331 22788  0 May14 ?        00:00:02 pop3d -s
> cyrus     4319 22788  0 May14 ?        00:00:02 pop3d -s
> cyrus     9054 22788  0 May14 ?        00:00:00 pop3d -s
> cyrus    25309 22788  0 May14 ?        00:00:00 pop3d -s
> cyrus     8176 22788  0 May14 ?        00:00:02 pop3d -s
> cyrus    21482 22788  0 May14 ?        00:00:00 pop3d
> ...
> 
> All of them seem to be stuck somewhere in SSL, but ultimately in 
> __read_nocancel (). I'll give two examples.
> 
> PID 1588:
> (gdb) where
> #0  0x006d1f0e in __read_nocancel () from /lib/tls/libc.so.6
> #1  0x00c16427 in BIO_new_socket () from /lib/libcrypto.so.4
> #2  0x00c143e2 in BIO_read () from /lib/libcrypto.so.4
> #3  0x007b4c30 in ssl3_alert_code () from /lib/libssl.so.4
> #4  0x007b4dcc in ssl3_alert_code () from /lib/libssl.so.4
> #5  0x007b60cf in ssl3_read_bytes () from /lib/libssl.so.4
> #6  0x007b6ffc in ssl3_get_message () from /lib/libssl.so.4
> #7  0x007accab in ssl3_accept () from /lib/libssl.so.4
> #8  0x007ac944 in ssl3_accept () from /lib/libssl.so.4
> #9  0x007bbcaa in SSL_accept () from /lib/libssl.so.4
> #10 0x007b780d in ssl23_get_client_hello () from /lib/libssl.so.4
> #11 0x007b7712 in ssl23_accept () from /lib/libssl.so.4
> #12 0x007bbcaa in SSL_accept () from /lib/libssl.so.4
> #13 0x08051bc3 in shut_down ()
> #14 0x0804dda3 in shut_down ()
> #15 0x0804ce9d in ?? ()
> #16 0x00000001 in ?? ()
> #17 0x098eab90 in ?? ()
> #18 0x00000000 in ?? ()
> (gdb)
> 
> 
> PID 21482:
> (gdb) where
> #0  0x006f4f0e in __read_nocancel () from /lib/tls/libc.so.6
> #1  0x00355427 in BIO_new_socket () from /lib/libcrypto.so.4
> #2  0x003533e2 in BIO_read () from /lib/libcrypto.so.4
> #3  0x0047ae23 in ssl23_read_bytes () from /lib/libssl.so.4
> #4  0x00479c61 in ssl23_get_client_hello () from /lib/libssl.so.4
> #5  0x00479712 in ssl23_accept () from /lib/libssl.so.4
> #6  0x0047dcaa in SSL_accept () from /lib/libssl.so.4
> #7  0x08051bc3 in shut_down ()
> #8  0x0804dda3 in shut_down ()
> #9  0x0804dba8 in shut_down ()
> #10 0x0804cde9 in ?? ()
> #11 0x095f74d0 in ?? ()
> #12 0x0807e79c in config_need_data ()
> #13 0x095a5978 in ?? ()
> #14 0x0807fff6 in config_need_data ()
> #15 0x0807e778 in config_need_data ()
> #16 0x08101c40 in ?? ()
> #17 0x00000000 in ?? ()
> (gdb)
> 
> Fortunately these stuck processes don't hold any locks anymore! I 
> understand that I can probably just kill them, but I wonder what the 
> underlying cause of this problem is. Is it likely something in Cyrus or 
> something in the libraries?

Is this only a problem with pop3d or with imapd as well?  I can't 
reproduce your problem here.  Is there some kind of proxy or webmail 
process which might be unfriendly?
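
For illustration: both backtraces end in a blocking read() underneath 
SSL_accept(), which is consistent with a client that connects and then 
never sends the start of its handshake. The Python sketch below shows 
only that mechanism on a plain socket, and how a receive timeout bounds 
the wait. It is not Cyrus or OpenSSL code; the helper name 
read_first_bytes, the socketpair stand-in for an accepted connection, 
and the 0.2 second timeout are all assumptions made for the example.

```python
import socket


def read_first_bytes(conn, timeout_s):
    """Wait for a peer's first bytes, giving up after timeout_s seconds.

    Returns the bytes received, or None if the peer stayed silent.
    (Hypothetical helper for illustration; not part of Cyrus.)
    """
    # Without a timeout, recv() on a blocking socket waits indefinitely
    # for a peer that never writes.
    conn.settimeout(timeout_s)
    try:
        return conn.recv(4096)
    except socket.timeout:
        return None


if __name__ == "__main__":
    # A connected socket pair stands in for an accepted client connection.
    server_side, client_side = socket.socketpair()

    # The client has sent nothing, so the read gives up after 0.2 s
    # instead of hanging.
    first = read_first_bytes(server_side, 0.2)
    print("timed out" if first is None else first)

    # Once the client writes, the same call returns promptly.
    client_side.sendall(b"\x16\x03\x01")
    print(read_first_bytes(server_side, 0.2))

    server_side.close()
    client_side.close()
```

Whether a timeout like this is the right fix depends on what is 
opening the connections, hence the question about proxies and webmail.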


-- 
Kenneth Murchison     Oceana Matrix Ltd.
Software Engineer     21 Princeton Place
716-662-8973 x26      Orchard Park, NY 14127
--PGP Public Key--    http://www.oceana.com/~ken/ksm.pgp
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html



