Still an issue: stuck processes

Michael Loftis mloftis at wgops.com
Tue Jul 5 22:18:16 EDT 2005


I've been seeing the same occasional lockups with pop3d and the pop3 proxy
(Murder proxy) on my Debian systems as well. I don't have specifics right
now since I'm at home, but if I remember I'll try to get them. It doesn't
happen reproducibly, just an occasional random lockup. And no, it is not the
/dev/random issue ;)

--On July 5, 2005 10:48:21 AM -0400 Ken Murchison <ken at oceana.com> wrote:

> Sebastian Hagedorn wrote:
>
>> Hi,
>>
>> we are running the following setup under Red Hat Linux Advanced Server 3:
>>
>> name       : Cyrus IMAPD
>> version    : v2.2.12-Invoca-RPM-2.2.12-1.ZAIK 2005/02/14 16:43:51
>> vendor     : Project Cyrus
>> support-url: http://asg.web.cmu.edu/cyrus
>> os         : Linux
>> os-version : 2.4.21-27.0.2.ELsmp
>> environment: Built w/Cyrus SASL 2.1.20
>>             Running w/Cyrus SASL 2.1.20
>>             Built w/Sleepycat Software: Berkeley DB 4.1.25: (August 21, 2003)
>>             Running w/Sleepycat Software: Berkeley DB 4.1.25: (August 21, 2003)
>>             Built w/OpenSSL 0.9.7a Feb 19 2003
>>             Running w/OpenSSL 0.9.7a Feb 19 2003
>>             CMU Sieve 2.2
>>             TCP Wrappers
>>             mmap = shared
>>             lock = fcntl
>>             nonblock = fcntl
>>             auth = unix
>>             idle = idled
>>
>> In earlier versions of Cyrus we experienced problems where processes got
>> stuck and caused subsequent connections to mailboxes to fail due to lock
>> contention. Some work was done to solve this, but I wonder if the
>> success is only cosmetic. It seems to me that processes still get
>> stuck; it just no longer keeps new connections from working.
>>
>> I noticed that our server has an ever increasing number of processes.
>> I'm attaching a screenshot of the relevant Ganglia graph for the last
>> month. I see that there are many imapd and pop3d processes that have
>> been running for a long time, i.e. since the middle of May:
>>
>> [root@lvr13 root]# ps -aef|grep pop3
>> cyrus     1588 22788  0 May13 ?        00:00:03 pop3d -s
>> cyrus     2810 22788  0 May13 ?        00:00:01 pop3d -s
>> cyrus    32464 22788  0 May13 ?        00:00:02 pop3d -s
>> cyrus     7941 22788  0 May13 ?        00:00:00 pop3d -s
>> cyrus     5331 22788  0 May14 ?        00:00:02 pop3d -s
>> cyrus     4319 22788  0 May14 ?        00:00:02 pop3d -s
>> cyrus     9054 22788  0 May14 ?        00:00:00 pop3d -s
>> cyrus    25309 22788  0 May14 ?        00:00:00 pop3d -s
>> cyrus     8176 22788  0 May14 ?        00:00:02 pop3d -s
>> cyrus    21482 22788  0 May14 ?        00:00:00 pop3d
>> ...
>>
>> All of them seem to be stuck somewhere in SSL, but ultimately in
>> __read_nocancel (). I'll give two examples.
>>
>> PID 1588:
>> (gdb) where
>> # 0  0x006d1f0e in __read_nocancel () from /lib/tls/libc.so.6
>> # 1  0x00c16427 in BIO_new_socket () from /lib/libcrypto.so.4
>> # 2  0x00c143e2 in BIO_read () from /lib/libcrypto.so.4
>> # 3  0x007b4c30 in ssl3_alert_code () from /lib/libssl.so.4
>> # 4  0x007b4dcc in ssl3_alert_code () from /lib/libssl.so.4
>> # 5  0x007b60cf in ssl3_read_bytes () from /lib/libssl.so.4
>> # 6  0x007b6ffc in ssl3_get_message () from /lib/libssl.so.4
>> # 7  0x007accab in ssl3_accept () from /lib/libssl.so.4
>> # 8  0x007ac944 in ssl3_accept () from /lib/libssl.so.4
>> # 9  0x007bbcaa in SSL_accept () from /lib/libssl.so.4
>> # 10 0x007b780d in ssl23_get_client_hello () from /lib/libssl.so.4
>> # 11 0x007b7712 in ssl23_accept () from /lib/libssl.so.4
>> # 12 0x007bbcaa in SSL_accept () from /lib/libssl.so.4
>> # 13 0x08051bc3 in shut_down ()
>> # 14 0x0804dda3 in shut_down ()
>> # 15 0x0804ce9d in ?? ()
>> # 16 0x00000001 in ?? ()
>> # 17 0x098eab90 in ?? ()
>> # 18 0x00000000 in ?? ()
>> (gdb)
>>
>>
>> PID 21482:
>> (gdb) where
>> # 0  0x006f4f0e in __read_nocancel () from /lib/tls/libc.so.6
>> # 1  0x00355427 in BIO_new_socket () from /lib/libcrypto.so.4
>> # 2  0x003533e2 in BIO_read () from /lib/libcrypto.so.4
>> # 3  0x0047ae23 in ssl23_read_bytes () from /lib/libssl.so.4
>> # 4  0x00479c61 in ssl23_get_client_hello () from /lib/libssl.so.4
>> # 5  0x00479712 in ssl23_accept () from /lib/libssl.so.4
>> # 6  0x0047dcaa in SSL_accept () from /lib/libssl.so.4
>> # 7  0x08051bc3 in shut_down ()
>> # 8  0x0804dda3 in shut_down ()
>> # 9  0x0804dba8 in shut_down ()
>> # 10 0x0804cde9 in ?? ()
>> # 11 0x095f74d0 in ?? ()
>> # 12 0x0807e79c in config_need_data ()
>> # 13 0x095a5978 in ?? ()
>> # 14 0x0807fff6 in config_need_data ()
>> # 15 0x0807e778 in config_need_data ()
>> # 16 0x08101c40 in ?? ()
>> # 17 0x00000000 in ?? ()
>> (gdb)
>>
>> Fortunately these stuck processes don't hold any locks anymore! I
>> understand that I can probably just kill them, but I wonder what the
>> underlying cause of this problem is. Is it likely something in Cyrus or
>> something in the libraries?
>
> Is this only a problem with pop3d or with imapd as well?  I can't
> reproduce your problem here.  Is there some kind of proxy or webmail
> process which might be unfriendly?
>
>
> --
> Kenneth Murchison     Oceana Matrix Ltd.
> Software Engineer     21 Princeton Place
> 716-662-8973 x26      Orchard Park, NY 14127
> --PGP Public Key--    http://www.oceana.com/~ken/ksm.pgp
> ---
> Cyrus Home Page: http://asg.web.cmu.edu/cyrus
> Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
>
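
For reference, both quoted backtraces end in a read() on the client socket
inside SSL_accept(), which suggests the process is waiting for handshake data
that the client never sends. A minimal sketch of the general guard, a read
bounded by a deadline, is below. It uses a plain Python socket rather than
Cyrus or OpenSSL code, and the helper name read_with_deadline and the timeout
values are purely illustrative assumptions, not anything taken from the Cyrus
source.

```python
import select
import socket


def read_with_deadline(sock, nbytes, timeout_s):
    """Return up to nbytes from sock, or None if no data arrives in timeout_s seconds.

    Hypothetical helper for illustration only; not part of Cyrus or OpenSSL.
    """
    # select() waits until the socket is readable or the timeout expires.
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        # Peer stayed silent: give up instead of blocking in read() forever.
        return None
    return sock.recv(nbytes)


if __name__ == "__main__":
    # A connected socket pair stands in for a server and a client.
    server_side, client_side = socket.socketpair()

    # The client sends nothing, so the bounded read gives up after 0.2 s.
    print(read_with_deadline(server_side, 16, 0.2))

    # Once the client sends data, the same call returns it.
    client_side.sendall(b"hello")
    print(read_with_deadline(server_side, 16, 0.2))

    server_side.close()
    client_side.close()
```

Whether pop3d can apply such a deadline around its TLS handshake is not
something this sketch settles; it only shows the pattern of bounding a read
that would otherwise wait indefinitely on a silent client.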



--
"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler



