Still an issue: stuck processes
Michael Loftis
mloftis at wgops.com
Tue Jul 5 22:18:16 EDT 2005
I've been seeing the same occasional lockups with pop3d and the POP3 proxy
(Murder proxy) on my Debian systems as well. I don't have specifics right
now since I'm at home, but if I remember I'll try to get them. It doesn't
happen reproducibly, it's just an occasional random lockup. And no, it is
not the /dev/random issue ;)
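
For collecting those specifics, a small script can pull long-lived pop3d
processes out of `ps` output. What follows is only a rough Python sketch. It
assumes the `ps -ef` column layout seen in the listing quoted further down
(UID PID PPID C STIME TTY TIME CMD), and the helper names `parse_ps_ef` and
`old_processes` are made up for illustration.

```python
def parse_ps_ef(text):
    """Parse `ps -ef` style lines into dicts (hypothetical helper).

    Assumed columns: UID PID PPID C STIME TTY TIME CMD. STIME is assumed
    to be a single token such as "May13" or "10:48".
    """
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 7)    # CMD may itself contain spaces
        if len(fields) < 8 or not fields[1].isdigit():
            continue                    # skip the header and malformed lines
        rows.append({
            "uid": fields[0],
            "pid": int(fields[1]),
            "ppid": int(fields[2]),
            "stime": fields[4],
            "cmd": fields[7],
        })
    return rows


def old_processes(rows, name):
    """Keep rows whose command is `name` and whose STIME has no colon.

    Assumption: ps shows STIME as HH:MM for processes started today and
    as a date token otherwise, so "no colon" is used as a rough test for
    "started before today".
    """
    return [r for r in rows
            if r["cmd"].split()[0] == name and ":" not in r["stime"]]
```

The input text could come from
`subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`;
`old_processes(parse_ps_ef(text), "pop3d")` then lists the candidates to
attach gdb to.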
--On July 5, 2005 10:48:21 AM -0400 Ken Murchison <ken at oceana.com> wrote:
> Sebastian Hagedorn wrote:
>
>> Hi,
>>
>> we are running the following setup under Red Hat Linux Advanced Server 3:
>>
>> name       : Cyrus IMAPD
>> version    : v2.2.12-Invoca-RPM-2.2.12-1.ZAIK 2005/02/14 16:43:51
>> vendor     : Project Cyrus
>> support-url: http://asg.web.cmu.edu/cyrus
>> os         : Linux
>> os-version : 2.4.21-27.0.2.ELsmp
>> environment: Built w/Cyrus SASL 2.1.20
>>              Running w/Cyrus SASL 2.1.20
>>              Built w/Sleepycat Software: Berkeley DB 4.1.25: (August 21, 2003)
>>              Running w/Sleepycat Software: Berkeley DB 4.1.25: (August 21, 2003)
>>              Built w/OpenSSL 0.9.7a Feb 19 2003
>>              Running w/OpenSSL 0.9.7a Feb 19 2003
>>              CMU Sieve 2.2
>>              TCP Wrappers
>>              mmap = shared
>>              lock = fcntl
>>              nonblock = fcntl
>>              auth = unix
>>              idle = idled
>>
>> In earlier versions of Cyrus we experienced problems where processes got
>> stuck and caused subsequent connections to mailboxes to fail due to lock
>> contention. Some work was done to solve this, but I wonder if the
>> success is only cosmetic. It seems to me that processes still get
>> stuck; they just no longer keep new connections from working.
>>
>> I noticed that our server has an ever increasing number of processes.
>> I'm attaching a screenshot of the relevant Ganglia graph for the last
>> month. I see that there are many imapd and pop3d processes that have
>> been running for a long time, i.e. since the middle of May:
>>
>> [root@lvr13 root]# ps -aef|grep pop3
>> cyrus 1588 22788 0 May13 ? 00:00:03 pop3d -s
>> cyrus 2810 22788 0 May13 ? 00:00:01 pop3d -s
>> cyrus 32464 22788 0 May13 ? 00:00:02 pop3d -s
>> cyrus 7941 22788 0 May13 ? 00:00:00 pop3d -s
>> cyrus 5331 22788 0 May14 ? 00:00:02 pop3d -s
>> cyrus 4319 22788 0 May14 ? 00:00:02 pop3d -s
>> cyrus 9054 22788 0 May14 ? 00:00:00 pop3d -s
>> cyrus 25309 22788 0 May14 ? 00:00:00 pop3d -s
>> cyrus 8176 22788 0 May14 ? 00:00:02 pop3d -s
>> cyrus 21482 22788 0 May14 ? 00:00:00 pop3d
>> ...
>>
>> All of them seem to be stuck somewhere in SSL, but ultimately in
>> __read_nocancel (). I'll give two examples.
>>
>> PID 1588:
>> (gdb) where
>> #0  0x006d1f0e in __read_nocancel () from /lib/tls/libc.so.6
>> #1  0x00c16427 in BIO_new_socket () from /lib/libcrypto.so.4
>> #2  0x00c143e2 in BIO_read () from /lib/libcrypto.so.4
>> #3  0x007b4c30 in ssl3_alert_code () from /lib/libssl.so.4
>> #4  0x007b4dcc in ssl3_alert_code () from /lib/libssl.so.4
>> #5  0x007b60cf in ssl3_read_bytes () from /lib/libssl.so.4
>> #6  0x007b6ffc in ssl3_get_message () from /lib/libssl.so.4
>> #7  0x007accab in ssl3_accept () from /lib/libssl.so.4
>> #8  0x007ac944 in ssl3_accept () from /lib/libssl.so.4
>> #9  0x007bbcaa in SSL_accept () from /lib/libssl.so.4
>> #10 0x007b780d in ssl23_get_client_hello () from /lib/libssl.so.4
>> #11 0x007b7712 in ssl23_accept () from /lib/libssl.so.4
>> #12 0x007bbcaa in SSL_accept () from /lib/libssl.so.4
>> #13 0x08051bc3 in shut_down ()
>> #14 0x0804dda3 in shut_down ()
>> #15 0x0804ce9d in ?? ()
>> #16 0x00000001 in ?? ()
>> #17 0x098eab90 in ?? ()
>> #18 0x00000000 in ?? ()
>> (gdb)
>>
>>
>> PID 21482:
>> (gdb) where
>> #0  0x006f4f0e in __read_nocancel () from /lib/tls/libc.so.6
>> #1  0x00355427 in BIO_new_socket () from /lib/libcrypto.so.4
>> #2  0x003533e2 in BIO_read () from /lib/libcrypto.so.4
>> #3  0x0047ae23 in ssl23_read_bytes () from /lib/libssl.so.4
>> #4  0x00479c61 in ssl23_get_client_hello () from /lib/libssl.so.4
>> #5  0x00479712 in ssl23_accept () from /lib/libssl.so.4
>> #6  0x0047dcaa in SSL_accept () from /lib/libssl.so.4
>> #7  0x08051bc3 in shut_down ()
>> #8  0x0804dda3 in shut_down ()
>> #9  0x0804dba8 in shut_down ()
>> #10 0x0804cde9 in ?? ()
>> #11 0x095f74d0 in ?? ()
>> #12 0x0807e79c in config_need_data ()
>> #13 0x095a5978 in ?? ()
>> #14 0x0807fff6 in config_need_data ()
>> #15 0x0807e778 in config_need_data ()
>> #16 0x08101c40 in ?? ()
>> #17 0x00000000 in ?? ()
>> (gdb)
>>
>> Fortunately these stuck processes don't hold any locks anymore! I
>> understand that I can probably just kill them, but I wonder what the
>> underlying cause of this problem is. Is it likely something in Cyrus or
>> something in the libraries?
>
> Is this only a problem with pop3d or with imapd as well? I can't
> reproduce your problem here. Is there some kind of proxy or webmail
> process which might be unfriendly?
>
>
> --
> Kenneth Murchison Oceana Matrix Ltd.
> Software Engineer 21 Princeton Place
> 716-662-8973 x26 Orchard Park, NY 14127
> --PGP Public Key-- http://www.oceana.com/~ken/ksm.pgp
> ---
> Cyrus Home Page: http://asg.web.cmu.edu/cyrus
> Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
>
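
Both backtraces quoted above end in a plain read() underneath SSL_accept(),
which is what a process looks like when it waits for handshake bytes that the
client never sends. One generic guard against that pattern is a receive
timeout on the accepted socket before the handshake starts. Below is a minimal
Python sketch of the idea. It is not Cyrus code and makes no claim about how
pop3d handles this; `read_with_deadline`, `demo` and the 0.2 s timeout are
invented for illustration.

```python
import socket


def read_with_deadline(conn, nbytes, timeout_s):
    """Read up to nbytes from conn, or return None if the peer stays silent.

    Hypothetical helper: stands in for the blocking read() at the bottom
    of the quoted backtraces.
    """
    conn.settimeout(timeout_s)      # blocking calls now give up after timeout_s
    try:
        return conn.recv(nbytes)
    except socket.timeout:
        return None                 # caller can close the connection and move on


def demo():
    # socketpair() returns two connected sockets. At first "client" sends
    # nothing, imitating a peer that connects and then goes quiet.
    server, client = socket.socketpair()
    try:
        silent = read_with_deadline(server, 1024, 0.2)
        client.sendall(b"hello")
        talkative = read_with_deadline(server, 1024, 0.2)
        return silent, talkative
    finally:
        server.close()
        client.close()
```

Without the `settimeout()` call the first read in `demo()` would block
indefinitely, which matches the symptom in the process listing. Whether the
builds discussed here set any such timeout before the TLS handshake is not
established in this thread.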
--
"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler