One more attempt: stuck processes

Gary Mills mills at cc.umanitoba.ca
Fri Nov 16 10:37:42 EST 2007


On Fri, Nov 16, 2007 at 03:20:57PM +0100, Sebastian Hagedorn wrote:
> --On 16. November 2007 08:00:07 -0600 Gary Mills <mills at cc.umanitoba.ca> 
> wrote:
> 
> >This timeout doesn't work in some cases.  We have lots of POP sessions
> >that never terminate.
> 
> That's interesting to hear! Especially since you are using Solaris.
> 
> >About 30 out of 40 are in that state now.
> >Here's an example:
> >
> >   cyrus 13075   708  0   Oct 14 ?        0:05 pop3d -s
> >   cyrus 20023   708  0   Oct 29 ?        0:00 pop3d
> >   cyrus 24560   708  1 07:38:03 ?        0:03 pop3d
> >   cyrus   631   708  0   Oct 03 ?        0:10 pop3d -s
> >   cyrus  6786   708  0   Oct 20 ?        0:00 pop3d -s
> >   cyrus 29777   708  0 07:45:03 ?        0:00 pop3d
> >   cyrus 19175   708  0   Oct 04 ?        0:04 pop3d -s
> >
> >One I just checked is stuck in a read():
> >
> >  # truss -p 19175
> >  read(0, 0x002316F0, 5)          (sleeping...)
> >  ^?# pfiles 19175
> >  19175:  pop3d -s
> >    Current rlimit: 256 file descriptors
> >     0: S_IFSOCK mode:0666 dev:271,0 ino:25813 uid:0 gid:0 size:0
> >        O_RDWR
> >          sockname: AF_INET 130.179.16.23  port: 995
> >          peername: AF_INET 130.179.188.184  port: 51771
> 
> Could you get a stack trace? If you have gdb you just call it with "gdb -p 
> 19175". Then you can do "bt" at the prompt. I forget how to do it with 
> Sun's debugger.

Easy:

  # pstack 19175
  19175:  pop3d -s
   fef9f810 read     (0, 2316f0, 5)
   fee1d2d0 read     (0, 2316f0, 5, 0, 0, 0) + 5c
   ff06bb38 sock_read (1f0860, 2316f0, 5, 5, 0, 0) + 24
   ff068af0 BIO_read (1f0860, 2316f0, 5, fef98b84, 0, 0) + 110
   ff278488 ssl3_read_n (212798, 5, 8805, 0, 0, 203958) + 174
   ff2785fc ssl3_get_record (204ce0, 8000, 8400, 4400, f1, f0) + d0
   ff279424 ssl3_read_bytes (212798, 1000, 2000, 4, 0, ffbfe731) + 228
   ff27a99c ssl3_get_message (ff2a259c, 2070a0, 0, ffffffff, 19000, ffbfe7a0) + d0
   ff27042c ssl3_accept (2150, 2160, 2180, 21e0, 2110, 2122) + 904
   ff27bd2c ssl23_get_client_hello (2316fb, 6c, 6c, 4, fffffe79, 0) + 828
   ff27b4b4 ssl23_accept (4000, 2000, 0, 0, 0, 0) + 2a4
   00032d00 tls_start_servertls (0, 1, ffbfee24, ffbfee20, 1849a8, ff00) + 198
   0002c504 cmd_starttls (1, 1fd8b8, 0, 0, 0, 0) + 184
   0002a638 service_main (2, 192198, ffbffce0, 1aec4, 3508c, 1) + 488
   00035250 main     (2, ffbffcd4, ffbffce0, 17c400, 0, 0) + e18
   00029298 _start   (0, 0, 0, 0, 0, 0) + 108

So the process is blocked in read() in the middle of the SSL handshake
that follows STARTTLS.  I've confirmed that the client went away a long
time ago.
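
To illustrate the general failure mode in the trace above: a plain blocking
read on a socket has no deadline of its own, so if the peer vanishes without
the connection being torn down, the read can sleep indefinitely.  The
following is only a minimal sketch of bounding such a read with a socket
timeout.  It is not Cyrus or OpenSSL code; the helper name
read_with_deadline and the 0.2 second limit are made up for the example, and
a local socket pair stands in for the real client connection.

```python
import socket


def read_with_deadline(sock, nbytes, seconds):
    """Return up to nbytes from sock, or None if nothing arrives in time.

    Hypothetical helper for illustration only; not part of Cyrus.
    """
    sock.settimeout(seconds)  # bound the blocking recv() below
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # The peer stayed silent for the whole interval.
        return None
    finally:
        sock.settimeout(None)  # put the socket back into blocking mode


if __name__ == "__main__":
    # A connected local pair stands in for server and client.
    server, client = socket.socketpair()

    # Silent peer: the 5-byte read (the size seen in the trace) gives up
    # after the deadline instead of sleeping forever.
    print(read_with_deadline(server, 5, 0.2))

    # Once the peer sends something, the same call returns the data.
    client.sendall(b"hello")
    print(read_with_deadline(server, 5, 0.2))

    server.close()
    client.close()
```

Whether a deadline like this belongs around the handshake in pop3d itself is
a separate question; the sketch only shows the difference between a bounded
and an unbounded read.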

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-

