v2.3.15 IMAP connection hangs
    Bron Gondwana 
    brong at fastmail.fm
       
    Wed Sep 16 11:12:13 EDT 2009
    
    
  
On Tue, Sep 15, 2009 at 02:35:48PM -0400, Rosenbaum, Larry M. wrote:
> > From: Bron Gondwana [mailto:brong at fastmail.fm]
> > 
> >...
> > That's kind of pointless, sorry.  I really need a trace of the imapd
> > process, since that's what's locking up.
> 
> Here's another trace of an imapd process:
Sorry I haven't replied before - I'm on holiday over the other side of
Australia for a week - so we were flying most of yesterday to get here!
> 23237:  poll(0xFFBFB860, 1, 1800000)                    = 1
> 23237:  time()                                          = 1253039218
> 23237:  time()                                          = 1253039218
> 23237:  poll(0xFFBFC858, 1, 1788000)                    = 1
> 23237:  time()                                          = 1253039218
> 23237:  time()                                          = 1253039218
> 23237:  read(0, " 0   l o g o u t\r\n", 4096)           = 10
> 23237:  time()                                          = 1253039218
> 23237:  poll(0xFFBFC858, 1, 1800000)    (sleeping...)
> 
> (At this point the trace paused, and the Telnet session got no response.
> I aborted the Telnet session and the trace continued as follows:)
This is the same path at the same point as last time.  Clearly something
going on!
The pain is, I can't see it - unless it's happening deep inside lib/prot.c
somewhere.  And the strace doesn't tell us anything - it will read up to
4096 characters straight off, and it's getting 10 of them, which should
be enough!
I'm wondering if something in the COMPRESS changes has broken an assuption.
I don't suppose you're able to gdb the process and get a backtrace from
that poll are you?  Sorry to keep asking for more stuff, I just can't
figure out any more than that from the truss output!
 
> 23237:  poll(0xFFBFC858, 1, 1800000)                    = 1
Yeah, so disconnect caused the poll to finish!
And a signal handler had fired, so it knows to close down...
> 23237:  close(11)                                       = 0
> 23237:  unlink("/usr/local/cyrus/data/proc/23237")      = 0
> 23237:  open("/dev/null", O_RDWR)                       = 11
> 23237:  shutdown(0, 0, 1)                               = 0
> 
> So what was it waiting for?
That's a really good question!  It feels to me that it had gone back into
the loop to get some more data.
Hmm - question: do you have zlib support compiled in?
[brong at utility1 ~]$ ldd /usr/cyrus/bin/imapd | grep libz
	libz.so.1 => /usr/lib/libz.so.1 (0x55990000)
Maybe there's a bug in the codepaths for COMPRESS when zlib doesn't
exist (because they're all #ifdef HAVE_ZLIB - maybe things don't work
nicely without it!)
Long shot - but you might want to try it with the attached patch applied.
It rewrites the zlib handling logic quite a lot, though it wasn't originally
designed to deal with your problem!  I'm going to push it to CVS soon.
Bron.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rewrite_zlib.diff
Type: text/x-diff
Size: 6777 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/info-cyrus/attachments/20090917/a006e57c/attachment.bin 
    
    
More information about the Info-cyrus
mailing list