v2.3.15 IMAP connection hangs

Bron Gondwana brong at fastmail.fm
Wed Sep 16 11:12:13 EDT 2009


On Tue, Sep 15, 2009 at 02:35:48PM -0400, Rosenbaum, Larry M. wrote:
> > From: Bron Gondwana [mailto:brong at fastmail.fm]
> > 
> >...
> > That's kind of pointless, sorry.  I really need a trace of the imapd
> > process, since that's what's locking up.
> 
> Here's another trace of an imapd process:

Sorry I haven't replied before - I'm on holiday over the other side of
Australia for a week - so we were flying most of yesterday to get here!

> 23237:  poll(0xFFBFB860, 1, 1800000)                    = 1
> 23237:  time()                                          = 1253039218
> 23237:  time()                                          = 1253039218
> 23237:  poll(0xFFBFC858, 1, 1788000)                    = 1
> 23237:  time()                                          = 1253039218
> 23237:  time()                                          = 1253039218
> 23237:  read(0, " 0   l o g o u t\r\n", 4096)           = 10
> 23237:  time()                                          = 1253039218
> 23237:  poll(0xFFBFC858, 1, 1800000)    (sleeping...)
> 
> (At this point the trace paused, and the Telnet session got no response.
> I aborted the Telnet session and the trace continued as follows:)

This is the same path at the same point as last time.  Clearly something
going on!

The pain is, I can't see it - unless it's happening deep inside lib/prot.c
somewhere.  And the strace doesn't tell us anything - it will read up to
4096 characters straight off, and it's getting 10 of them, which should
be enough!

I'm wondering if something in the COMPRESS changes has broken an assuption.

I don't suppose you're able to gdb the process and get a backtrace from
that poll are you?  Sorry to keep asking for more stuff, I just can't
figure out any more than that from the truss output!
 
> 23237:  poll(0xFFBFC858, 1, 1800000)                    = 1

Yeah, so disconnect caused the poll to finish!

And a signal handler had fired, so it knows to close down...

> 23237:  close(11)                                       = 0
> 23237:  unlink("/usr/local/cyrus/data/proc/23237")      = 0
> 23237:  open("/dev/null", O_RDWR)                       = 11
> 23237:  shutdown(0, 0, 1)                               = 0
> 
> So what was it waiting for?

That's a really good question!  It feels to me that it had gone back into
the loop to get some more data.

Hmm - question: do you have zlib support compiled in?

[brong at utility1 ~]$ ldd /usr/cyrus/bin/imapd | grep libz
	libz.so.1 => /usr/lib/libz.so.1 (0x55990000)

Maybe there's a bug in the codepaths for COMPRESS when zlib doesn't
exist (because they're all #ifdef HAVE_ZLIB - maybe things don't work
nicely without it!)

Long shot - but you might want to try it with the attached patch applied.
It rewrites the zlib handling logic quite a lot, though it wasn't originally
designed to deal with your problem!  I'm going to push it to CVS soon.

Bron.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: rewrite_zlib.diff
Type: text/x-diff
Size: 6777 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/info-cyrus/attachments/20090917/a006e57c/attachment.bin 


More information about the Info-cyrus mailing list