timeouts when connecting to imap server

Timo Veith tv at rz-zw.fh-kl.de
Fri Dec 1 12:29:55 EST 2006


On Thursday, 30 November 2006 15:35, Jorey Bump wrote:
> Timo Veith wrote:
> > I am still having the problem. Doesn't anyone have a hint for
> > me? I changed the I/O scheduler from cfq to deadline, raised the file
> > descriptor limit to 300000, and still see no improvement. :(
>
> Just a thought, but can you try switching to a 2.4.x kernel? The 2.6
> series seems to suffer from gremlins like this once in a while.

I am pretty sure that the master daemon is now running with a 350000 file 
descriptor limit. At least, that is what the log file says:

Dec  1 09:46:18 post master[3078]: setrlimit: Unable to set file descriptors limit to -1: Operation not permitted
Dec  1 09:46:18 post master[3078]: retrying with 350000 (current max)
Dec  1 09:46:18 post master[3078]: process started

Since the timeouts persist, I don't think it is a file descriptor limit 
problem. 350000 should be more than enough, shouldn't it?
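The three log lines read like a two-step attempt: the master first asks for an unlimited descriptor limit (-1) and, when that is refused, settles for the current hard maximum of 350000. On Linux the kernel rejects an RLIMIT_NOFILE above its built-in ceiling with EPERM even for root, which would match the "Operation not permitted" message. A minimal Python sketch of that fallback, for illustration only and assuming Linux; `raise_nofile_limit` is a hypothetical helper, not code taken from the cyrus source:

```python
import resource


def raise_nofile_limit():
    """Hypothetical sketch of the fallback the log describes (not cyrus code).

    First ask for an unlimited RLIMIT_NOFILE; if the kernel refuses,
    raise the soft limit to the current hard maximum instead.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # On Linux, resource.RLIM_INFINITY is -1, matching "limit to -1".
        resource.setrlimit(resource.RLIMIT_NOFILE,
                           (resource.RLIM_INFINITY, resource.RLIM_INFINITY))
    except (ValueError, OSError):
        # Python reports EPERM/EINVAL from setrlimit as ValueError.
        # Equivalent of "retrying with <hard> (current max)".
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

The returned (soft, hard) pair is the limit the process actually ends up with, which is what the "retrying with 350000 (current max)" line reports for the master.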

> > I installed the nagios check on the mail server itself to rule out
> > network problems and checked the IMAP service on both interfaces
> > (localhost and the external IP). I also ran both checks in parallel
> > and noticed that a timeout on one interface is not necessarily
> > accompanied by a timeout on the other.
> >
> > How can I tell why it sometimes takes so long for an IMAP process
> > to respond?
>
> I had a similar problem that seemed to disappear when I disabled IDLE
> on the client, but does nagios use IDLE? When you look at the users
> that are affected, does any particular client or setting stand out?

Hmm, clients that stand out... Most of the time it's squirrelmail that 
takes so long to log you in. Sometimes squirrelmail even says "no 
connection to imap server", so maybe it runs into a timeout of its own 
too.

I haven't looked into the code of that nagios check, but I don't think 
it uses IDLE. It just connects to the IMAP port, waits for the server 
banner, disconnects, and measures the elapsed time. I think it is pretty 
much the same as running telnet 127.0.0.1 10143. Sometimes when I issue 
that command I get the service banner immediately, and sometimes only 
this:

Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.

And I can wait and wait ...

This is the point where I start wondering what the hell cyrus is doing 
that it takes so long to answer.
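The check as described above (connect, wait for the greeting line, disconnect, record the elapsed time) can be reproduced in a few lines, which makes it easy to run in a loop and log how often the greeting stalls. A minimal Python sketch, assuming the imapd listens on 127.0.0.1:10143 as in the telnet test; `time_to_banner` is a hypothetical helper, not the code of the nagios plugin:

```python
import socket
import time


def time_to_banner(host="127.0.0.1", port=10143, timeout=10.0):
    """Hypothetical helper: connect, read the greeting line, disconnect.

    Returns (elapsed_seconds, banner). banner is None when the TCP
    connection succeeds but no greeting line arrives within `timeout`
    (the "Connected to 127.0.0.1." and then nothing case).
    """
    start = time.monotonic()
    # The timeout covers the connect and every later read on the socket.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            # An IMAP server greets with one line, typically "* OK ...".
            with sock.makefile("rb") as reader:
                line = reader.readline()
        except socket.timeout:
            # Connected, but the server never sent its greeting in time.
            return time.monotonic() - start, None
    elapsed = time.monotonic() - start
    # An empty read means the server closed the connection without greeting.
    banner = line.decode("ascii", errors="replace").rstrip("\r\n") or None
    return elapsed, banner
```

Calling it every few seconds against both 127.0.0.1 and the external address, and logging each call where the banner is None or the elapsed time is large, would give timestamps that can be lined up against the cyrus log.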

I started the master daemon with -D and with CYRUS_VERBOSE=1 exported, 
but saw no log messages that helped. At least none of them sound critical 
to me. Is there anything in particular I should be looking for?

Oh, and I tried disabling the idle service in cyrus.conf, but it made no 
difference. Isn't disabling it there enough, or do I have to recompile 
without the idled option? I would really like to keep idled enabled, 
though.

Could the compile-time optimizations be to blame? This is what I used 
with gcc 3.3.6:
CFLAGS="-march=nocona -O3 -pipe -fomit-frame-pointer -mmmx -msse -msse2 -mfpmath=sse"

Desperately
Timo


More information about the Info-cyrus mailing list