timeouts when connecting to imap server
Timo Veith
tv at rz-zw.fh-kl.de
Fri Dec 1 12:29:55 EST 2006
On Thursday 30 November 2006 15:35, Jorey Bump wrote:
> Timo Veith wrote:
> > I am still having the problem, isn't there anyone who has a hint for
> > me? I changed the io scheduler from cfq to deadline, raised the file
> > descriptor limit to 300000 and still see no improvement. :(
>
> Just a thought, but can you try switching to a 2.4.x kernel? The 2.6
> series seems to suffer from gremlins like this once in a while.
I am pretty sure that the master daemon is now running with a 350000 file
descriptor limit. At least that is what the log file says:
Dec 1 09:46:18 post master[3078]: setrlimit: Unable to set file descriptors limit to -1: Operation not permitted
Dec 1 09:46:18 post master[3078]: retrying with 350000 (current max)
Dec 1 09:46:18 post master[3078]: process started
Because the timeouts still occur, I don't think that it is a file
descriptor limit problem. 350000 should be more than enough, shouldn't it?
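One way to back this up would be to compare the limit with the number of descriptors a process actually holds. The following is only a minimal sketch, assuming a Linux /proc filesystem where each open descriptor appears as one entry under /proc/<pid>/fd; the function name count_open_fds and its proc_root parameter are made up for illustration, and reading another user's process normally requires root.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Return the number of entries in <proc_root>/<pid>/fd.

    Assumption: on Linux every open file descriptor of a process shows up
    as one entry in that directory. proc_root is a hypothetical parameter
    that only exists so the function can be pointed at another directory.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    return len(os.listdir(fd_dir))
```

Calling count_open_fds(3078) for the master PID from the log above, and likewise for the imapd children, would show whether the counts come anywhere near 350000.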
> > I installed the nagios check on the mail server itself to exclude
> > network problems and checked the imap service on both interfaces
> > (localhost and the external ip). I also ran both checks in parallel
> > and noticed that a timeout on one interface does not necessarily
> > coincide with a timeout on the other interface.
> >
> > How can I tell why it sometimes takes so long until an imap process
> > responds?
>
> I had a similar problem that seemed to disappear when I disabled IDLE
> on the client, but does nagios use IDLE? When you look at the users
> that are affected, does any particular client or setting stand out?
Hmm, any clients that stand out ... most of the time it's squirrelmail
that takes so long to log in. Sometimes squirrelmail even
says "no connection to imap server". Maybe there is a timeout somewhere
in there, too.
I haven't looked into the code of that nagios check, but I don't
think that it uses IDLE. It just connects to the imap port,
waits for the server banner, disconnects and measures the elapsed time. I
think it is pretty much the same as doing telnet 127.0.0.1 10143. Sometimes
when I issue that command, I immediately get the service banner, and
sometimes only this:
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
And I can wait and wait ...
This is the point where I start wondering what the hell cyrus is doing now
that it takes so long to answer.
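To put numbers on these delays, the same connect-and-wait test can be scripted. This is a minimal sketch of such a probe, not the actual nagios check (whose code I have not read); the function names time_banner and sample are made up for illustration, and the timeout applies separately to the connect and to each read rather than to the whole exchange.

```python
import socket
import time


def time_banner(host, port, timeout=10.0):
    """Connect, read the first line the server sends, and time it.

    Returns (elapsed_seconds, banner_text). Raises OSError (for example
    socket.timeout) if the connect or a read exceeds `timeout` seconds.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        data = b""
        # Keep reading until the greeting line is complete or the peer closes.
        while not data.endswith(b"\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    elapsed = time.monotonic() - start
    return elapsed, data.decode("ascii", "replace").strip()


def sample(host, port, attempts=20, pause=1.0):
    """Probe the server repeatedly and print one result line per attempt."""
    for attempt in range(attempts):
        try:
            elapsed, banner = time_banner(host, port)
            print(f"{attempt:2d} {elapsed:7.3f}s {banner}")
        except OSError as exc:
            print(f"{attempt:2d} failed: {exc}")
        time.sleep(pause)
```

Running sample("127.0.0.1", 10143) next to the syslog would give timestamps for the slow attempts that can be matched against what master and imapd logged at that moment.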
I started the master daemon with -D and export CYRUS_VERBOSE=1, but I saw
no log messages that helped me. At least they don't sound critical to me.
Is there anything I should be looking for?
Oh and I tried it with the idle service disabled in cyrus.conf but it
didn't make a difference. Isn't it enough to disable it there? Must I
recompile it without the idled option? But I really would like to stay
with idled enabled.
Could it be that the compile-time optimizations are to blame? This is
what I used for gcc (3.3.6):
CFLAGS="-march=nocona -O3 -pipe -fomit-frame-pointer -mmmx -msse -msse2 -mfpmath=sse"
Desperately
Timo