cyrus chokes after few logins

Martin G.H. Minkler minkler at artegic.de
Fri Oct 13 18:17:15 EDT 2006


Alohá!

First off: Sorry für crossposting this but I think it hast to do with
both, cyrus and SASL.

Second off: Beware, long post! (Although a good bit is just logs) :-)

Abstract:
Cyrus or sasl choke after a few POP3 logins following quickly after one 
another despite high prefork values.

The Problem:
A relatively aggressive POP3 retrieval plugin for exchange (GFI 
Mailessentials) polls 65 accounts at the same time. The logins for the 
first few accounts succeed but the remaining tries time out without any 
trace, leaving the TCP connection in a CLOSE_WAIT state.

The Test:
Using getmail from another sarge machine to pop 65 blank accounts in one 
go I can reproduce the success of approx. 7 logins within the first 
second, the remaining take close to the timeout limit or do time out.
Please see the timestamps of the following log:

epsilon:~# tail -f /var/log/mail.log |grep testuser
Oct 13 23:48:16 localhost cyrus/pop3d[18228]: login: 
[IP_of_querying_server] testuser1 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[21595]: login: 
[IP_of_querying_server] testuser2 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[25583]: login: 
[IP_of_querying_server] testuser3 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[14381]: login: 
[IP_of_querying_server] testuser4 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[1144]: login: 
[IP_of_querying_server] testuser5 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[11874]: login: 
[IP_of_querying_server] testuser6 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[21571]: login: 
[IP_of_querying_server] testuser7 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[27250]: login: 
[IP_of_querying_server] testuser8 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[5194]: login: 
[IP_of_querying_server] testuser9 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[19623]: login: 
[IP_of_querying_server] testuser10 plaintext
Oct 13 23:49:15 localhost cyrus/pop3d[12990]: login: 
[IP_of_querying_server] testuser11 plaintext
Oct 13 23:49:37 localhost cyrus/pop3d[19480]: login: 
[IP_of_querying_server] testuser12 plaintext
Oct 13 23:49:37 localhost cyrus/pop3d[5285]: login: 
[IP_of_querying_server] testuser13 plaintext
Oct 13 23:51:32 localhost cyrus/pop3d[22249]: login: 
[IP_of_querying_server] testuser14 plaintext
Oct 13 23:51:32 localhost cyrus/pop3d[35]: login: 
[IP_of_querying_server] testuser15 plaintext
Oct 13 23:52:37 localhost cyrus/pop3d[30733]: login: 
[IP_of_querying_server] testuser16 plaintext
Oct 13 23:52:54 localhost cyrus/pop3d[13558]: login: 
[IP_of_querying_server] testuser17 plaintext
Oct 13 23:52:54 localhost cyrus/pop3d[4925]: login: 
[IP_of_querying_server] testuser18 plaintext
Oct 13 23:53:17 localhost cyrus/pop3d[20686]: login: 
[IP_of_querying_server] testuser19 plaintext
Oct 13 23:53:33 localhost cyrus/pop3d[21112]: login: 
[IP_of_querying_server] testuser20 plaintext
Oct 13 23:53:59 localhost cyrus/pop3d[24909]: login: 
[IP_of_querying_server] testuser21 plaintext
Oct 13 23:54:00 localhost cyrus/pop3d[16212]: login: 
[IP_of_querying_server] testuser22 plaintext
Oct 13 23:54:22 localhost cyrus/pop3d[23280]: login: 
[IP_of_querying_server] testuser23 plaintext
Oct 13 23:55:42 localhost cyrus/pop3d[22569]: login: 
[IP_of_querying_server] testuser24 plaintext
Oct 13 23:55:42 localhost cyrus/pop3d[1855]: login: 
[IP_of_querying_server] testuser25 plaintext
Oct 13 23:56:12 localhost cyrus/pop3d[11290]: login: 
[IP_of_querying_server] testuser26 plaintext
Oct 13 23:56:52 localhost cyrus/pop3d[1508]: login: 
[IP_of_querying_server] testuser27 plaintext
Oct 13 23:56:52 localhost cyrus/pop3d[11878]: login: 
[IP_of_querying_server] testuser28 plaintext
Oct 13 23:57:16 localhost cyrus/pop3d[8838]: login: 
[IP_of_querying_server] testuser29 plaintext
Oct 13 23:58:18 localhost cyrus/pop3d[11535]: login: 
[IP_of_querying_server] testuser30 plaintext
Oct 13 23:58:18 localhost cyrus/pop3d[20806]: login: 
[IP_of_querying_server] testuser31 plaintext
Oct 13 23:58:48 localhost cyrus/pop3d[15110]: login: 
[IP_of_querying_server] testuser32 plaintext
Oct 13 23:59:20 localhost cyrus/pop3d[27262]: login: 
[IP_of_querying_server] testuser33 plaintext
Oct 13 23:59:53 localhost cyrus/pop3d[28168]: login: 
[IP_of_querying_server] testuser34 plaintext
Oct 13 23:59:53 localhost cyrus/pop3d[5339]: login: 
[IP_of_querying_server] testuser35 plaintext
Oct 14 00:01:18 localhost cyrus/pop3d[4315]: login: 
[IP_of_querying_server] testuser36 plaintext
Oct 14 00:01:57 localhost cyrus/pop3d[8924]: login: 
[IP_of_querying_server] testuser37 plaintext
Oct 14 00:01:57 localhost cyrus/pop3d[29483]: login: 
[IP_of_querying_server] testuser38 plaintext
Oct 14 00:03:29 localhost cyrus/pop3d[9070]: login: 
[IP_of_querying_server] testuser39 plaintext
Oct 14 00:03:30 localhost cyrus/pop3d[29472]: login: 
[IP_of_querying_server] testuser40 plaintext
Oct 14 00:04:16 localhost cyrus/pop3d[6504]: login: 
[IP_of_querying_server] testuser41 plaintext
Oct 14 00:04:16 localhost cyrus/pop3d[27329]: login: 
[IP_of_querying_server] testuser42 plaintext
Oct 14 00:04:38 localhost cyrus/pop3d[32760]: login: 
[IP_of_querying_server] testuser43 plaintext
Oct 14 00:05:00 localhost cyrus/pop3d[11606]: login: 
[IP_of_querying_server] testuser44 plaintext
Oct 14 00:05:35 localhost cyrus/pop3d[26602]: login: 
[IP_of_querying_server] testuser45 plaintext
Oct 14 00:05:35 localhost cyrus/pop3d[5887]: login: 
[IP_of_querying_server] testuser46 plaintext
Oct 14 00:06:17 localhost cyrus/pop3d[2847]: login: 
[IP_of_querying_server] testuser47 plaintext
Oct 14 00:06:36 localhost cyrus/pop3d[8207]: login: 
[IP_of_querying_server] testuser48 plaintext
Oct 14 00:07:41 localhost cyrus/pop3d[4035]: login: 
[IP_of_querying_server] testuser49 plaintext
Oct 14 00:07:42 localhost cyrus/pop3d[19858]: login: 
[IP_of_querying_server] testuser50 plaintext
Oct 14 00:08:39 localhost cyrus/pop3d[3667]: login: 
[IP_of_querying_server] testuser51 plaintext
Oct 14 00:08:39 localhost cyrus/pop3d[25244]: login: 
[IP_of_querying_server] testuser52 plaintext
Oct 14 00:09:06 localhost cyrus/pop3d[29777]: login: 
[IP_of_querying_server] testuser53 plaintext
Oct 14 00:09:37 localhost cyrus/pop3d[842]: login: 
[IP_of_querying_server] testuser54 plaintext
Oct 14 00:10:27 localhost cyrus/pop3d[5103]: login: 
[IP_of_querying_server] testuser55 plaintext
Oct 14 00:11:03 localhost cyrus/pop3d[28995]: login: 
[IP_of_querying_server] testuser56 plaintext
Oct 14 00:11:34 localhost cyrus/pop3d[12469]: login: 
[IP_of_querying_server] testuser57 plaintext
Oct 14 00:11:34 localhost cyrus/pop3d[25816]: login: 
[IP_of_querying_server] testuser58 plaintext
Oct 14 00:12:01 localhost cyrus/pop3d[29592]: login: 
[IP_of_querying_server] testuser59 plaintext
Oct 14 00:12:31 localhost cyrus/pop3d[28422]: login: 
[IP_of_querying_server] testuser60 plaintext
Oct 14 00:13:10 localhost cyrus/pop3d[18671]: login: 
[IP_of_querying_server] testuser61 plaintext
Oct 14 00:13:10 localhost cyrus/pop3d[23530]: login: 
[IP_of_querying_server] testuser62 plaintext
Oct 14 00:13:36 localhost cyrus/pop3d[18888]: login: 
[IP_of_querying_server] testuser63 plaintext
Oct 14 00:14:08 localhost cyrus/pop3d[5330]: login: 
[IP_of_querying_server] testuser64 plaintext
Oct 14 00:14:08 localhost cyrus/pop3d[15592]: login: 
[IP_of_querying_server] testuser65 plaintext



The Thesis:
Ummmm, is sasldb maybe not suited for a high amount of simultaneous 
requests and a race condition occurs that is resolved only by timeouts?
Then again we do have 50-150 logins per minute under normal 
circumstances - and that's no problem whatsoever.

The setup:

debian sarge cyrus 2.1.18-1+sarge2

epsilon:~/tmp# grep -v \# /etc/cyrus.conf | grep .
START {
         recover         cmd="/usr/sbin/ctl_cyrusdb -r"

         delprune        cmd="/usr/sbin/ctl_deliver -E 3"
         tlsprune        cmd="/usr/sbin/tls_prune"
}
SERVICES {
         imap            cmd="imapd -U 30" listen="imap" prefork=75 
maxchild=250 maxforkrate=0
         imaps           cmd="imapd -s -U 30" listen="imaps" prefork=5 
maxchild=250 maxforkrate=0
         pop3            cmd="pop3d -U 30" listen="pop3" prefork=75 
maxchild=250 maxforkrate=0
         pop3s           cmd="pop3d -s -U 30" listen="pop3s" prefork=1 
maxchild=250 maxforkrate=0
         lmtpunix        cmd="lmtpd" listen="/var/run/cyrus/socket/lmtp" 
prefork=0 maxchild=200
         sieve           cmd="timsieved" listen="localhost:sieve" 
prefork=0 maxchild=500
         notify          cmd="notifyd" 
listen="/var/run/cyrus/socket/notify" proto="udp" prefork=1
}
EVENTS {
         checkpoint      cmd="/usr/sbin/ctl_cyrusdb -c" period=30
         delprune        cmd="/usr/sbin/ctl_deliver -E 3" at=0401
         tlsprune        cmd="/usr/sbin/tls_prune" at=0401
         squatter      cmd="squatter -r *" period=30
}

interesting bits from imapd.conf:

popminpoll: 1
duplicatesuppression: yes
hashimapspool: true
allowplaintext: yes
sasl_mech_list: PLAIN LOGIN
sasl_minimum_layer: 256
sasl_maximum_layer: 256
sasl_pwcheck_method: auxprop
sasl_auxprop_plugin: sasldb
sasl_auto_transition: no

Also noteworthy:
Postfix authenticates users sending mail via rimap (saslauthd) thus also 
accessing sasldb. Not at this time of day though, we haven't had a 
single mail sent in the past 15 min and hardly any pickups either.

Looking at the imapd.conf just now does make me wonder why sasl even 
allows pop3 plaintext logins without TLS with sasl_minimum_layer set to 
256... Then again I never really understood how the sasl-layer figures 
are calculated, does anyone have a hint for a FM I could R? The manpage 
for imapd.conf doesn't really tell.

As You can see from the timestamps this is ruining my weekend already, 
I'm at the brink of kicking the server really really hard and any input 
would be greatly appreciated. If I can provide any more info...

TIA

Martin


More information about the Info-cyrus mailing list