2.3.8 traditional murder POP3 proxy

Shawn Nock nock at email.arizona.edu
Thu Oct 4 19:37:19 EDT 2007

We rolled out 2.3.8 a few months ago (configuration specifics are at the
end of the message), and for the most part things have gone very well
(after Bron solved our ReiserFS deadlock problem... he finds all the
good bugs!).

A persistent problem in the new config is Outlook 2003 clients using POP3.

Short summary: Outlook 2003 (with all available updates) intermittently
(in an estimated 10-15% of sessions) gets confused and, in turn, confuses
our frontend. The client software in the three confirmed cases is Outlook
2003 SP2 using POP3 over SSL on the alternate port. The failure rate does
not seem to be statistically related to message size, content, or total
message count.

Protocol logging reveals that the client successfully connects and lists
the messages in the box. The client then starts receiving messages. In
the middle of a message the session stalls: pop3d sits in select(),
waiting for input to become available on the socket to the client, while
the client (apparently also waiting for input) reaches its timeout and
bails on the connection. The frontend keeps the connection to the backend
alive for 10 minutes (a la the RFC) even after the client gives up.
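
To make the stall concrete, here is a minimal Python sketch (not Cyrus
code) of two peers that each wait in select() for the other to send.
The names wait_for_input and demo_stall, the socket pair standing in for
the client connection, and the sub-second timeouts are all illustrative
assumptions.

```python
import select
import socket


def wait_for_input(sock, timeout_s):
    """Return True if sock becomes readable within timeout_s seconds."""
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)


def demo_stall():
    # A connected socket pair stands in for the client <-> pop3d link.
    server_side, client_side = socket.socketpair()
    try:
        # Neither end sends anything, so both waits expire: the stall.
        server_timed_out = not wait_for_input(server_side, 0.2)
        client_timed_out = not wait_for_input(client_side, 0.1)
        # The client gives up and closes. The server end now becomes
        # readable, and recv() returning b"" signals end of file.
        client_side.close()
        saw_eof = (wait_for_input(server_side, 0.2)
                   and server_side.recv(1) == b"")
        return server_timed_out, client_timed_out, saw_eof
    finally:
        server_side.close()
```

In this sketch the waiting side only learns that its peer has gone when
the close actually arrives as end of file; until then, all it can do is
wait for its own timer.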

It seems clear that the "problem" is an Outlook bug. Manually unlocking
the box and doing a send/receive with Outlook reproduces the problem (at
the same point in the same message). If Outlook is restarted, however,
the next session completes successfully.

My real question to the Cyrus list (surprisingly, it isn't "can you fix
Outlook") is twofold:

1. Looking at the code, if the client side of the bitpipe is closed, I
can't see a scenario where the backend side of the bitpipe is kept alive.
The cleanup code seems pretty much bulletproof, yet the behavior I am
seeing is that the client side of the bitpipe is closed and the connection
to the backend is left open until the POP3 timeout (what I see is pop3d on
the backend waiting in select()). What could cause this? (*Note*: I
seriously doubt that resolving this question will do anything to ease the
symptoms of this bug in Outlook.)

2. Are other large sites using a (traditional) murder configuration
seeing problems like this? I'd imagined that a bug this annoying in
Outlook, combined with the large number of Cyrus-imapd deployments,
would have raised quite a few alarms... but in my searches I find no
mention of this issue.
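
For question 1, the behavior I would expect from a bidirectional pipe can
be sketched as follows. This is a simplified Python illustration under my
own assumptions, not the actual Cyrus bitpipe code; the function name
relay, the 4096-byte read size, and the return labels are hypothetical.

```python
import select
import socket


def relay(client, backend, idle_timeout_s):
    """Copy bytes both ways until either side closes or the pipe idles.

    Returns "client-eof", "backend-eof", or "idle-timeout".
    """
    peers = {client: backend, backend: client}
    try:
        while True:
            readable, _, _ = select.select(
                [client, backend], [], [], idle_timeout_s)
            if not readable:
                return "idle-timeout"
            for src in readable:
                data = src.recv(4096)
                if not data:
                    # recv() returned b"": this side closed its end.
                    return "client-eof" if src is client else "backend-eof"
                peers[src].sendall(data)
    finally:
        # Whatever the exit reason, both halves are torn down together.
        client.close()
        backend.close()
```

In this sketch, a client close that arrives as end of file ends the loop
at once and closes the backend socket too. The only path that leaves the
backend connection open for the full idle period is the one where
select() never reports the client socket as readable.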

Any help/ideas/confirmation/commiseration would be appreciated,

Cyrus 2.3.8 w/ some Fastmail.fm patches
(fast index iterator, statuscache, command timer)
Skiplist for all db
1 mupdate master frontend
1 mupdate slave frontend
3 backends (4-5x mail partitions per backend @ 250G ea.)

Sample imapd.conf (relevant bits; frontend, but the backends are very
similarly configured):

# Basic Config
configdirectory: /cyrus_config
defaultpartition: default
partition-default: /cyrus_mboxes/default
servername: daytona

sendmail: /usr/lib/sendmail
singleinstancestore: yes
duplicatesuppression: yes
quotawarn: 85
timeout: 60
poptimeout: 10
imapidresponse: no
maxmessagesize: 52428800
postmaster: postmaster
sieve_maxscriptsize: 32
sieve_maxscripts: 1
imapidlepoll: 120
munge8bit: no
username_tolower: 1
allowplaintext: yes
allowusermoves: 1
expunge_mode: delayed

# Namespace stuff
hashimapspool: true
fulldirhash: true
unixhierarchysep: yes
altnamespace: yes

tls_cert_file: /cyrus_config/email_verisign_2007.crt
tls_key_file: /cyrus_config/email_verisign_2007.key
tls_ca_file: /cyrus_config/verisign.ca.pem
tls_session_timeout: 0
imap_tls_request_cert: 0
pop3_tls_request_cert: 0

# Extras
statuscache: 1
statuscache_db: skiplist
duplicate_db: skiplist
#tlscache_db: skiplist
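
For anyone comparing timeout settings against the config above, here is
a small Python sketch that reads "option: value" lines in this format and
converts poptimeout to seconds. The parser name parse_imapd_conf is
hypothetical, the sample is a four-line excerpt of the config above, and
the conversion assumes poptimeout is expressed in minutes, as the 10
minute behavior described earlier suggests.

```python
def parse_imapd_conf(text):
    """Parse 'option: value' lines, skipping blanks and '#' comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            options[key.strip()] = value.strip()
    return options


# Excerpt of the config above; commented-out options are ignored.
SAMPLE = """\
# Basic Config
timeout: 60
poptimeout: 10
#tlscache_db: skiplist
"""

conf = parse_imapd_conf(SAMPLE)
# Assumption: poptimeout is given in minutes.
pop_idle_seconds = int(conf["poptimeout"]) * 60
```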

--
Shawn Nock (OpenPGP: 0x5E377505)
Unix Systems Group; UITS
University of Arizona
nock at email.arizona.edu