Cyrus file locking issues
Scott Adkins
adkinss at ohio.edu
Thu Feb 20 11:05:51 EST 2003
We are running into issues on our system where IMAP and LMTP processes
become stuck, stack up, and never go away. The stuck IMAP processes
aren't so bad, but the stuck LMTP ones definitely are. Investigating the
problem with truss and lsof shows that all of the *stuck* processes (the
ones that seem to be old and not doing anything but sleeping) are blocked
on a single user's mailbox, waiting for a file lock to be released.
Usually, it is the
"cyrus.header" file that the processes are waiting on. If I move the
file out of the way and reconstruct the user's IMAP folder, some new
LMTP processes pop up to deliver mail, but they then block on a file
lock for "cyrus.index". I managed to get one user working by moving that
file out of the way as well and doing another reconstruct. Every other
time, that hasn't worked either: I still can't get mail delivered to
that user's INBOX.
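
For anyone who wants to check whether a lock on one of these files can be obtained at all, without adding yet another process that sleeps forever, here is a minimal sketch of a non-blocking probe with a timeout. It assumes the server uses flock()-style advisory locks, as mentioned later in this message; a build that locks with fcntl() would need a different probe, since the two mechanisms do not necessarily see each other. The function name try_lock and its parameters are made up for illustration and are not part of Cyrus.

```python
import errno
import fcntl
import os
import time


def try_lock(path, timeout=5.0, poll=0.1):
    """Probe an flock()-style lock on `path` without blocking forever.

    Returns True if an exclusive lock could be taken (it is released
    again immediately), or False if some other holder kept the lock for
    the whole `timeout` period.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        deadline = time.monotonic() + timeout
        while True:
            try:
                # LOCK_NB makes flock() fail at once instead of sleeping.
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as exc:
                if exc.errno not in (errno.EWOULDBLOCK, errno.EAGAIN,
                                     errno.EACCES):
                    raise  # a real error, not just "lock is busy"
                if time.monotonic() >= deadline:
                    return False
                time.sleep(poll)
            else:
                fcntl.flock(fd, fcntl.LOCK_UN)
                return True
    finally:
        os.close(fd)
```

Calling try_lock() on the affected "cyrus.header" with a short timeout would report False while the lock is stuck. Note that a successful probe briefly holds the exclusive lock itself, so on a busy server it should be used sparingly.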
I have tried various things. Using lsof, I can find out which process
currently has the write lock on the file. Killing it (with -TERM) gets
rid of the process, but the write lock moves to another process and
gets stuck there. Killing all the IMAP processes involved with that
file lock doesn't help, as the write lock eventually moves to one of the
LMTP processes and gets stuck. Ultimately, I have to shut down the Cyrus
server, and when I do that, the locks are freed and things are okay.
When I bring the Cyrus server back up again, everything is happy and
that user starts receiving email again.
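
The "find the lock holder with lsof" step can be scripted. The sketch below parses lsof's field output (the -F option with the p, c, f and l fields: PID, command, file descriptor, lock status) and lists the descriptors that report a write lock. It assumes the local lsof reports lock status for the file system in question; the sample text is invented for illustration and is not output captured from the system described here.

```python
def write_lock_holders(lsof_field_output):
    """Return [(pid, command, fd)] for descriptors holding a write lock.

    Expects the text produced by `lsof -F pcfl <file>`, where every line
    is a one-letter field identifier followed by the field's value.
    """
    holders = []
    pid = command = fd = None
    for line in lsof_field_output.splitlines():
        if not line:
            continue
        key, value = line[0], line[1:]
        if key == "p":        # a new process set starts here
            pid, command, fd = int(value), None, None
        elif key == "c":      # command name of the current process
            command = value
        elif key == "f":      # a new file set within the process
            fd = value
        elif key == "l" and value in ("w", "W", "u"):
            # w = write lock on part of the file, W = on the whole file,
            # u = read and write lock of any length
            holders.append((pid, command, fd))
    return holders


# Invented sample: one imapd holding a whole-file write lock, two lmtpd
# processes with the same file open but holding no write lock.
sample = "p1201\ncimapd\nf12\nlW\np1344\nclmtpd\nf9\nl \np1350\nclmtpd\nf9\nlR\n"
print(write_lock_holders(sample))  # [(1201, 'imapd', '12')]
```

On a live system the input could come from something like subprocess.run(["lsof", "-F", "pcfl", path], capture_output=True, text=True).stdout, with path pointing at the affected file.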
Each time I have seen this problem pop up, it has been a different user.
We also have 5 different partitions defined, each on a different file
domain (basically, different disks), and I have seen this problem occur
on several
different partitions at this point, so I don't believe a particular disk
is going bad or anything like that.
This is a Tru64 5.1 Cluster running on Compaq Alpha hardware. The Cyrus
server is 2.0.16. The duplicate delivery database has not been compiled
in (we had all kinds of locking problems with BerkeleyDB), so this means
that LMTP is not linked with BerkeleyDB or Pthreads (yay!). We also use
the flat-file format for mailboxes.db, which means that the rest of the
Cyrus processes are not linked with BerkeleyDB or Pthreads either.
The reason I mention that is that when trussing one of the stuck
processes, it looks like it is stuck in a pthread-type lock. I know the
process isn't threaded, so this suggests that the lock is inside the
kernel, which makes sense given flock() and how it works (I believe).
Finally, before anyone says anything, yes, I know we should be running
2.0.17 or higher, but we can't make that kind of upgrade with all the
changes made to 2.0.17 and the custom changes we made to 2.0.16 during
the school year without careful testing and careful planning. We have a
very *very* busy email system, and it is the main email system for the
whole university, so the less we do to it, the better. We will upgrade
to the latest and greatest version when summer gets here, which seems to
get here pretty quickly anyway...
So, has anyone else seen this problem? If this is a recognized problem
that has since been fixed, in which version of the Cyrus server was it
fixed?
Thanks!
Scott
--
+-----------------------------------------------------------------------+
Scott W. Adkins http://www.cns.ohiou.edu/~sadkins/
UNIX Systems Engineer mailto:adkinss at ohio.edu
ICQ 7626282 Work (740)593-9478 Fax (740)593-1944
+-----------------------------------------------------------------------+
PGP Public Key available at http://www.cns.ohiou.edu/~sadkins/pgp/