Corruption with delayed expunge and expire annotations

James E. Blair jeblair at berkeley.edu
Mon Sep 15 20:47:32 EDT 2008


We observed some corruption of cyrus.expunge when running 2.3.7, which
we think is still relevant now that we've upgraded to 2.3.12p2 (with
proposed 2.3.13 patches from FastMail), particularly since the
interaction of delayed expunge and expire annotations seems poorly
defined.

The corruption would result in cyrus.expunge files that may have only
had a few bytes or kilobytes of data, but were sparse files with an
apparent size of ~200GB.  In 2.3.7, when cyrus read this file it would
usually segfault.  In 2.3.12p2, cyrus manages to read the file as if
it were valid, and merrily starts writing a cyrus.expunge.NEW file
that is _actually_ 200GB, filling the partition very quickly.
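
For anyone who wants to check their own spools, here is a minimal
sketch of how such a file can be spotted by comparing its apparent
size with the space actually allocated to it.  It is not part of
Cyrus: the helper names and the thresholds are made up for
illustration, and it assumes st_blocks is counted in 512-byte units
(the usual convention, but not guaranteed by POSIX).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper, not part of Cyrus: reports the apparent size and
 * the allocated size of a file.  Returns 0 on success, -1 if stat() fails. */
static int file_sizes(const char *path, long long *apparent,
                      long long *allocated)
{
    struct stat sb;

    assert(path != NULL && apparent != NULL && allocated != NULL);
    if (stat(path, &sb) == -1)
        return -1;

    *apparent = (long long)sb.st_size;
    /* Assumption: st_blocks is in 512-byte units. */
    *allocated = (long long)sb.st_blocks * 512;
    return 0;
}

/* Returns 1 if the file is larger than min_size but less than half of it
 * is allocated on disk, 0 otherwise (including when stat() fails).
 * The "half" threshold is arbitrary. */
static int looks_suspiciously_sparse(const char *path, long long min_size)
{
    long long apparent, allocated;

    if (file_sizes(path, &apparent, &allocated) == -1)
        return 0;
    return apparent > min_size && allocated < apparent / 2;
}
```

Running something like looks_suspiciously_sparse() over every
cyrus.expunge with a min_size of a few hundred megabytes should flag
files of the kind described above before cyrus tries to read them.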

Every user in our system has a spam folder with the expire mailbox
annotation set.  The idea is that spam that has sat in the folder for
longer than a user-customizable period (say a week) will be
automatically expired (based on INTERNALDATE -- a local
customization).  We also use delayed expunge, and unusual interactions
happen when these two meet.

cyr_expire is the program that performs both of these functions.  In
2.3.7, it was possible for it to call mailbox_expunge with two flags
set.  If an expire annotation for a given mailbox was set:

            expunge_flags |= EXPUNGE_FORCE;

And if delayed expunge is configured:

            expunge_flags |= EXPUNGE_CLEANUP;

During this period, we had several instances of corrupted expunge
files in spam folders -- the only mailboxes where cyr_expire was
dealing with delayed expunge cleanup and expire annotations.

When both of those flags are set, a unique code path through
mailbox_expunge is run.  Essentially, what happens is:

1) cyrus.expunge is opened.  (EXPUNGE_CLEANUP causes this)
2) cyrus.expunge is locked. 
3) cyrus.expunge is mmap-ed.
4) cyrus.expunge.NEW is opened for writing.
5) process_records is called with the expunge_fd argument set.
   This causes messages to be expunged according to the annotation,
   and their records to be written to cyrus.expunge (file).
6) Data from cyrus.expunge (mmap) is copied to cyrus.expunge.NEW (file).
7) process_records is called again, operating on cyrus.expunge (mmap)
   and cyrus.expunge.NEW (file).
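
To make the hazard concrete, here is a stand-alone sketch of the same
pattern using plain POSIX calls.  It is not Cyrus code, and the
function name and record contents are invented.  It only shows that a
mapping set up before a write() does not cover what the write()
appends; it does not claim to reproduce the exact mechanism behind
the ~200GB files.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-alone illustration, not Cyrus code.  Creates `path`, maps its
 * initial 16 bytes, appends 16 more through the descriptor, and reports
 * both the length covered by the mapping and the real file length.
 * The file is removed again before returning.
 * Returns 0 on success, -1 if any system call fails. */
static int stale_mapping_demo(const char *path, size_t *mapped_len,
                              off_t *file_len)
{
    static const char rec[16] = "expunge-record-";  /* 15 chars + NUL */
    struct stat sb;
    void *base;
    int fd, rc = -1;

    assert(path != NULL && mapped_len != NULL && file_len != NULL);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    /* Steps 1-3: the file holds some records and is then mapped. */
    if (write(fd, rec, sizeof(rec)) != (ssize_t)sizeof(rec))
        goto out;
    *mapped_len = sizeof(rec);
    base = mmap(NULL, *mapped_len, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        goto out;

    /* Step 5: more records are appended with write(), after the mapping
     * was established. */
    if (write(fd, rec, sizeof(rec)) != (ssize_t)sizeof(rec))
        goto unmap;

    /* Step 6 would now copy *mapped_len bytes from base, although the
     * file has grown in the meantime. */
    if (fstat(fd, &sb) == -1)
        goto unmap;
    *file_len = sb.st_size;
    rc = 0;

unmap:
    munmap(base, *mapped_len);
out:
    close(fd);
    unlink(path);
    return rc;
}
```

After a successful call, *mapped_len is still 16 while *file_len is
32: anything that trusts the old base/length pair is working from an
out-of-date view of the file.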

There is potential for trouble between steps 5 and 6.  As recently
pointed out by Linus, it's dangerous to mix read/write and mmap
access:

http://kerneltrap.org/mailarchive/linux-kernel/2008/6/18/2162194

In this case, we are writing through the file descriptor to a file
that was previously mmap-ed, and then reading it through the existing
mapping.  The potential for corruption there definitely seems to
exist.

It seems that adding an unmap/mmap pair to ensure up to date data
after step 5 and before step 6 should add protection against this
situation.  Specifically, that would be after the first call to
process_records in mailbox_expunge.
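
As a rough sketch of what we mean, the helper below drops an existing
read-only mapping and maps the file again at its current size.  The
name remap_current and its signature are hypothetical and use only
POSIX calls; an actual patch would presumably go through whatever
mapping routines mailbox_expunge already uses.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper, not the Cyrus API: discard the old read-only
 * mapping of fd (if any) and map the file again at its current size.
 * On success returns 0 and updates *base and *len (both cleared if the
 * file is empty).  On failure returns -1 with *base NULL and *len 0. */
static int remap_current(int fd, const char **base, size_t *len)
{
    struct stat sb;
    void *p;

    assert(base != NULL && len != NULL);

    if (*base != NULL && *len > 0)
        munmap((void *)*base, *len);
    *base = NULL;
    *len = 0;

    if (fstat(fd, &sb) == -1)
        return -1;
    if (sb.st_size == 0)
        return 0;               /* nothing to map; mmap() rejects length 0 */

    p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    *base = p;
    *len = (size_t)sb.st_size;
    return 0;
}
```

Called between steps 5 and 6, this would make the copy into
cyrus.expunge.NEW use the length of the file as it stands after the
first process_records pass, rather than the length recorded when the
file was first mapped.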

In the meantime, cyr_expire has changed so that it will no longer
call mailbox_expunge with both of those flags set, but (aside from
this bug) there seems to be no reason why both flags should not be
set.  It seems like a sensible way to handle a mailbox that is
subject to both delayed expunge and the expire annotation.

In fact, because cyr_expire now calls mailbox_expunge with only one
of the two flags set, it turns out that the annotation takes
precedence on our spam folders, and mailbox_expunge is never called
with EXPUNGE_CLEANUP for them.  We believe this means that the
cyrus.expunge file can grow without bound, and messages that people
delete manually from their spam folders never get removed.

It seems that there are two solutions to this problem:

1) Fix the corruption described above, and adjust cyr_expire so that
   it may once again call mailbox_expunge with both flags.

2) Change cyr_expire so that it calls mailbox_expunge twice in the
   case of an expire annotation on a system with delayed expunge --
   once to expire the messages, and a second time to clean up the
   expunged messages.
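
A minimal sketch of option 2 follows.  Everything in it is a stand-in:
the flag values are placeholders rather than the real definitions, and
the callback type replaces the real mailbox_expunge with a simplified,
hypothetical signature so that the control flow can be shown on its
own.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values; the real definitions live in the Cyrus sources. */
enum {
    EXPUNGE_FORCE   = 1 << 0,
    EXPUNGE_CLEANUP = 1 << 1
};

/* Stand-in for mailbox_expunge() with a simplified, hypothetical
 * signature.  Returns 0 on success, non-zero on error. */
typedef int (*expunge_fn)(const char *mboxname, int flags, void *rock);

/* Option 2: never pass both flags in one call.  Run the expire pass
 * first, then a separate cleanup pass, stopping at the first error. */
static int expire_then_cleanup(const char *mboxname,
                               int has_expire_annotation,
                               int delayed_expunge,
                               expunge_fn do_expunge, void *rock)
{
    int r = 0;

    assert(mboxname != NULL && do_expunge != NULL);

    if (has_expire_annotation)
        r = do_expunge(mboxname, EXPUNGE_FORCE, rock);
    if (r == 0 && delayed_expunge)
        r = do_expunge(mboxname, EXPUNGE_CLEANUP, rock);
    return r;
}

/* Example callback that only records the flags of each call. */
struct call_log {
    int flags[4];
    int count;
};

static int record_call(const char *mboxname, int flags, void *rock)
{
    struct call_log *log = rock;

    (void)mboxname;             /* unused in this example */
    if (log->count < 4)
        log->flags[log->count] = flags;
    log->count++;
    return 0;
}
```

With both conditions true, record_call sees two calls, the first with
EXPUNGE_FORCE and the second with EXPUNGE_CLEANUP, so the code path
that combines both flags is never taken.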

We would appreciate some feedback on this analysis, as well as
comments about the right way for cyr_expire to handle this case.

James E. Blair
Principal Email Systems Administrator
UC Berkeley - IST

