Expunged emails after upgrade from 2.3.16 to 2.4.5

Bron Gondwana brong at fastmail.fm
Wed Dec 8 05:34:24 EST 2010


On Wed, Dec 08, 2010 at 07:52:53PM +1030, Stephen Carr wrote:
> I assumed that replication went one way master to replica. Now with it 
> syncing in both directions my replica is out of sync with the master in 
> that the number of emails in folders on the master are not the same on 
> the replica.

The replication system tries to correct for a bunch of possible errors.
If there's no matching message on the replica, but there are FUTURE UID
messages on the replica, then the replica is incomplete and broken.

I'm working from a fundamental assumption that the replica is a backup
system designed to come into production if something happens to your
master system rather than a "rolling backup".  I can see some
justification for a rolling backup server which you never intend to use
as a master, and which doesn't honour expunges.  I'd be happy to write
one of those if people see the need.

I don't know how it would handle folder deletes though - it would have
to move them into the DELETED namespace probably, so that it could
handle the same name being created again.
 
> The problem was compounded by having both expunge and delete set to 
> immediate in the master but delayed on the replica.

Shouldn't matter in theory...


Ooooooh... hang on.  Immediate expunge might be totally broken anyway.
I strongly recommend 'default' expunge if you want to get the space
saving of deleting the spool files immediately.


    if (rp->uid <= mailbox->i.last_uid) {
        /* Ok, now we need to check if it's just really stale
         * (has been cleaned out locally) or an error.
         * In the error case we copy back, stale
         * we remove from the replica */
        if (rp->modseq < mailbox->i.deletedmodseq)
            dlist_num(kaction, "EXPUNGE", rp->uid);
        else
            dlist_num(kaction, "COPYBACK", rp->uid);
    }

No, it's OK.  Your tame genius thought of this case :)  It
causes an EXPUNGE event to be sent to the replica, which will
mark the message expunged.
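
For anyone who wants to poke at that branch in isolation, here is a minimal
standalone sketch of the same decision. The struct names, the enum and the
function decide_unmatched() are hypothetical stand-ins made up for this
example, not the real Cyrus types; only the two comparisons are taken from the
snippet above. The UID > last_uid case is outside the quoted branch, so the
sketch just reports "no action" for it.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the real Cyrus structures; only the
 * fields used by the quoted snippet are modelled here. */
struct replica_record {
    uint32_t uid;     /* UID of the record as seen on the replica */
    uint64_t modseq;  /* modseq of that record on the replica */
};

struct master_index {
    uint32_t last_uid;       /* highest UID assigned on the master */
    uint64_t deletedmodseq;  /* records below this modseq may already have
                              * been cleaned out locally (reading taken from
                              * the comment in the quoted snippet) */
};

enum sync_action { ACTION_NONE, ACTION_EXPUNGE, ACTION_COPYBACK };

/* Decide what to do with a replica record that has no match on the master. */
enum sync_action decide_unmatched(const struct replica_record *rp,
                                  const struct master_index *mi)
{
    if (rp->uid > mi->last_uid)
        return ACTION_NONE;      /* not covered by the quoted branch */

    if (rp->modseq < mi->deletedmodseq)
        return ACTION_EXPUNGE;   /* stale: remove it from the replica */

    return ACTION_COPYBACK;      /* error case: copy it back */
}
```

With last_uid = 10 and deletedmodseq = 100, a replica record (uid 5,
modseq 50) comes out as EXPUNGE, while (uid 5, modseq 150) comes out as
COPYBACK.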

> I have both the master and replica set to have delete  and expunge set 
> to delayed and restarted master on both systems.

Cool, much nicer.

> Now if I delete an email it is reflected in the client (Thunderbird) ie 
> total drops by one and an ls *. on the file system is still the old 
> value that is one more than what is reported in Thunderbird BUT unexpunge -l 
> reports no emails to unexpunge.

Are you sure that Thunderbird has actually sent the EXPUNGE and isn't just
ignoring messages with the \Deleted flag when you do that test?
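
To spell out why that would match what you are seeing, here is a toy model of
the three counts involved. It assumes delayed expunge and a client that hides
\Deleted messages; the struct and function names are invented for this
example and have nothing to do with the real Cyrus index format.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-message state, not the Cyrus index format. */
struct msg_state {
    bool deleted_flag;  /* client has set the \Deleted flag */
    bool expunged;      /* an EXPUNGE has been processed for this message */
};

/* What a client shows if it hides \Deleted (and expunged) messages. */
size_t count_shown_by_client(const struct msg_state *m, size_t n)
{
    size_t shown = 0;
    for (size_t i = 0; i < n; i++)
        if (!m[i].expunged && !m[i].deleted_flag)
            shown++;
    return shown;
}

/* Files on disk: under delayed expunge (assumed here, before any cleanup
 * has run) every modelled message still has its spool file. */
size_t count_files_on_disk(const struct msg_state *m, size_t n)
{
    (void)m;
    return n;
}

/* What a listing of expunged messages would report. */
size_t count_listed_as_expunged(const struct msg_state *m, size_t n)
{
    size_t listed = 0;
    for (size_t i = 0; i < n; i++)
        if (m[i].expunged)
            listed++;
    return listed;
}
```

With three messages, one of them flagged \Deleted but not yet expunged, the
model gives 2 shown by the client, 3 files on disk and 0 listed as expunged,
which is the pattern you describe. Once the EXPUNGE goes through, the last
count becomes 1.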

> I also tested it by creating a new folder - moving an email into it all 
> OK in Thunderbird - then deleted it - Thunderbird reports NO email in 
> folder BUT unexpunge -l does not display the deleted email but it is on 
> the file system.

Hmm... testing that now.

root@launde:~# /usr/cyrus/bin/unexpunge -C /tmp/ct-slot2/etc/imapd.conf -l user/foo
UID: 1
    Size: 92
    Sent: Mon Mar  8 00:00:00 2010
    Recv: Mon Mar  8 17:18:11 2010
    Expg: Wed Dec  8 21:29:49 2010
    From: test <test@example.com>
    To  : test <test@example.com>
    Cc  : 
    Bcc : 
    Subj: ""

That looks like it works fine to me.

> What is happening?
> 
> Ignoring replication - why is unexpunge not working?

My guess is that Thunderbird doesn't expunge immediately.

> I then did this moved another 2 emails to the folder and ran
> 
> ipurge -f -d 0 -X user.XXXX.Test
> 
> and unexpunge works showing all 3 emails.
> 
> Maybe I better read the manuals regarding user delete and expunge.
> 
> This upgrade has not been trivial BUT the performance of the server 
> seems to be much faster.

I would hope so!  I've put a lot of work into optimising IO :)
 
> I have users who pop off emails BUT leave the emails on the server and 
> they have thousands - the POP could timeout - since the upgrade it seems 
> to be done in a flash. In one instance 32,000 emails were popped off the 
> server in about 45 minutes by wall clock and 3 minutes in system + user. 
> See below

There was a nasty N^2 bug in pop3d as released in 2.4.0; I fixed it
somewhere along the way, based on code review by gnb :)

> I suppose I should report this as a bug if others have seen this problem.

If it's not against a current release I probably wouldn't bother.

Bron.

