Replication GUID mismatch copyback

Bron Gondwana brong at fastmail.fm
Mon Dec 14 23:57:58 EST 2009


Hi All,

At heart I'm a pragmatist, and when it's a choice between purity and
shit-that-works, I'll choose the latter every time.

Particularly when it comes to not losing messages that were successfully
delivered if a machine crashes and we wind up reversing the replication
direction for a bit.

So - here are some extensions to the sync protocol, and some code that
uses them.

The extensions are two commands:

* EXPUNGEFORCE - which just calls mailbox_expunge with the EXPUNGE_FORCE
  flag set, so the message gets removed immediately.  Not strictly
  necessary, but neat.

* FETCH - which fetches the content of a message which exists on the
  replica.

Now - there's no attempt to find a matching GUID on the sync_client end
and avoid copying the message.  The point is, this is rare.  Basically we
deal with two cases:

1) the same UID has been allocated to two different messages, one on each
   end.

2) a UID has been allocated on the replica only, and last_uid is still
   lower on the master.

Unfortunately, we can't detect and handle this case:

3) the same UID was allocated at both ends, but it's been EXPUNGEd at one
   end.  This is because we don't read cyrus.expunge as well.  Much as I'd
   love to, I'll leave that for later.  Grab the low hanging fruit first!
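
To make the three cases concrete, here is a minimal sketch in Python rather
than the C of the actual patch.  The Record type, the classify() function and
the idea of keying each end's index by UID are all assumptions made for
illustration; none of them come from the patch itself.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Record:
    """Hypothetical index record: just a UID and a content GUID."""
    uid: int
    guid: str


def classify(uid: int,
             master: Dict[int, Record],
             replica: Dict[int, Record],
             master_last_uid: int) -> str:
    """Classify one UID by comparing the master and replica indexes."""
    m = master.get(uid)
    r = replica.get(uid)
    if m is not None and r is not None:
        # Case 1: same UID at both ends, but different message content.
        return "in_sync" if m.guid == r.guid else "guid_mismatch"
    if r is not None and uid > master_last_uid:
        # Case 2: UID allocated on the replica only, beyond master's last_uid.
        return "replica_only"
    # Everything else, including case 3 (expunged at one end), looks like
    # ordinary replication lag or an ordinary expunge without the expunge log.
    return "other"
```

Note that case 3 falls into "other" here for the same reason as in the mail:
without reading the expunge records there is nothing to tell it apart from a
normal expunge.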


There are two patches attached.  They're independent.  The first patch adds
these commands and performs the necessary copy-back logic.  If there's a GUID
mismatch, it will append first the copy on the client (master) then the copy
on the server (replica) to the current mailbox, preserving all flags except
\Seen.  \Seen is too tricky.  It then EXPUNGE_FORCE removes the original copy
at each end.  As far as users are concerned, the message will become UNSEEN
again, and might sort differently (though internaldate is preserved) - but
they get both copies without confusing any potential IMAP clients that talked
to either end in the past.
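
As a rough illustration of that copy-back sequence, here is a Python sketch
(the patch itself is C).  Message, Mailbox and resolve_mismatch() are
hypothetical names, and writing the new records to both ends directly is a
simplification standing in for whatever the sync protocol really does; taking
the higher of the two last_uid values as the starting point is also an
assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Message:
    """Hypothetical message record."""
    guid: str
    internaldate: int
    flags: FrozenSet[str]


@dataclass
class Mailbox:
    """Hypothetical mailbox: records keyed by UID plus the last UID used."""
    records: Dict[int, Message] = field(default_factory=dict)
    last_uid: int = 0


def resolve_mismatch(master: Mailbox, replica: Mailbox, uid: int) -> None:
    """Keep both conflicting messages under fresh UIDs, then drop the old UID."""
    m_copy = master.records[uid]
    r_copy = replica.records[uid]
    next_uid = max(master.last_uid, replica.last_uid)
    # Master's copy is appended first, then the replica's copy.
    for original in (m_copy, r_copy):
        next_uid += 1
        # Preserve internaldate and all flags except \Seen.
        fresh = Message(original.guid, original.internaldate,
                        original.flags - {"\\Seen"})
        for box in (master, replica):
            box.records[next_uid] = fresh
            box.last_uid = next_uid
    # Stand-in for EXPUNGEFORCE: remove the original copy at each end.
    for box in (master, replica):
        del box.records[uid]
```

After this runs, neither end still has the conflicting UID, and both ends hold
the same two unseen messages under new UIDs.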

The second patch takes advantage of modseq ordering to detect if a change was
made to flags on the replica and copies them back.  It's a bit dodgy about
ordering because you can't actually tell how many modseqs will be used post-split,
but then a last_modified solution wouldn't work if the clocks were out of sync,
so it's better than nothing.  It only matters if there were actually changes made
to the record anyway.
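
The modseq comparison boils down to something like the sketch below, again in
Python with made-up names (FlagRecord, flags_after_sync); the real patch
works on Cyrus index records in C and may well differ in detail.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class FlagRecord:
    """Hypothetical per-message state: flags plus the modseq of the last change."""
    modseq: int
    flags: FrozenSet[str]


def flags_after_sync(master: FlagRecord, replica: FlagRecord) -> FrozenSet[str]:
    """Pick which end's flags win for a message with matching UID and GUID."""
    if replica.modseq > master.modseq:
        # Replica saw a later change (by modseq), so copy its flags back.
        return replica.flags
    # Otherwise the master stays authoritative, as in normal replication.
    return master.flags
```

The weakness mentioned above shows up directly: once the two ends have split,
each one bumps modseq on its own, so a higher number on the replica is only a
hint that its change is the newer one.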

Any comments about implementation, or indeed if you think there's a better way
to make crashes not lose stuff?

Bron.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Replication-copy-messages-back-on-GUID-mismatch.patch
Type: text/x-diff
Size: 25117 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20091215/e5e0907e/attachment-0002.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Copy-flags-back-from-replica-if-it-has-a-higher-mods.patch
Type: text/x-diff
Size: 1827 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20091215/e5e0907e/attachment-0003.bin 

