Cyrus Replication Rewrite

ktm at rice.edu
Fri Jun 29 08:54:30 EDT 2012


On Fri, Jun 29, 2012 at 10:22:17AM +0200, Bron Gondwana wrote:
> Yes, again ;)  Having written a rewrite once, I want to do it again.
> 
> First - WHY???
> 
> a) make master/master replication safe
> b) make "pull" replication available
> c) make replication and backup identical
> d) make offline replication (diff shipping) possible
> e) make replication available to regular users
> 
> OK, how?
> 
> a) add modseqs and tombstones everywhere
> 
> * "meta" database containing:
>   1) contents of user.sub
>   2) contents of user.seen
>   3) names and sha1 (guid)s of all sieve files
>   4) per-user modseq and uidvalidity stuff
> 
> Complete with modseqs and expiry of records on everything.  The mailboxes.db
> and annotations.db will have the same things too - records get a modseq and
> an "isdeleted" flag.
> 
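To make the "modseq plus isdeleted flag on every record" idea concrete, here is a minimal sketch. The names MetaDB, Record, changed_since and expire_tombstones are hypothetical and not existing Cyrus code; it also assumes a single modseq counter per store.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    key: str
    value: Optional[str]  # None once the record has become a tombstone
    modseq: int
    isdeleted: bool = False


@dataclass
class MetaDB:
    """Hypothetical per-user "meta" store; every write bumps one modseq counter."""
    highestmodseq: int = 0
    records: Dict[str, Record] = field(default_factory=dict)

    def _bump(self) -> int:
        self.highestmodseq += 1
        return self.highestmodseq

    def set(self, key: str, value: str) -> None:
        self.records[key] = Record(key, value, self._bump())

    def delete(self, key: str) -> None:
        # Keep a tombstone so a replica that synced earlier still sees the delete.
        self.records[key] = Record(key, None, self._bump(), isdeleted=True)

    def changed_since(self, modseq: int) -> List[Record]:
        # Everything a replica at `modseq` is missing, oldest change first.
        changed = [r for r in self.records.values() if r.modseq > modseq]
        return sorted(changed, key=lambda r: r.modseq)

    def expire_tombstones(self, before_modseq: int) -> None:
        # Drop tombstones older than a point all replicas are assumed to have passed.
        self.records = {
            k: r for k, r in self.records.items()
            if not (r.isdeleted and r.modseq < before_modseq)
        }
```

With this shape, "ship just the changes" is one call to changed_since() with the replica's last known modseq, and deletes travel the same way as any other change.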
> Let's consider "per user" for a bit.  Where there's currently a "quotaroot",
> there's no equivalent "userroot".  Users are special in various ways, but
> this needs to be made more clear.  Non-user folders (shared hierarchies)
> will need to be handled too.  They may even need nesting.
> 
> b,c,d) the main thing here is using (a) to ship just changes.  Replication of
>        any piece of data will always include:
>   1) a start point - either modseq 0, blank CRC or a particular modseq,
>      CRC algorithm and CRC.  Given that data, you either know immediately
>      that the destination is unchanged-since, or that you need to return an
>      error and your own "changed since that point" for the remote end to
>      integrate before continuing.
> 
> Because of this speculative sending, it's never necessary to lock both ends
> for any change.  It's never necessary to be online.  A change either applies
> cleanly, or gets rejected.  Even "full mailbox sync" where you need to fetch
> all metadata from the other end to resolve a split brain will result in a
> final state which should apply cleanly to the other end.
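The speculative exchange described above could be sketched roughly as follows. This is only an illustration: the Replica class, the (modseq, key, value) change tuple and the XOR-of-CRC32 state checksum are all assumptions made for the example, not the actual sync protocol.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Change = Tuple[int, str, str]  # (modseq, key, value); hypothetical wire shape


def state_crc(items: Dict[str, str]) -> int:
    # Assumed checksum: XOR of per-item CRC32s, so an empty state has CRC 0.
    crc = 0
    for key in sorted(items):
        crc ^= zlib.crc32(f"{key}={items[key]}".encode())
    return crc


@dataclass
class Replica:
    items: Dict[str, str] = field(default_factory=dict)
    log: List[Change] = field(default_factory=list)
    highestmodseq: int = 0

    def local_write(self, key: str, value: str) -> None:
        self.highestmodseq += 1
        self.items[key] = value
        self.log.append((self.highestmodseq, key, value))

    def apply(self, since_modseq: int, since_crc: int,
              changes: List[Change]) -> Tuple[bool, List[Change]]:
        """Apply changes only if our state still matches the sender's start point."""
        if since_modseq != self.highestmodseq or since_crc != state_crc(self.items):
            # Reject, and hand back our own changes since that point so the
            # sender can integrate them before trying again.
            return False, [c for c in self.log if c[0] > since_modseq]
        for modseq, key, value in changes:
            self.items[key] = value
            self.log.append((modseq, key, value))
            self.highestmodseq = max(self.highestmodseq, modseq)
        return True, []
```

The sender never has to ask first: it sends from its cached start point, and a mismatch costs one round trip that returns exactly the data needed to catch up.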
> 
> Speaking of which - changes are always "legal" - any attempt to reduce a
> modseq, change a uidvalidity, unexpunge a UID - these are all rejected.
> The uidvalidity is kind of a special case - it's a delete and recreate of
> a mailbox, and is only legal with the UIDVALIDITY increasing.
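A legality check along these lines might look like the sketch below. MailboxState and check_legal are hypothetical names, and the handling of a recreated mailbox (only the UIDVALIDITY is compared, the old UIDs and modseq are ignored) is an assumption of this example.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class MailboxState:
    uidvalidity: int
    highestmodseq: int
    live_uids: Set[int] = field(default_factory=set)
    expunged_uids: Set[int] = field(default_factory=set)


def check_legal(old: MailboxState, new: MailboxState) -> Optional[str]:
    """Return None if moving from old to new is legal, else the reason it is not."""
    if new.uidvalidity < old.uidvalidity:
        return "uidvalidity decreased"
    if new.uidvalidity > old.uidvalidity:
        # Treated as delete and recreate: the old UIDs no longer apply (assumption).
        return None
    if new.highestmodseq < old.highestmodseq:
        return "modseq decreased"
    if old.expunged_uids & new.live_uids:
        return "unexpunge"
    return None
```

Because the check needs only the old and new state of one mailbox, it can run on the receiving end alone, which is what makes it usable on an interface exposed to regular users.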
> 
> Which leads to:
> 
> e) with legality checking and ACL checking, this interface could be exposed
>    to regular users as well.  Obviously, they could only make permitted
>    changes to mailboxes they have permissions on - but along with 'b' where
>    the "master" actually has no knowledge of the replica, a remote machine
>    could connect and poll for changes.
> 
> At which point you could add:
> 
> f) IDLE style notify on a replication connection, so you can connect, sync,
>    and ask to monitor a set of mailboxes for changes and be notified
>    (as sync change events) when they happen.
> 
> ===================
> 
> Ok, that's a nice broad set of goals, how do we get there?
> 
> 1) move sync_server into imapd as a separate command: CYR_SYNC which reads and
>    replies with DLISTs.
> 2) write a 'backup' command line tool as a client to imapd which connects and
>    requests the bits it needs.
> 3) change sync_client to cache remote state for speculative initial commands
>    (no initial query required before shipping changes)
> 
> For efficiency, we also need to re-add sync_crc caching.  This can either be
> done in something like statuscache, or put back in the index header.  I do
> like the idea of statuscache actually, because it reduces the IO hit of sync
> even further - and the CPU hit too.  Some of our servers are currently
> overloaded on CPU due to the continual CRC recalculation.
> 
> One complexity with sync_crc is that annotations don't currently calculate
> sync_crc changes as they are written.  That will need to be fixed before we
> can really cache them.
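One way to avoid the continual recalculation is to update a cached CRC at write time. The sketch below assumes the mailbox CRC is the XOR of a CRC32 per record; that is an assumption for illustration, not a statement about the real sync_crc format, and CachedSyncCRC is a hypothetical name.

```python
import zlib
from typing import Dict


class CachedSyncCRC:
    """Keeps a running XOR of per-record CRC32s, updated on every write."""

    def __init__(self) -> None:
        self._record_crc: Dict[int, int] = {}
        self.value = 0

    def write(self, uid: int, record: bytes) -> None:
        # XOR out the old contribution (0 if the record is new), XOR in the new one.
        self.value ^= self._record_crc.get(uid, 0)
        crc = zlib.crc32(record)
        self._record_crc[uid] = crc
        self.value ^= crc

    def remove(self, uid: int) -> None:
        self.value ^= self._record_crc.pop(uid, 0)


def full_recalc(records: Dict[int, bytes]) -> int:
    # The expensive path: walk every record and XOR the CRCs together.
    value = 0
    for record in records.values():
        value ^= zlib.crc32(record)
    return value
```

The same pattern would apply to annotations once their writes update the cached value too: each write costs one CRC32 of the changed record instead of a pass over the whole mailbox.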
> 
> The rest happens later... permissions and such :)
> 
> Bron.
> -- 
>   Bron Gondwana
>   brong at fastmail.fm
> 
Hi Bron,

I do not know if it is even possible, but can the implementation support
replication FROM an older release of Cyrus? This would really, really
help with the upgrade process. We are currently stuck on 2.3.x because
of the upgrade resource needs even though we have an essentially idle
replica that could easily handle the I/O of the upgrade, if only it
could be done on the replica while still streaming from the primary. I
know the ship has sailed for 2.3 -> 2.4 but it would be really nice
to have that feature going forward.

Regards,
Ken