cyrus replication : master to replica and replica to master

David Carter dpc22 at cam.ac.uk
Fri Oct 23 10:42:08 EDT 2009


On Fri, 23 Oct 2009, Bron Gondwana wrote:

> I've seen heartbeat get split brain before.  We gave up on it.  We do 
> all our fencing via humans now!  Check the KVM, kick the box, manually 
> run the failover script.

Some of my colleagues have had a lot of grief with Heartbeat going split 
brain. It seems to really be designed for a pair of machines sitting next 
to each other in a rack with a serial link for the heartbeat, rather than 
servers installed in a pair of machine rooms three miles apart.

We do manual failover with our Cyrus mailstores: I would rather 1/8th of 
my users had an outage of up to a couple of hours (typically just a few 
minutes) than end up with a split brain.

On the one occasion in five years that we did end up with a Cyrus split 
brain (replication failed because of a memory DIMM error and then the 
entire master failed a few minutes later) it was easy enough to fish 
missing messages out of the dead system the following day and reinject 
them using LMTP. Certainly easier than reengineering the entire Cyrus 
mailstore to allow active/active replication.
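The post does not show how the reinjection was done. As a rough illustration only, here is a minimal sketch assuming the messages have already been copied off the dead master as (recipient, raw bytes) pairs, and that the Message-IDs present on the surviving server are known. The helper names (`message_id`, `find_missing`, `reinject`) are hypothetical; the delivery side uses Python's standard `smtplib.LMTP` client, and the null envelope sender is an assumption.

```python
import smtplib
from email.parser import BytesHeaderParser
from typing import Iterable, List, Optional, Set, Tuple

# A message to restore: (mailbox recipient address, raw RFC 5322 bytes).
Message = Tuple[str, bytes]


def message_id(raw: bytes) -> Optional[str]:
    """Return the Message-ID header of a raw message, or None if it has none."""
    headers = BytesHeaderParser().parsebytes(raw)
    value = headers.get("Message-ID")
    return str(value).strip() if value else None


def find_missing(dead_store: Iterable[Message], surviving_ids: Set[str]) -> List[Message]:
    """Pick messages from the dead master that the surviving server lacks.

    Messages with no Message-ID cannot be matched, so they are treated as
    missing; they may need a manual check to avoid duplicate delivery.
    """
    missing = []
    for recipient, raw in dead_store:
        mid = message_id(raw)
        if mid is None or mid not in surviving_ids:
            missing.append((recipient, raw))
    return missing


def reinject(client, messages: Iterable[Message], envelope_from: str = "<>") -> List[str]:
    """Deliver each message through an already-connected LMTP client.

    One recipient per transaction, so each failure maps to exactly one
    message. The null sender "<>" is an assumption (no bounces on failure).
    Returns the recipients whose delivery was refused.
    """
    failed = []
    for recipient, raw in messages:
        try:
            client.sendmail(envelope_from, [recipient], raw)
        except smtplib.SMTPException:
            failed.append(recipient)
    return failed
```

Against a real server the client could be something like `smtplib.LMTP("/var/imap/socket/lmtp")`, since `smtplib.LMTP` treats a host starting with "/" as a Unix socket path. That socket path is only an example and depends on the local Cyrus lmtpd configuration.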

On Wed, Oct 21, 2009 at 08:45:11PM +0200, David Touzeau wrote:

> I would like to know if it is possible to set the replica as the master 
> too, in order to replicate new mail saved on the replica to the master 
> and vice versa. In this case it would become active/active.

We do this to a limited degree: the set of active users on a pair of 
mailstores can be partitioned and bounced back and forth between the two 
servers in a pair. This is mostly useful for load balancing between our 
two machine rooms, or migrating all the users off a master so that we can 
patch and reboot without any user visible downtime.
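To make the partitioning idea concrete, here is a hypothetical sketch (not the author's code, which the post does not show) of a pair in which each user is active on exactly one server and replicated one way to the other. The class and method names are invented for illustration, and `drain` assumes every affected replica is already fully in sync before its user is switched over.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MailstorePair:
    """Hypothetical model of two mailstores replicating to each other.

    Every user is active (master) on exactly one server of the pair and is
    replicated one way, to the other server.
    """

    servers: Tuple[str, str]
    master_of: Dict[str, str] = field(default_factory=dict)

    def other(self, server: str) -> str:
        a, b = self.servers
        if server == a:
            return b
        if server == b:
            return a
        raise ValueError(f"{server} is not in this pair")

    def replication_direction(self, user: str) -> Tuple[str, str]:
        """Return (source, target) for this user's replication stream."""
        master = self.master_of[user]
        return master, self.other(master)

    def load(self) -> Dict[str, int]:
        """Count active users per server, e.g. to balance two machine rooms."""
        counts = {s: 0 for s in self.servers}
        for master in self.master_of.values():
            counts[master] += 1
        return counts

    def drain(self, server: str) -> List[str]:
        """Make the other server master for every user now active on `server`.

        Assumes each affected user's replica is fully in sync first; after
        this, `server` holds only replicas and can be patched and rebooted.
        """
        target = self.other(server)
        moved = sorted(u for u, m in self.master_of.items() if m == server)
        for user in moved:
            self.master_of[user] = target
        return moved
```

Because replication direction is decided per user rather than per server, the same two machines can carry traffic in both directions at once while any single user's data still has one unambiguous master.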

However this is using my own replication code rather than the branch which 
was rewritten into Cyrus by Ken. I have additional safeguards to stop 
sync_client from overwriting the master data in a pair (which has only 
ever happened because of stupidity on my part when testing).
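The post does not describe what those safeguards look like. One plausible shape, sketched here purely as an assumption, is a check run before each push: refuse unless the source believes it is master for the user and the target does not. `guard_push`, `ReplicationRefused` and the `masters` bookkeeping are hypothetical names, not part of sync_client.

```python
from typing import Dict, Set


class ReplicationRefused(Exception):
    """Raised when a push could overwrite a master copy."""


def guard_push(user: str, source: str, target: str, masters: Dict[str, Set[str]]) -> None:
    """Check that pushing `user` from `source` to `target` cannot clobber a master.

    `masters` maps each server to the users it believes it is master for,
    as each server would report it locally (hypothetical bookkeeping).
    Returns None if the push looks safe, raises ReplicationRefused otherwise.
    """
    if source == target:
        raise ReplicationRefused("source and target are the same server")
    if user not in masters.get(source, set()):
        raise ReplicationRefused(f"{source} is not master for {user}")
    if user in masters.get(target, set()):
        # Both sides claim the user: a split brain. Stop and let a human decide.
        raise ReplicationRefused(f"{target} also claims to be master for {user}")
```

Refusing outright when both sides claim a user matches the manual-failover preference above: a stalled replication stream is easier to recover from than a replica silently overwriting a master.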

I've never used the standard replication code in Cyrus other than to 
backport (sideport?) additional features such as CONDSTORE and GUID 
support. Given the grief Fastmail had with the early Cyrus replication 
code, I think that I'm rather glad about this.

Every once in a while I think about moving to standard Cyrus replication. 
Unfortunately there are a lot of warts that I really don't like. It is 
much easier to just drop my own replication code onto new versions of 
Cyrus (typically under 5 minutes' work each time). That was one of my original
design objectives.

-- 
David Carter                             Email: David.Carter at ucs.cam.ac.uk
University Computing Service,            Phone: (01223) 334502
New Museums Site, Pembroke Street,       Fax:   (01223) 334679
Cambridge UK. CB2 3QH.


More information about the Info-cyrus mailing list