cyrus replication : master to replica and replica to master
David Carter
dpc22 at cam.ac.uk
Fri Oct 23 10:42:08 EDT 2009
On Fri, 23 Oct 2009, Bron Gondwana wrote:
> I've seen heartbeat get split brain before. We gave up on it. We do
> all our fencing via humans now! Check the KVM, kick the box, manually
> run the failover script.
Some of my colleagues have had a lot of grief with Heartbeat going split
brain. It seems to really be designed for a pair of machines sitting next
to each other in a rack with a serial link for the heartbeat, rather than
servers installed in a pair of machine rooms three miles apart.
We do manual failover with our Cyrus mailstores: I would rather 1/8th of
my users had an outage of a couple of hours (and typically just a few
minutes) than end up with a split brain.
On the one occasion in five years that we did end up with a Cyrus split
brain (replication failed because of a memory DIMM error and then the
entire master failed a few minutes later) it was easy enough to fish
missing messages out of the dead system the following day and reinject
them using LMTP. Certainly easier than reengineering the entire Cyrus
mailstore to allow active/active replication.
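As an illustration of the reinjection step only (this is not the procedure
actually used above), a minimal Python sketch might look like the following.
It assumes the recovered messages are individual RFC 822 files, that the
surviving server accepts LMTP deliveries, and that each file is known to
belong to one mailbox. The function name `reinject` and its parameters are
hypothetical; `smtplib.LMTP` and `sendmail` are standard library calls.

```python
import smtplib
from pathlib import Path


def reinject(message_paths, recipient, lmtp, sender=""):
    """Deliver each recovered message file to `recipient` over an open LMTP session.

    `lmtp` is any object with an smtplib-style sendmail() method, for
    example an smtplib.LMTP instance. Returns a list of (path, reason)
    pairs for messages that were not accepted, so they can be retried.
    """
    failures = []
    for path in message_paths:
        raw = Path(path).read_bytes()  # send the message bytes unmodified
        try:
            refused = lmtp.sendmail(sender, [recipient], raw)
        except smtplib.SMTPException as exc:
            # Raised when the only recipient is refused or the data is rejected.
            failures.append((str(path), str(exc)))
            continue
        if refused:
            failures.append((str(path), str(refused)))
    return failures


# Example use (socket path, mailbox and directory are assumptions):
#
#   with smtplib.LMTP("/path/to/lmtp/socket") as lmtp:
#       failed = reinject(sorted(Path("recovered/user1").iterdir()), "user1", lmtp)
```

Collecting failures rather than stopping at the first error means one bad
message does not block the rest of the batch.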
On Wed, Oct 21, 2009 at 08:45:11PM +0200, David Touzeau wrote:
> I would like to know if it is possible to SET the replica has the master
> too in order to replicate new mail saved on the replica to the master
> and vis versa In this case it should be turn to active/active..
We do this to a limited degree: the set of active users on a pair of
mailstores can be partitioned and bounced back and forth between the two
servers in a pair. This is mostly useful for load balancing between our
two machine rooms, or migrating all the users off a master so that we can
patch and reboot without any user visible downtime.
However this is using my own replication code rather than the branch which
was rewritten into Cyrus by Ken. I have additional safeguards to stop
sync_client from overwriting the master data in a pair (which has only
ever happened because of stupidity on my part when testing).
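The per-user partitioning within a pair, and the kind of safeguard mentioned
above, can be pictured with a small bookkeeping model. This is only a sketch
under stated assumptions: it is not the replication code described in this
message, the class and method names are hypothetical, and it only tracks
which server is currently master for each user. It performs no actual
switchover or replication.

```python
from dataclasses import dataclass, field


@dataclass
class MailstorePair:
    """Tracks which server of a pair is currently master for each user."""

    servers: tuple                                 # exactly two hostnames (hypothetical)
    active_on: dict = field(default_factory=dict)  # user -> current master

    def partner(self, server):
        """Return the other server in the pair."""
        a, b = self.servers
        if server not in (a, b):
            raise ValueError(f"{server!r} is not part of this pair")
        return b if server == a else a

    def move(self, user, target):
        """Make `target` the master for a single user."""
        self.partner(target)  # validates that target belongs to the pair
        self.active_on[user] = target

    def drain(self, server):
        """Move every user off `server`, e.g. before patching and rebooting it."""
        target = self.partner(server)
        moved = [u for u, s in self.active_on.items() if s == server]
        for user in moved:
            self.active_on[user] = target
        return moved

    def may_replicate_to(self, user, server):
        """Safeguard: never push replica data onto the user's current master."""
        return self.active_on.get(user) != server

    def load(self):
        """Count active users per server, for load balancing decisions."""
        counts = {s: 0 for s in self.servers}
        for s in self.active_on.values():
            counts[s] += 1
        return counts
```

In this model, draining one server before a reboot is a single call to
`drain`, and a replication client would check `may_replicate_to` before
writing any user's data to a server.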
I've never used the standard replication code in Cyrus other than to
backport (sideport?) additional features such as CONDSTORE and GUID
support. Given the grief Fastmail had with the early Cyrus replication
code I think that I'm rather glad about this.
Every once in a while I think about moving to standard Cyrus replication.
Unfortunately there are a lot of warts that I really don't like. It is
much easier to just drop my own replication code onto new versions of
Cyrus (typically < 5 minutes work each time). That was one of my original
design objectives.
--
David Carter Email: David.Carter at ucs.cam.ac.uk
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.