Funding Cyrus High Availability
david.lang at digitalinsight.com
Sun Sep 19 03:52:08 EDT 2004
There are many ways of doing High Availability. This is an attempt to
outline the various methods with the advantages and disadvantages. Ken and
David (and anyone else who has thoughts on this) please feel free to add to
this. I'm attempting to outline them roughly in order of complexity.
1. Active->Slave replication with manual failover
This is where you can configure one machine to output all changes to a
local daemon and another machine to implement the changes that are read
from a local daemon.
simplest implementation; since it makes no assumptions about how you
are going to use it, it also sets no limits on how it is used.
This is the basic functionality that all other variations will need so
it's not wasted work no matter what is done later
allows for multiple slaves from a single master
allows for the propagation traffic pattern to be defined by the
sysadmin (either master directly to all slaves or a tree-like propagation
to save on WAN bandwidth when multiple slaves are co-located)
by involving a local daemon at each server there is a lot of
flexibility in exactly how the replication takes place.
for example you could
use netcat as your daemon for instant transmission of the
have a daemon that caches the messages so that if the link
drops the messages are saved
have a daemon that gets an acknowledgement from the far side that
the message got through
have a daemon that batches the messages up and compresses them for
more efficient transport
have a daemon that delays all messages by a given time period to
give you a way to recover from logical corruption without having to go to
have a daemon that filters the messages (say one that updates
everything except it won't delete any messages so you have a known safe
archive of all messages)
since it makes no assumptions about how you are going to use it, it
also gives you no help in using it in any particular way
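The pluggable daemons described for #1 could be sketched roughly as below. This is only an illustration: the post does not define a wire format, so the change records (plain dicts with an "op" key) and the function names drop_deletes, batch_and_compress and decompress_batch are all assumptions made up for the example. It chains a filter that never forwards deletions into a stage that batches and compresses changes for transport.

```python
import json
import zlib
from typing import Iterable, Iterator, List

# Hypothetical change record: the post does not specify a format,
# so a plain dict such as {"op": "append", "mailbox": ..., "uid": ...}
# is assumed here purely for illustration.
Change = dict


def drop_deletes(changes: Iterable[Change]) -> Iterator[Change]:
    """Filter stage: forward every change except deletions, so the
    receiving side keeps an archive of all messages."""
    for change in changes:
        if change.get("op") != "delete":
            yield change


def batch_and_compress(changes: Iterable[Change],
                       batch_size: int = 100) -> Iterator[bytes]:
    """Batch stage: group changes and compress each group."""
    batch: List[Change] = []
    for change in changes:
        batch.append(change)
        if len(batch) >= batch_size:
            yield zlib.compress(json.dumps(batch).encode("utf-8"))
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        yield zlib.compress(json.dumps(batch).encode("utf-8"))


def decompress_batch(blob: bytes) -> List[Change]:
    """Receiving side: recover the list of changes from one blob."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Because each stage just consumes and produces a stream, stages can be stacked in whatever order a site needs, which is the flexibility that makes #1 attractive as a base.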
2. Active->Slave replication with automatic failover
This takes #1, limits it to a pair of boxes and through changes to
murder or other parts of cyrus will swap the active/slave status of the
makes setting up of a HA pair of boxes easier
increases availability by decreasing downtime
this functionality can be duplicated without changes to cyrus by the
use of an external HA/cluster software package.
Since this now assumes a particular mode of operation it starts to
limit other uses (for example, if this is implemented as part of murder
then it won't help much if you are trying to replicate to a DR datacenter
several thousand miles away).
Split-brain conditions are the responsibility of cyrus to prevent or
solve. These are fundamentally hard problems to get right in all cases
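To make the split-brain concern concrete, here is a minimal sketch of the simplest possible automatic failover rule: the slave promotes itself once the master has been silent for longer than a timeout. The class name FailoverMonitor, its methods and the heartbeat mechanism are all assumptions for illustration, not anything that exists in cyrus or murder.

```python
import time


class FailoverMonitor:
    """Hypothetical monitor running on the slave of an HA pair."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock              # injectable so it can be tested
        self.last_heartbeat = clock()
        self.role = "slave"

    def heartbeat_received(self):
        """Call whenever a heartbeat from the master arrives."""
        self.last_heartbeat = self.clock()

    def check(self):
        """Promote this box if the master has been silent too long."""
        silent_for = self.clock() - self.last_heartbeat
        if self.role == "slave" and silent_for > self.timeout_seconds:
            # Danger: silence may mean a dead link, not a dead master.
            self.role = "active"
        return self.role
```

The weakness is visible in check(): from the slave's side a crashed master and a dropped link look identical, so if the master is still serving clients when the slave promotes itself, both boxes are active at once. That is the split-brain case, and a timeout alone cannot rule it out.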
3. Active->Slave replication with Slave able to accept client connections
This takes #1 and then further modifies the slave so that requests that
would change the contents of things get relayed to the active box and then
the results of the change get propagated back down before they are visible
to the client.
simulates active/active operation although it does cause longer delays
when clients issue some commands.
use of slaves for local access can reduce the load on the master
resulting in higher performance.
can be cascaded to multiple slaves and multiple tiers of slaves as
in case of problems on the master the slaves can continue to operate as
read-only servers providing degraded service while the master is fixed.
depending on the problem with the master this may be very preferable to
having to re-sync the master or recover from a split-brain situation
more extensive modifications needed to trap all changes and propagate
them up to the master
how does the slave know when the master has implemented the change (so
that it can give the result to the client)
raises questions about the requirement to get confirmation of all
updates before the slave can respond to the client (for example, if a
slave decides to read a message that is flagged as new should the slave
wait until the master confirms that it knows the message has been read
before it gives it to the client, or should it give the message to the
client and not worry if the update fails on the master)
since the slave needs to send updates to the master the latency of the
link between them can become a limiting factor in the performance that
clients see when connecting to the slave
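One possible answer to "how does the slave know when the master has implemented the change" is a sequence number: the master numbers every change, and the slave answers the client only once it has applied changes up to the number the master returned. The sketch below is an in-process illustration under that assumption; the classes Master and Slave and their methods are hypothetical, and a real implementation would talk over the network instead of calling the master directly.

```python
class Master:
    """Hypothetical master: applies changes and numbers them."""

    def __init__(self):
        self.seq = 0
        self.log = []      # ordered (seq, mailbox, uid, flag) entries
        self.flags = {}

    def apply(self, mailbox, uid, flag):
        self.seq += 1
        self.flags.setdefault((mailbox, uid), set()).add(flag)
        self.log.append((self.seq, mailbox, uid, flag))
        return self.seq

    def changes_since(self, seq):
        return [entry for entry in self.log if entry[0] > seq]


class Slave:
    """Hypothetical slave: serves reads locally, relays writes."""

    def __init__(self, master):
        self.master = master
        self.applied_seq = 0
        self.flags = {}

    def pull(self):
        """Apply every change the master has that we have not seen."""
        for seq, mailbox, uid, flag in self.master.changes_since(self.applied_seq):
            self.flags.setdefault((mailbox, uid), set()).add(flag)
            self.applied_seq = seq

    def read_flags(self, mailbox, uid):
        return set(self.flags.get((mailbox, uid), set()))

    def set_flag(self, mailbox, uid, flag):
        """Relay the write, then wait until it has come back down."""
        wanted_seq = self.master.apply(mailbox, uid, flag)
        while self.applied_seq < wanted_seq:
            self.pull()
        return wanted_seq
```

Every call to set_flag costs at least one round trip to the master plus the wait for replication, which is where the link latency mentioned above shows up in what the client sees.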
4. #3 with automatic failover
Since #3 supports multiple slaves the number of failover scenarios grows
significantly. You have multiple machines that could be the new master and
you have the split-brain scenario to watch out for.
increased availability by decreasing failover time
potentially easier to set up than with external clustering software
runs the risk of breaking some deployment scenarios in an attempt to
designate one of the boxes as primary and identify all items in the
datastore that absolutely must not be subject to race conditions between
the two boxes (message UUID for example). In addition to implementing the
replication needed for #1 modify all functions that need to update these
critical pieces of data to update them on the master and let the master
update the other box.
best use of available hardware as the load is split almost evenly
between the boxes.
best availability because if there is a failure half of the clients
won't see it at all
significantly more complex than the other options.
behavior during a failure is less obvious
split-brain recovery is not straightforward and if automatic failover
is active the sysadmin will have no option to have things degraded
slightly while a problem is fixed
depending on the implementation this may be very sensitive to network
latency between the machines and could be very suitable for working with
machines in the same datacenter, but worthless for machines thousands of
Take #5 and extend the idea to more than a pair of boxes. This makes the
updates more complex to propagate (they now need to be sent to every other
machine in the cluster)
better load balancing than #5
allows for the ability to have a HA pair in a primary location and a
backup in a remote location (i.e. your main HQ has two boxes, but your
disaster recovery center has one as well)
the complexity goes up significantly when you shift from 2 to n boxes
in a cluster.
the bandwidth required for updates increases by a factor of roughly n!
significantly more split-brain scenarios become possible and need to be
While #6 is the ideal option to have, it can get very complex.
Personally I would like to see #1 (with a sample daemon or two to provide
basic functionality and leave the doors open for more creative uses)
followed by #3 while people try and figure out all the problems with #5
There are a lot of scenarios that are possible with #1 or #3 that are not
possible with #5, and nearly all of the work needed to release #1 and #3
as supported options is work that needs to be done towards #5/6 anyway
(the pieces need to be identified in the code and hooks put in place in
the code at those locations. the details of the hooks will differ slightly
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare