Funding Cyrus High Availability

Sun Sep 19 03:52:08 EDT 2004

There are many ways of doing High Availability. This is an attempt to 
outline the various methods with their advantages and disadvantages. Ken 
and David (and anyone else who has thoughts on this) please feel free to 
add to this. I'm attempting to outline them roughly in order of complexity.

1. Active->Slave replication with manual failover

   This is where you can configure one machine to output all changes to a 
local daemon and another machine to implement the changes that are read 
from a local daemon.

   Pro:
    simplest implementation. since it makes no assumptions about how you 
are going to use it, it also sets no limits on how it is used.

    This is the basic functionality that all other variations will need so 
it's not wasted work no matter what is done later

    allows for multiple slaves from a single master

    allows for the propagation traffic pattern to be defined by the 
sysadmin (either master directly to all slaves or a tree-like propagation 
to save on WAN bandwidth when multiple slaves are co-located)

    by involving a local daemon at each server there is a lot of 
flexibility in exactly how the replication takes place.
      for example you could
         use netcat as your daemon for instant transmission of the 
messages
         have a daemon that caches the messages so that if the link 
drops the messages are saved
         have a daemon that gets an acknowledgement from the far side that 
the message got through
         have a daemon that batches the messages up and compresses them for 
more efficient transport
         have a daemon that delays all messages by a given time period to 
give you a way to recover from logical corruption without having to go to 
a backup
         have a daemon that filters the messages (say one that updates 
everything except it won't delete any messages so you have a known safe 
archive of all messages)
         etc
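
As a rough illustration of how small such daemons could be, here is a 
sketch of two of the ideas above: a filter that refuses to pass deletes 
on, and a queue that holds changes back for a fixed delay. The change 
record format (a dict with an "op" key) and the names archive_filter and 
DelayQueue are invented for this example. Cyrus defines no such format, so 
treat this as a sketch under those assumptions only.

```python
import time
from collections import deque

# Hypothetical change record, invented for illustration:
#   {"op": "append" | "delete" | "setflag", "uid": <int>}


def archive_filter(changes):
    """Pass every change through except deletes, so the slave keeps all mail."""
    for change in changes:
        if change["op"] != "delete":
            yield change


class DelayQueue:
    """Hold changes back for delay_seconds before releasing them to the slave."""

    def __init__(self, delay_seconds, clock=time.monotonic):
        self.delay = delay_seconds
        self.clock = clock          # injectable so the delay can be tested
        self.pending = deque()      # (release_time, change), oldest first

    def put(self, change):
        self.pending.append((self.clock() + self.delay, change))

    def ready(self):
        """Return the changes whose delay has expired, in arrival order."""
        now = self.clock()
        released = []
        while self.pending and self.pending[0][0] <= now:
            released.append(self.pending.popleft()[1])
        return released
```

A real daemon would read changes from the master's local socket and write 
the released ones to the slave. Both ends are left out here because the 
wire format is exactly the part that still has to be designed.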

   Con:
    since it makes no assumptions about how you are going to use it, it 
also gives you no help in using it in any particular way

2. Active->Slave replication with automatic failover

   This takes #1, limits it to a pair of boxes, and through changes to 
murder or other parts of cyrus automatically swaps the active/slave status 
of the two boxes when the active one fails.

   Pro:
    makes setting up of a HA pair of boxes easier

    increases availability by decreasing downtime

   Con:
    this functionality can be duplicated without changes to cyrus by the 
use of an external HA/cluster software package.

    Since this now assumes a particular mode of operation it starts to 
limit other uses (for example, if this is implemented as part of murder 
then it won't help much if you are trying to replicate to a DR datacenter 
several thousand miles away).

    Split-brain conditions are the responsibility of cyrus to prevent or 
solve. These are fundamentally hard problems to get right in all cases

3. Active->Slave replication with Slave able to accept client connections

   This takes #1 and then further modifies the slave so that requests that 
would change the contents of things get relayed to the active box and then 
the results of the change get propagated back down before they are visible 
to the client.
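
One possible shape for this relay, sketched in-process with no networking. 
It assumes the master numbers every change it applies and that a slave can 
tell how far it has replayed that log. The sequence numbers and the 
Master/Slave classes are assumptions made for this example, not anything 
cyrus provides today.

```python
class Master:
    def __init__(self):
        self.log = []      # ordered list of applied changes
        self.flags = {}    # uid -> set of flags

    def apply(self, uid, flag):
        """Apply a change and return its sequence number in the log."""
        self.flags.setdefault(uid, set()).add(flag)
        self.log.append((uid, flag))
        return len(self.log)   # sequence numbers start at 1


class Slave:
    def __init__(self, master):
        self.master = master
        self.applied = 0   # sequence number of the last change replayed here
        self.flags = {}

    def pull(self):
        """Replay any master changes this slave has not seen yet."""
        for uid, flag in self.master.log[self.applied:]:
            self.flags.setdefault(uid, set()).add(flag)
            self.applied += 1

    def read_flags(self, uid):
        """Reads are served locally and never touch the master."""
        return set(self.flags.get(uid, ()))

    def set_flag(self, uid, flag):
        """Relay the write upstream, then wait until it has come back down."""
        seq = self.master.apply(uid, flag)
        while self.applied < seq:
            self.pull()
        return self.read_flags(uid)
```

With two slaves a and b on one master, a.set_flag(7, "\\Seen") returns only 
once a has replayed the change, while b keeps showing the old state until 
its own next pull().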

   Pro:
    simulates active/active operation although it does cause longer delays 
when clients issue some commands.

    use of slaves for local access can reduce the load on the master 
resulting in higher performance.

    can be cascaded to multiple slaves and multiple tiers of slaves as 
needed

    in case of problems on the master the slaves can continue to operate as 
read-only servers providing degraded service while the master is fixed. 
depending on the problem with the master this may be very preferable to 
having to re-sync the master or recover from a split-brain situation

   Con:
    more extensive modifications needed to trap all changes and propagate 
them up to the master

    how does the slave know when the master has implemented the change (so 
that it can give the result to the client)

    raises questions about the requirement to get confirmation of all 
updates before the slave can respond to the client (for example, if a 
slave decides to read a message that is flagged as new should the slave 
wait until the master confirms that it knows the message has been read 
before it gives it to the client, or should it give the message to the 
client and not worry if the update fails on the master)

    since the slave needs to send updates to the master the latency of the 
link between them can become a limiting factor in the performance that 
clients see when connecting to the slave

4. #3 with automatic failover

   Since #3 supports multiple slaves the number of failover scenarios grows 
significantly. You have multiple machines that could be the new master and 
you have the split-brain scenario to watch out for.

   Pro:
    increased availability by decreasing failover time

    potentially easier to set up than with external clustering software

   Con:
    increased complexity

    runs the risk of breaking some deployment scenarios in an attempt to 
simplify others

5. Active/Active

   designate one of the boxes as primary and identify all items in the 
datastore that absolutely must not be subject to race conditions between 
the two boxes (message UUID for example). In addition to implementing the 
replication needed for #1, modify all functions that need to update these 
critical pieces of data to update them on the primary and let the primary 
update the other box.
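
A minimal sketch of that split, with both boxes accepting deliveries but 
only the primary handing out the race-sensitive identifier. The Box class, 
the integer identifiers, and the direct write into the peer's store 
(standing in for #1-style replication) are all assumptions made for the 
example.

```python
import itertools


class Box:
    def __init__(self, name, primary=None):
        self.name = name
        self.primary = primary or self   # a box with no primary is the primary
        self._next_id = itertools.count(1)
        self.messages = {}               # identifier -> message body
        self.peer = None

    def allocate_id(self):
        """Only the primary allocates identifiers, so the boxes cannot collide."""
        if self.primary is self:
            return next(self._next_id)
        return self.primary.allocate_id()

    def deliver(self, body):
        """Accept a message locally, then copy it to the other box."""
        ident = self.allocate_id()
        self.messages[ident] = body
        if self.peer is not None:
            self.peer.messages[ident] = body   # stand-in for #1-style replication
        return ident
```

Everything that is not race-sensitive stays local to whichever box the 
client reached, which is where the load sharing comes from. The round trip 
inside allocate_id is where the sensitivity to network latency comes in.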

   Pro:
    best use of available hardware as the load is split almost evenly 
between the boxes.

    best availability because if there is a failure half of the clients 
won't see it at all

   Con:
    significantly more complex than the other options.

    behavior during a failure is less obvious

    split-brain recovery is not straightforward and if automatic failover 
is active the sysadmin will have no option to have things degraded 
slightly while a problem is fixed

    depending on the implementation this may be very sensitive to network 
latency between the machines and could be very suitable for working with 
machines in the same datacenter, but worthless for machines thousands of 
miles apart.

6. active/active/active/...

   Take #5 and extend the idea to more than a pair of boxes. This makes the 
updates more complex to propagate (they now need to be sent to every other 
machine in the cluster)

   Pro:
    better load balancing than #5

    allows for the ability to have a HA pair in a primary location and a 
backup in a remote location (i.e. your main HQ has two boxes, but your 
disaster recovery center has one as well)

   Con:
    the complexity goes up significantly when you shift from 2 to n boxes 
in a cluster.

    the bandwidth required for updates grows with n, since every update 
has to reach each of the other n-1 machines

    significantly more split-brain scenarios become possible and need to be 
accounted for.
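
To put rough numbers on the update traffic, here is a small calculation. 
It assumes the simplest topology, where every box sends each of its 
updates directly to every other box, and that each box originates the same 
number of updates. Both are assumptions made for the example, and a 
tree-like propagation would change the totals.

```python
def full_mesh_cost(n, updates_per_box):
    """Return (links, total_copies_sent) for n boxes in a full mesh."""
    copies_per_update = n - 1                # one copy to every other box
    links = n * (n - 1) // 2                 # distinct pairs of boxes
    total_copies = n * updates_per_box * copies_per_update
    return links, total_copies
```

Under these assumptions a pair needs one link, while four boxes need six, 
and every single update is sent three times instead of once.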

-------------------------------------------------------------------------

while #6 is the ideal option to have, it can get very complex

personally I would like to see #1 (with a sample daemon or two to provide 
basic functionality and leave the doors open for more creative uses) 
followed by #3 while people try and figure out all the problems with #5 
and #6

there are a lot of scenarios that are possible with #1 or #3 that are not 
possible with #5, and almost all of the work needed to release #1 and #3 
as supported options is work that needs to be done towards #5/6 anyway 
(the pieces need to be identified in the code and hooks put in place in 
the code at those locations. the details of the hooks will differ slightly)

David Lang

  -- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare