Funding Cyrus High Availability

Mon Sep 20 04:51:28 EDT 2004

David Carter wrote:

>> 5. Active/Active
>>
>> designate one of the boxes as primary and identify all items in the 
>> datastore that absolutely must not be subject to race conditions
>> between the two boxes (message UUID for example). In addition to 
>> implementing the replication needed for #1 modify all functions that 
>> need to update these critical pieces of data to update them on the 
>> master and let the master update the other box.
>
> We may be talking at cross purposes (and it's entirely likely that I've
> got the wrong end of the stick!), but I consider active-active to be
> the case where there is no primary: users can make changes to either
> system, and if the two systems lose touch with each other they have
> to resolve their differences when contact is reestablished.

I'd go for #5 as well. Since this is a setup where there is no primary
at all, I suppose it is quite a different design from solutions #1-4.
Because of that, I think it's rather pointless to implement those steps
first in order to get #5 right, but I may well be wrong.

I would be happiest if work started on #5. Personally I don't care
much about #6 at the moment, but I can imagine that this is different
for others. Anyway, if the design is that every machine tracks its
changes and has them propagated (actively or passively) to n hosts (it's
not hard to keep track of that: "all hosts have this change; remove
it"), then I guess there is no risk of missing changes or failing to
recover. (At worst a slave is out of sync for a very short time, and
why would that be so wrong? And if it is that wrong, maybe fix it later,
since that would make the work easier now?)
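
A minimal Python sketch of that bookkeeping, assuming each server keeps
a sequence-numbered log of its own changes and knows a fixed set of peer
hosts. The `ChangeLog` class, its methods, and the host names are
hypothetical illustrations, not part of Cyrus. An entry is dropped only
once every peer has acknowledged it, so a peer that reconnects after an
outage can ask for everything it has not yet acknowledged.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    # Hypothetical per-server log of local changes awaiting replication.
    peers: frozenset  # names of the other n hosts
    next_seq: int = 0
    # seq -> (change description, set of peers that acknowledged it)
    pending: dict = field(default_factory=dict)

    def record(self, change: str) -> int:
        """Log a local change and return its sequence number."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = (change, set())
        return seq

    def unacked_for(self, peer: str) -> list:
        """Changes this peer has not acknowledged yet, oldest first."""
        return [(seq, change)
                for seq, (change, acked) in sorted(self.pending.items())
                if peer not in acked]

    def ack(self, peer: str, seq: int) -> None:
        """Mark seq as received by peer; drop it once all peers have it."""
        entry = self.pending.get(seq)
        if entry is None:
            return  # already pruned: every peer had it
        _change, acked = entry
        acked.add(peer)
        if acked >= self.peers:  # "all hosts had this change; remove it"
            del self.pending[seq]


# Example with two hypothetical peers.
log = ChangeLog(peers=frozenset({"imap2", "imap3"}))
seq = log.record("append INBOX/7")
log.ack("imap2", seq)
print(log.unacked_for("imap3"))  # [(0, 'append INBOX/7')]
log.ack("imap3", seq)
print(log.pending)               # {}
```

Acknowledgements are tracked per change here only to keep the sketch
short; a real design would more likely keep one high-water mark per
peer.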

This could be the task of the Cyrus daemon, but it could just as well
be the work of Murder, as Jure suggests (or both?). I'm not entirely
sure that is what we want, but it could be done if it fits nicely (and
if it can be assured that there is always a Murder to talk to).

If UID selection is a problem, I don't see any problem with making one
of the servers responsible for that task. We don't even need an
election system for that: you could define an order of preference for
the servers, and if the server with the highest preference is down, the
next one takes over its job. It's just that to the users, all the
machines should appear active. (And in case of failover, the remaining
machines should stay active, not become read-only or active only after
manual intervention.)
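
A minimal Python sketch of the fixed-preference takeover, assuming the
caller supplies a liveness check. The server names and the `is_up`
callback are hypothetical stand-ins, not anything Cyrus provides; the
function only decides which server should hand out UIDs.

```python
from typing import Callable, Optional, Sequence


def uid_allocator(preference: Sequence[str],
                  is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the most-preferred server that is currently up, or None.

    `preference` lists servers from highest to lowest preference;
    `is_up` is a hypothetical liveness check supplied by the caller.
    """
    for server in preference:
        if is_up(server):
            return server
    return None


# Example: imap1 is down, so imap2 takes over UID assignment.
order = ["imap1", "imap2", "imap3"]
alive = {"imap2", "imap3"}
print(uid_allocator(order, lambda s: s in alive))  # imap2
```

Without an election this only works if every server sees the same set
of live servers; if the servers lose touch with each other, two of them
could each conclude they are the allocator, which brings back the
reconciliation problem David describes.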

Paul

---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html