Funding Cyrus High Availability
Paul Dekkers
Paul.Dekkers at surfnet.nl
Mon Sep 20 04:51:28 EDT 2004
David Carter wrote:
>> 5. Active/Active
>>
>> designate one of the boxes as primary and identify all items in the
>> datastore that absolutely must not be subject to race conditions
>> between the two boxes (message UUID for example). In addition to
>> implementing the replication needed for #1 modify all functions that
>> need to update these critical pieces of data to update them on the
>> master and let the master update the other box.
>
> We may be talking at cross purposes (and it's entirely likely that I've
> got the wrong end of the stick!), but I consider active-active to be
> the case where there is no primary: users can make changes to either
> system, and if the two systems lose touch with each other they have
> to resolve their differences when contact is reestablished.
I'd go for #5 as well.
Since this is a setup where there is no primary at all, I suppose it is
quite a different design than the #1-4 solutions. Because of that, I
would think there is little point in doing those steps first in order to
get #5 right, but I may well be wrong.
I would be happiest if the work started on #5. Personally I don't care
much about #6 at the moment, but I can imagine that this is different
for others. Anyway: if the design is that every machine tracks its
changes and has them propagated (actively or passively) to n hosts,
there is no risk of missing changes or failing to recover, I guess.
Keeping track of that is not so hard ("all hosts had this change; remove
it"). (The only possibility is that a slave is out of sync for a very
short time, and why would that be so wrong? And if it is so wrong, then
maybe fix that later, since this would make the work easier?)
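
To make the bookkeeping concrete, here is a rough sketch in Python. It is
not taken from the Cyrus code; the class ChangeLog and its methods record,
changes_for and ack are made-up names, and a real implementation would
have to keep the log on disk. It only shows the "all hosts had this
change; remove it" rule.

```python
class ChangeLog:
    """Per-server log of local changes awaiting propagation (hypothetical)."""

    def __init__(self, peers):
        self.peers = set(peers)   # the n hosts that must receive each change
        self.pending = {}         # change id -> (payload, peers still missing it)
        self.next_id = 1

    def record(self, payload):
        """Log a local change; initially every peer still needs it."""
        change_id = self.next_id
        self.next_id += 1
        if self.peers:            # with no peers there is nothing to wait for
            self.pending[change_id] = (payload, set(self.peers))
        return change_id

    def changes_for(self, peer):
        """Changes this peer has not acknowledged yet, oldest first."""
        return [(cid, payload)
                for cid, (payload, missing) in sorted(self.pending.items())
                if peer in missing]

    def ack(self, peer, change_id):
        """A peer confirmed a change; drop it once all hosts had it."""
        entry = self.pending.get(change_id)
        if entry is None:
            return
        _payload, missing = entry
        missing.discard(peer)
        if not missing:
            del self.pending[change_id]
```

A peer that was down simply asks for changes_for(itself) when it comes
back, which covers both the active (push) and the passive (pull) case.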
This could be the task of the Cyrus daemon, but it could just as well be
the work of the murder, as Jure suggests. (Or both?) I'm not entirely
sure that is what we want, but it could be done if it fits nicely (and
it can be assured that there is always a murder to talk to).
If there is a problem with UID selection, I don't see a problem with
making one of the servers responsible for that task. We don't even need
an election system for that: you could define a preference order for the
servers; if the server with the highest preference is down, the next one
takes over its job. It's just that to the users all machines should
appear active. (And in case of failover the remaining machines should
stay active, not read-only or active only after manual intervention.)
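
The preference sequence could be as simple as the following Python
sketch. The function name uid_allocator and the server names are
invented for illustration, and it assumes every server has the same view
of which hosts are up, which is exactly the hard part when the machines
lose touch with each other.

```python
def uid_allocator(preference, alive):
    """Return the server responsible for handing out UIDs.

    preference: fixed, configured order of server names (highest first).
    alive: set of servers currently believed to be reachable.
    Returns None when no server is up.
    """
    for server in preference:
        if server in alive:
            return server
    return None


# Hypothetical three-machine setup.
servers = ["imap1", "imap2", "imap3"]
normal = uid_allocator(servers, {"imap1", "imap2", "imap3"})   # imap1
failover = uid_allocator(servers, {"imap2", "imap3"})          # imap2
```

No election round is needed because the order is fixed in the
configuration; when imap1 comes back it is again first in the list and
takes the job back.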
Paul