Funding Cyrus High Availability
Paul Dekkers
Paul.Dekkers at surfnet.nl
Fri Sep 17 04:05:45 EDT 2004
David Lang wrote:
>>>> Question: Are people looking at this as both redundancy and
>>>> performance, or just redundancy?
>>>
>>> Cyrus performs pretty well already. Background redundancy would be
>>> awesome. Especially if we had control over when the syncing process
>>> occurred either via time interval or date/time.
>>
>> I would say not at an interval but as soon as there is an action
>> performed on one mailbox, the other one would be pushed to do
>> something. I believe that is called rolling replication.
>>
>> I would not be really happy with an interval synchronisation. It would
>> make it harder to use both platforms at the same time, and that is
>> what I want as well. So there is a little-bit of load-balancing
>> involved, but more and more _availability_.
>>
>> Being able to use both platforms at the same time maybe implies that
>> there is either no master/slave role or that this is auto-elected
>> between the two and that this role is floating...
>
> right, but there are already tools freely available on most platforms
> to do the election and changing of the role (by switching between
> config files and restarting the master) what is currently lacking is
> any ability to do the master/slave role. once we have that it's just a
> little scripting to tie just about any failover software in to make it
> automatic.
There are indeed tools available for that, but they don't always work
as they are supposed to and are often very OS-limited. With FreeBSD I
had no luck with heartbeat (it wouldn't compile under FreeBSD-5),
(U)CARP was not available and FreeVRRP was buggy (at least in my case;
sometimes I had two masters).
Also, I wouldn't like it if restarting the cyrus process with a
different config file were necessary (a separate synchronising process
that needs restarting instead would be better). Restarting cyrus itself
would still kill connections to that process; I'd rather see a software
switch between the roles.
Isn't it possible to have equal roles? If all changes are put in some
backlog, and a synchroniser process runs on both machines and pushes the
backlog (as soon as there is any) to the other machine... then you can
have the same process on both (equal) servers... Of course there needs
to be some more intelligence, but that's basically what I would expect.
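To make the idea a bit more concrete, here is a minimal sketch in Python
of two equal peers that each keep a backlog and push it to the other.
Everything in it (the Peer class, deliver, receive, push_backlog) is a
made-up name for illustration; none of it is Cyrus code, and it ignores
the "more intelligence" part (ordering, conflicts, network failures).

```python
from collections import deque


class Peer:
    """One of two equal mail servers; all names here are hypothetical."""

    def __init__(self, name):
        self.name = name
        self.mailboxes = {}     # mailbox name -> list of messages
        self.backlog = deque()  # local changes not yet pushed to the other peer
        self.other = None       # the peer we replicate to

    def _apply(self, mailbox, message):
        self.mailboxes.setdefault(mailbox, []).append(message)

    def deliver(self, mailbox, message):
        """Apply a change made by a local client and queue it for replication."""
        self._apply(mailbox, message)
        self.backlog.append((mailbox, message))

    def receive(self, mailbox, message):
        """Apply a change pushed by the other peer; it is not queued again."""
        self._apply(mailbox, message)

    def push_backlog(self):
        """Synchroniser step: push every queued change to the other peer."""
        pushed = 0
        while self.backlog:
            mailbox, message = self.backlog[0]
            self.other.receive(mailbox, message)
            self.backlog.popleft()  # drop only after the peer accepted it
            pushed += 1
        return pushed


def pair(a, b):
    """Make two peers replicate to each other."""
    a.other, b.other = b, a
```

Both servers run exactly the same code, so there is no master or slave.
Note that after both backlogs are pushed the two servers hold the same
messages but not necessarily in the same order, which is one of the
things a real design would have to deal with.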
> one thing we need to watch out for here is that we don't set an
> impossible/unreasonable goal.
I agree that we'll have to define properly what we expect and what is
reasonable, but I think that at this moment Ken (as developer) has the
best overview of this. We offer our wishlist, and I suppose he
translates that to code in his head ;-)
I suppose that's why he came up with the question about performance
versus redundancy/availability.
> don't try to solve every problem and add every availability feature you
> can imagine all at once. instead let's look at the building blocks
> that are needed and identify what's currently not available.
I don't agree there completely: I don't want to depend on yet another
tool that defines what the master or slave is. Sometimes they don't work
at all, or work only on the same LAN, ... I'm not sure if you can count
on that.
(Hmm, you're the first that mentions the clustering software for
defining roles, and I didn't read about this on your website either.
This is new to me.)
> currently we have murder which will spread the load across multiple
> machines.
Yes, that's indeed something we don't need to look at :-)
(Although there is a possibility now to spread load as well of course,
with two machines available at the same time...)
> currently we have many tools available to detect a server failure and
> run local scripts to reconfigure machines (HACMP on AIX, heartbeat for
> Linux, *BSD, Solaris, etc)
>
> what we currently do not have is any ability to have one mailstore
> updated to match changes in another one.
I would combine these two, and I think that can be done simply by
designing the last thing you mention well.
> I also would not be really satisfied with interval synchronisation as
> the only choice.
In my sketch above (really not sure if it works of course), where both
have something like a backlog, you can "tail" that backlog, so to speak,
and push each update to the second machine as soon as possible. That
solves the problem you mention with delays while pushing updates to two
servers at the same time.
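As a rough illustration of "tailing" a backlog, the Python sketch below
assumes the backlog is an append-only text file with one change record
per line. That file format and the function names (read_new_records,
push_pending, follow) are my own assumptions, not anything Cyrus has.

```python
import time


def read_new_records(path, offset):
    """Return (records, new_offset) for complete lines appended after offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline, so a half-written record
    # is left in place and picked up on the next call.
    end = data.rfind(b"\n") + 1
    records = data[:end].decode("utf-8").splitlines()
    return records, offset + end


def push_pending(path, offset, send):
    """Send every new complete record via send() and return the new offset."""
    records, new_offset = read_new_records(path, offset)
    for record in records:
        send(record)
    return new_offset


def follow(path, send, poll_seconds=0.2, offset=0):
    """Keep pushing new records shortly after they are written (runs forever)."""
    while True:
        offset = push_pending(path, offset, send)
        time.sleep(poll_seconds)
```

Here send stands for whatever would transfer one record to the second
machine. The delay between a change and its replication is then bounded
by the polling interval rather than by a fixed synchronisation schedule.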
> I think we need something where the primary mailstore pushes a record
> of its changes to the secondary mailstore
Why not also vice versa?!
We want the two servers to be accessible at the same time, right?
>> If one server is down it should mean that all tasks can be performed at the
>> other one. I'm curious how this would look if both servers are still running
>> but cannot reach each other. If there is indeed a UUID: what if there are
>> duplicates... but I guess that has been taken into account.
>
>In cluster terminology this situation is known as being 'split-brained'
>and is generally viewed as a 'VERY BAD THING' that each cluster software
>solves in a slightly different way, from having an odd number of machines
>in the cluster (so that only one half of the cluster can actually have
>enough machines to function) to physically disconnecting power from a
>machine deemed to have failed (if both boxes attempt to power each other
>down one will generally win and avoid being shut off itself, but even if
>they do manage to power each other down at least you avoided the
>split-brain situation)
>
>leave this up to the cluster software. don't try to put this in cyrus
>initially.
>
>
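The "odd number of machines" rule quoted above comes down to a strict
majority check. A small Python sketch, with a hypothetical has_quorum
helper that is not taken from any particular cluster software:

```python
def has_quorum(reachable_peers, cluster_size):
    """True if this node plus the peers it can reach form a strict majority."""
    return (reachable_peers + 1) * 2 > cluster_size


# Three nodes split 2 | 1: only the side with two nodes may keep running.
majority_side = has_quorum(reachable_peers=1, cluster_size=3)  # True
minority_side = has_quorum(reachable_peers=0, cluster_size=3)  # False

# Two nodes that cannot reach each other: neither side has a majority.
isolated_pair = has_quorum(reachable_peers=0, cluster_size=2)  # False
```

With only two servers a majority check alone stops both halves, which is
why such setups fall back on a third vote or on powering the other
machine off.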
I still don't see why we need clustering software here?! I only see
application replication, no clustering software at all - am I wrong?
If we indeed need a mechanism for UUIDs for the messages, maybe one can
define that on one server the messages are odd and on the other even, or
that there is a different range on one server than on the other. (Not
sure if this is really necessary, but in fact I really don't want to
depend on clustering software.) I don't know, I suppose you already
handled that with your patches?
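The odd/even idea can be sketched in a few lines of Python. The
id_allocator function is a hypothetical name and this says nothing
about how the existing patches assign UUIDs; it only shows that two
servers can hand out identifiers independently without ever colliding.

```python
import itertools


def id_allocator(server_index, server_count=2):
    """Yield IDs server_index, server_index + server_count, ... forever.

    With two servers, index 0 yields even numbers and index 1 odd ones.
    """
    if not 0 <= server_index < server_count:
        raise ValueError("server_index must be in range(server_count)")
    return itertools.count(start=server_index, step=server_count)


even_ids = id_allocator(0)
odd_ids = id_allocator(1)
first_even = [next(even_ids) for _ in range(3)]  # 0, 2, 4
first_odd = [next(odd_ids) for _ in range(3)]    # 1, 3, 5
```

A real server would also have to remember the last ID it handed out
across restarts; the sketch leaves that out.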
Paul