Cyrus crashed on redundant platform - need better availability?
Paul.Dekkers at surfnet.nl
Wed Sep 15 07:38:43 EDT 2004
Sebastian Hagedorn wrote:
>> You are not using a clustered filesystem,
I can imagine that would be one of the advantages of RH's clustering,
since you don't have to mount a filesystem in that case for a machine
that just crashed - it would save time...
But I suppose RH's cluster manager takes care of mounting the partitions
and checking them if there are any errors.
>>> It's good but not perfect. We recently installed a huge SAN and are
>>> now in the process of moving over the mail data to reside there.
>>> Fibrechannel seems to be much more error tolerant than SCSI.
Were you working with a "multi-initiator environment" (as RH calls it)
or "single initiator" (e.g. with 2 machines on exactly the same SCSI
bus, or two separate interfaces on your array's SCSI controller)?
I think with a multi-initiator environment (as we have it) there is a
very limited chance of failures.
>> Hmm, I don't expect the problems to be SCSI-related. Maybe it has to
> That's not what I was talking about. We have a similar setup, yet
> still there were instances when Red Hat's cluster software failed to
> write to the shared storage. I guess this was caused by the slow-downs
> connected to the memory management, but Red Hat support indicated that
> shared storage connected via FibreChannel would not have been as
> susceptible to these problems.
Do you think using RH's cluster software is a valuable consideration for
this kind of clustering setup? Using FreeBSD there are not that many
clustering solutions for now, and if it's advisable to at least consider
using RH here (although I have no experience with RH) we can certainly
look at it. (Any idea how fast RH would "recover services"?)
On the other hand, if application-level redundancy is on its
way, it doesn't really matter what platform the machine runs on, so that
would still make me happier, even with FreeBSD. And I would rather
put my money there. Even if it means we'll have to wait for some months,
we would do that and take the risk of running in a "less
automatic" failover situation with a worst-case downtime of 30 mins (or
2 mins regularly with sync-mounted filesystems now).
>> The kernel that shipped with RedHat AS 2.1 was useless for most of the
>> tasks I tried it with. About three revisions later it became somewhat
>> more useful for non-Oracle types of use, but I've rolled my own and am
>> not following the state of it now.
> That's fine if you don't have to rely on commercial support. Our
> management decided to go the supported path all the way. That doesn't
> leave you many options. I have to say that when it works, the cluster
> software works extremely well. It's just that it hasn't always worked
> in the past ... ;-)
That's a plus for RH (ES|AS) 3