Cyrus crashed on redundant platform - need better availability?
Paul Dekkers
Paul.Dekkers at surfnet.nl
Fri Sep 10 10:32:33 EDT 2004
Jure Pečar wrote:
>>Although many on the list claim that this (having 2 boxes with 1
>>disk array) is a nice way to get redundancy, I now doubt that this is
>>true. It still takes 30 minutes before everything is back again! It seems
>>to me that if a "live" copy of Cyrus were available with a
>>synchronised mail spool, there would be no noticeable outage for users
>>(except perhaps a lost connection). Am I right?
>>
>>
>Having 2 boxes with one disk array leaves you with a single point of failure
>that you wouldn't think of immediately: the filesystem. I learned that the
>hard way.
>
>
Yes, I agree.
>I'm planning to 'redesign' our storage: instead of one big volume that fscks
>for hours, I'm going to split it into many mirrors and use them as Cyrus
>partitions. This way they could all fsck in parallel. I'm going to lose the
>'single instance store' capability, but that's a tradeoff that I'm willing to
>make.
>
>
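To put rough numbers on the parallel-fsck idea quoted above, here is a minimal sketch. The per-partition check times are made up for illustration, and it assumes the mirrors sit on independent disks so that concurrent checks do not slow each other down. Under that assumption, a serial check costs the sum of the individual times, while a parallel check costs only the slowest one.

```python
def minutes_until_checked(fsck_minutes, parallel):
    """Estimate minutes until every partition has finished its fsck.

    fsck_minutes: per-partition check times (hypothetical values).
    parallel: True if all checks start at once on independent disks.
    """
    if not fsck_minutes:
        return 0
    # Parallel: wait for the slowest check. Serial: wait for all in turn.
    return max(fsck_minutes) if parallel else sum(fsck_minutes)


# Hypothetical example: one 180-minute volume split into five mirrors.
one_big_volume = [180]
split_mirrors = [30, 25, 40, 35, 50]

print(minutes_until_checked(one_big_volume, parallel=False))
print(minutes_until_checked(split_mirrors, parallel=False))
print(minutes_until_checked(split_mirrors, parallel=True))
```

Even in the parallel case the outage is bounded by the slowest partition, so splitting shortens the downtime but does not remove it.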
Hmm, then your fscks will run faster and with fewer problems, but there is
still an outage, one that you could prevent with failover done another way:
availability/replication at the application level.
If the spools are replicated it doesn't matter whether the fsck takes long
or not... although there will be a backlog of course.
Is it possible to have an fsck running on one partition and have Cyrus
started already (so that part of the mail store, e.g. archives, is not
available yet)?
>It happened to me at least once that the machine that crashed corrupted the
>filesystem in a way that the machine that took over also crashed within
>hours...
>
>
>>Maybe it's time to continue the "High availability ...
>>again" discussion we had a while ago. If the Cyrus developers are able
>>to implement this with some funding there are still some questions left
>>for me: how much time would it take before a "stable" solution is ready?
>>How much funding is expected? I still have to talk to management about
>>this, but I would really support this development and I'm certainly
>>willing to convince some managers.
>>
>>
>The only high availability I see here is the Google way. Cyrus offers
>you that with the 'murder' component.
>
>
That's not really availability, but distributed risk.
>BTW, you're mentioning FreeBSD ... doesn't it have some sort of background
>fsck while the filesystem is mounted rw?
>
>
It can, but I'm not sure that's what I prefer. I'm not sure how
mature it is on FreeBSD, and I prefer mail integrity over a
"quick restore".
Paul
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html