Cyrus crashed on redundant platform - need better availability?

Paul Dekkers Paul.Dekkers at surfnet.nl
Fri Sep 10 10:27:40 EDT 2004


Hi,

Sebastian Hagedorn wrote:

>> There are two machines for redundancy. If one fails, the other one
>> should take over: mount the disks from the array, and move on.
>
> Right, works fine for us for the most part. Hasn't always been like 
> that, but the most recent kernel updates by Red Hat have improved 
> matters a lot.

What did the kernel improve? You are not using a clustered filesystem, 
right?

>> Unfortunately, the primary server crashed twice already. The first
>> time it did while synchronising two IMAP-spools from the old server to
>> the new one. There was not much data on it back then. The second time
>> was worse, around 10 GB of mail was stored on the disks. We discovered
>> that the fsck took about 30 minutes,
>
> Isn't your filesystem journaled? We use ext3 for ours. There *have* 
> been a few occasions where the journal had been damaged as well 
> (forcing us to run fsck), but those have been few and far between. In 
> all other instances the failover is nearly instantaneous.

Well, it's UFS2 with softupdates, so not journaled in the strict sense, 
but it should give similar protection. I'm afraid the on-disk state was 
damaged in my case: there were several complaints about softupdate 
inconsistencies while doing the fsck. (The server crashed once more, but 
since I now mount with -o sync the fsck was much faster. I'll keep it 
that way for now until we know what's really wrong - it was again with 
a large mail-folder synchronisation...)

>> Although many on the list claim that this (having 2 boxes with 1
>> disk-array) is a nice way for redundancy I'm in doubt now if this is
>> true.
>
> It's good but not perfect. We recently installed a huge SAN and are 
> now in the process of moving over the mail data to reside there. 
> Fibrechannel seems to be much more error tolerant than SCSI.

Hmm, I don't expect the problems to be SCSI-related. Maybe it has to do 
with GEOM and SMP in FreeBSD 5.2.1, but not the SCSI bus itself. (There 
are separate controllers for the two machines, so they never see each 
other on the same SCSI bus...)

I still think that it would be best to have two filesystems instead of 
one, that is, mirroring at the application level (Cyrus)... :-)

Paul





