Cyrus crashed on redundant platform - need better availability?

David Lang david.lang at digitalinsight.com
Fri Sep 10 18:20:07 EDT 2004


On Fri, 10 Sep 2004, Michael Loftis wrote:

> Date: Fri, 10 Sep 2004 13:15:05 -0600
> From: Michael Loftis <mloftis at wgops.com>
> To: Paul Dekkers <Paul.Dekkers at surfnet.nl>, info-cyrus at lists.andrew.cmu.edu
> Subject: Re: Cyrus crashed on redundant platform - need better availability?
> 
> The theory only applies if you're using a JOURNALED file system. Linux
> ext3 and ReiserFS, AIX JFS, and VERITAS on Sun and others are all examples
> of this. AFAIK FreeBSD doesn't have any journaling file systems; I could be
> wrong, though, since I haven't really looked for one (my FreeBSD boxes just
> run...and run...and run...). That said, the machine shouldn't have crashed
> in the first place, but you are running 5.x, which is clearly labeled as
> *NOT* production (4.10 is for that)... All of my production boxen are 4.x
> based (of the FreeBSD herd).
>

However, even a journaled filesystem won't protect you completely from
corruption. Even the filesystems you list can lose data when there is a
crash, and if one system goes haywire and starts scribbling on the shared
disk, it will trash any filesystem.

David Lang

>
>
> --On Friday, September 10, 2004 13:24 +0200 Paul Dekkers 
> <Paul.Dekkers at surfnet.nl> wrote:
>
>> Hi,
>> 
>> We're implementing a new mail platform running on two Dell 2650 servers
>> (each with two 3 GHz Xeon CPUs, HTT, and 3 GB of memory) and a 4 TB disk
>> array connected through an Adaptec 39160 SCSI controller for storage. We
>> installed FreeBSD 5.2.1 on them and, of course, Cyrus 2.2.8 (from the
>> ports) as the IMAP server. Our MTA is Postfix.
>> There are two machines for redundancy: if one fails, the other should
>> take over, mount the disks from the array, and carry on.
>> 
>> Unfortunately, the primary server has crashed twice already. The first
>> time was while synchronising two IMAP spools from the old server to the
>> new one; there was not much data on it back then. The second time was
>> worse: around 10 GB of mail was stored on the disks. We discovered that
>> the fsck took about 30 minutes, so although we have two machines for
>> redundancy, it still takes quite some time before the mail is available
>> again. (And we still have about 90 GB of mail to migrate, so once all
>> users are migrated it will take much longer.)
>> I have now mounted the filesystems synchronously: although it slows down
>> the system, I hope it speeds up the fsck a bit if there is another crash.
>> The second crash happened while removing a lot of mailboxes (dm), while
>> some of them were being removed at the same time through a webmail app
>> (SquirrelMail).
>> 
>> I'm not sure why the box crashed; there was nothing in the logs and
>> nothing on the screen when we got there; it had just rebooted. Of
>> course, I'm interested if anyone has any thoughts on this.
>> 
>> Although many on the list claim that this setup (two boxes sharing one
>> disk array) is a good way to get redundancy, I now doubt whether that is
>> true. It still takes 30 minutes before everything is back! It seems to
>> me that if a "live" version of Cyrus with a synchronised mail spool were
>> available, users would not notice an outage (except perhaps for a
>> dropped connection). Am I right?
>> 
>> Maybe it's time to continue the "High availability ...
>> again" discussion we had a while ago. If the Cyrus developers are able to
>> implement this with some funding, some questions remain for
>> me: how much time would it take before a "stable" solution is ready? How
>> much funding would be needed? I still have to talk to management about
>> this, but I would really support this development and I'm certainly
>> willing to convince some managers.
>> 
>> Regards,
>> Paul
>> 
>> 
>> ---
>> Cyrus Home Page: http://asg.web.cmu.edu/cyrus
>> Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
>> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
>> 
>
>
>
> --
> Undocumented Features quote of the moment...
> "It's not the one bullet with your name on it that you
> have to worry about; it's the twenty thousand-odd rounds
> labeled `occupant.'"
>  --Murphy's Laws of Combat
>

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare