Distributed File Systems

David Chait davidc at bonair.stanford.edu
Sun Oct 20 17:12:05 EDT 2002


I see. NBD actually looks a little riskier than I would prefer, though
going with a replicated structure based on Coda or AFS might be safer. In
that scenario you are not hacking the kernel to mount drives, but rather
connecting to file servers via client software that handles the replication
by design. My main concern with this, though, is stability: I know for a fact
NFS is a no-no, per the docs, but nothing has been said of the other, more
developed options out there. If the file store part of the equation can be
sorted out, the rest (mirroring the front end servers, and load balancing)
is trivial with the available tools. Both Coda and AFS were developed at
CMU, and I would be very interested in hearing their thoughts as well.

-David


----- Original Message -----
From: "Jared Watkins" <jwatkins at snowcrash.homeip.net>
To: "David Chait" <davidc at bonair.stanford.edu>
Sent: Sunday, October 20, 2002 1:56 PM
Subject: Re: Distributed File Systems


> Network Block Device...   the http://linux-ha.org site has some links to
> a few different projects that provide that service in slightly different
> ways. Making a mail server highly redundant and available is not an easy
> task... and it is bound to be a little messy... but it is possible to
> roll your own with the software that is already out there.
>
> Jared
>
>
>
> David Chait wrote:
>
> >Excuse my apparent ignorance, but... NBD is a term I haven't run across.
> >What does it involve?
> >
> >----- Original Message -----
> >From: "Jared Watkins" <jwatkins at snowcrash.homeip.net>
> >To: "David Chait" <davidc at bonair.stanford.edu>
> >Sent: Sunday, October 20, 2002 1:39 PM
> >Subject: Re: Distributed File Systems
> >
> >
> >
> >
> >>You could possibly use an NBD along with software RAID 1... this, used
> >>over a dedicated set of gigabit Ethernet cards, should work... although I
> >>have not tried anything like that in production.  I have used IPStor
> >>software from FalconStor to create a virtualized SAN.  It is expensive...
> >>but very powerful software.  It is possible to duplicate many of the
> >>features of IPStor using software RAID... NBD... and LVM... including the
> >>snapshot features.
> >>
> >>Jared
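
For what it's worth, the NBD plus software RAID 1 idea might boil down to
something like the sketch below. This is untested; the device names, port,
and export path are made up, and the nbd-server/nbd-client syntax varies
between nbd-tools releases, so treat it as an outline rather than a recipe.

    # On the second machine: export a spare partition over the network
    nbd-server 2000 /dev/sdb1

    # On the mail server: attach that export as a local block device
    modprobe nbd
    nbd-client otherhost 2000 /dev/nbd0

    # Mirror the local mail partition against the network device (RAID 1)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/nbd0

    # Optionally put LVM on top of the mirror to get snapshot support
    pvcreate /dev/md0
    vgcreate mailvg /dev/md0
    lvcreate -L 10G -n mailstore mailvg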
> >>
> >>
> >>David Chait wrote:
> >>
> >>
> >>
> >>>Jeremy,
> >>>   While that would resolve the front end problem, in the end your mail
> >>>partition would still be a single point of failure. I'm trying to find a
> >>>way to do a real time replication of the mail partition between both
> >>>machines to allow for a complete failover.
> >>>
> >>>----- Original Message -----
> >>>From: "Jeremy Rumpf" <jrumpf at heavyload.net>
> >>>To: "David Chait" <davidc at bonair.stanford.edu>;
> >>><info-cyrus at lists.andrew.cmu.edu>
> >>>Sent: Sunday, October 20, 2002 12:59 PM
> >>>Subject: Re: Distributed File Systems
> >>>
> >>>
> >>>On Saturday 19 October 2002 02:23 am, David Chait wrote:
> >>>
> >>>
> >>>
> >>>
> >>>>Greetings,
> >>>>   Has anyone here looked into or had experience with Distributed File
> >>>>Systems (AFS, NFS, CODA, etc.) applied to mail partitions to allow for
> >>>>clustering or failover capability of Cyrus IMAP machines? I have seen
> >>>>docs for splitting the accounts between machines, however this doesn't
> >>>>seem like the best fault tolerant solution.
> >>>>
> >>>The easiest way to have fault tolerance would be to match up your IMAP
> >>>servers in an active/active setup where each IMAP server has another
> >>>server that's willing to take over if a failure occurs.
> >>>
> >>>I currently admin such a setup (iPlanet) but a cyrus setup would go like
> >>>this (I plan on actually building this setup soon):
> >>>
> >>>The mail stores and server executables live on a disk partition that's
> >>>accessible by both machines. This can be accomplished by using either
> >>>fibre channel disk arrays or multi-initiator SCSI disk arrays. So the
> >>>entire cyrus installation would look like:
> >>>
> >>>/usr/local/cyrus-server1/bin
> >>>/usr/local/cyrus-server1/etc
> >>>/usr/local/cyrus-server1/data
> >>>/usr/local/cyrus-server1/data/conf
> >>>/usr/local/cyrus-server1/data/partition1
> >>>/usr/local/cyrus-server1/data/partition2
> >>>/usr/local/cyrus-server1/data/sieve
> >>>/usr/local/cyrus-server1/include
> >>>/usr/local/cyrus-server1/lib
> >>>/usr/local/cyrus-server1/man
> >>>/usr/local/cyrus-server1/share
> >>>
> >>>Where the executables live in bin.
> >>>
> >>>Cyrus.conf and imapd.conf live in etc. A startup script similar to what
> >>>would
> >>>go in /etc/init.d also lives in etc.
> >>>
> >>>Deliver.db, mailboxes.db, quota, etc. go in the data/conf directory.
> >>>
> >>>Sieve scripts live in the data/sieve directory.
> >>>
> >>>data/partition1 and data/partition2 are the actual mailbox store
> >>>partitions as defined in imapd.conf.
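
As a concrete illustration of that layout, the matching imapd.conf entries
might look roughly like this (a sketch only; the partition names are just
the ones from the example tree above):

    configdirectory: /usr/local/cyrus-server1/data/conf
    sievedir: /usr/local/cyrus-server1/data/sieve
    defaultpartition: partition1
    partition-partition1: /usr/local/cyrus-server1/data/partition1
    partition-partition2: /usr/local/cyrus-server1/data/partition2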
> >>>
> >>>Now, for performance reasons, this whole directory tree may live on more
> >>>than one [RAID] device, but for the simplicity of example, let's imagine
> >>>they live on one single disk device. Say that server1 lives on /dev/sda1.
> >>>
> >>>Now we also have an identical setup living under:
> >>>
> >>>/usr/local/cyrus-server2
> >>>
> >>>Server 2 lives on /dev/sdb1.
> >>>
> >>>
> >>>These two disk devices are directly connected to two servers, via fibre
> >>>channel or multi-initiator SCSI:
> >>>
> >>>+------------+           +------------+
> >>>| server 1   |           | server 2   |
> >>>|            |           |            |
> >>>+------------+           +------------+
> >>>     |                         |
> >>>     |      +-------------+    |
> >>>     |      | Disk subsys |    |
> >>>     -------|             |----+
> >>>            +-------------+
> >>>
> >>>The idea now is that either server can mount either /dev/sda1 or /dev/sdb1,
> >>>_but_ only one server can have each device mounted at any single instant.
> >>>So under normal operating conditions server 1 has /dev/sda1 mounted on
> >>>/usr/local/cyrus-server1, runs "/usr/local/cyrus-server1/etc/cyrus start",
> >>>and is off to the races. Server 2 does the same thing with /dev/sdb1 and
> >>>/usr/local/cyrus-server2.
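
In other words, under normal operation each node just mounts its own device
and starts its own instance, roughly:

    # on server 1
    mount /dev/sda1 /usr/local/cyrus-server1
    /usr/local/cyrus-server1/etc/cyrus start

    # on server 2
    mount /dev/sdb1 /usr/local/cyrus-server2
    /usr/local/cyrus-server2/etc/cyrus start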
> >>>
> >>>Each server has two IP addresses. The primary address is a static address
> >>>assigned to the server. The secondary address is a floating IP address that
> >>>cyrus is configured to bind to at startup (in
> >>>/usr/local/cyrus-server1/etc/cyrus.conf).
> >>>
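A sketch of what that binding might look like in the SERVICES section of
cyrus.conf, with a made-up floating address of 192.168.1.101 for server 1:

    SERVICES {
      imap  cmd="imapd"  listen="192.168.1.101:imap"  prefork=5
      pop3  cmd="pop3d"  listen="192.168.1.101:pop3"  prefork=1
      lmtp  cmd="lmtpd"  listen="192.168.1.101:lmtp"  prefork=0
    }
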
> >>>Each server runs some heartbeat software that keeps track of whether the
> >>>other server is alive and well. If server 2 detects that server 1 is dead
> >>>(or vice versa), the following actions may occur:
> >>>
> >>>1> Add the floating IP address for server 1 to server 2 (as an alias most
> >>>likely)
> >>>
> >>>2> Server 2 mounts /dev/sda1 on /usr/local/cyrus-server1
> >>>
> >>>3> Server 2 runs the startup script /usr/local/cyrus-server1/etc/cyrus
> >>>start
> >>>
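Those three steps are simple enough that the takeover side of the heartbeat
script can be a few shell commands. The interface name, netmask, and floating
address below are assumptions, and in real life the heartbeat package also has
to guarantee that server 1 is truly down before the mount, since having both
servers mount the same device at once would corrupt it:

    #!/bin/sh
    # Takeover script run on server 2 when server 1 is declared dead

    # 1> bring up server 1's floating IP as an alias
    ifconfig eth0:1 192.168.1.101 netmask 255.255.255.0 up

    # 2> mount server 1's shared disk device
    mount /dev/sda1 /usr/local/cyrus-server1

    # 3> start server 1's cyrus instance
    /usr/local/cyrus-server1/etc/cyrus start
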
> >>>Boom, server 2 is now completely acting as if it is server 1. Albeit,
> >>>server 2 now has to assume twice the load: the load of its users, and the
> >>>load of server 1's users, but this is better than not having server 1
> >>>available at all. This is also ideal for maintenance, where server 1 could
> >>>be taken offline at a non-peak hour for hardware upgrades.
> >>>
> >>>So in this architecture, the IMAP servers use a sort of buddy system where
> >>>each server is assigned a buddy. The buddies then keep a watch on each
> >>>other, each willing to assume the workload of the other if it dies or
> >>>becomes unresponsive.
> >>>
> >>>As far as failover/heartbeat software goes, Red Hat's Advanced Server uses
> >>>Kimberlite from the Mission Critical Linux project to handle failovers.
> >>>
> >>>http://oss.missioncriticallinux.com/projects/kimberlite/
> >>>
> >>>One of these days I'll get around to playing with Kimberlite, so I'll be
> >>>able to comment on it more.
> >>>
> >>>The above setup relies on the ability to have shared disk devices and not
> >>>on NFS, CODA, InterMezzo, etc. Locking and synchronization issues with
> >>>those would be a real pain.
> >>>
> >>>My $.02 anyhow :).
> >>>
> >>>Jeremy
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >
> >
> >
> >
>




