Distributed File Systems

David Lang david.lang at digitalinsight.com
Sun Oct 20 17:33:50 EDT 2002


Another option to consider:

I have heard of people hacking cyrus to store its data in a SQL database
instead of a raw filesystem. If you do this you can then invoke the full
set of SQL replication capabilities (including better transaction
support than you can get in a filesystem).
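
Purely as an illustration of the transaction point (this is not anything
cyrus ships; the schema and table names below are made up), the win is
that a message insert and its mailbox index update commit or roll back
as a single unit:

    import sqlite3

    raw_message = b"From: alice@example.org\r\n\r\nhello"

    conn = sqlite3.connect("mailstore.db")
    conn.execute("CREATE TABLE IF NOT EXISTS messages"
                 " (mailbox TEXT, uid INTEGER, rfc822 BLOB)")
    conn.execute("CREATE TABLE IF NOT EXISTS mailboxes"
                 " (name TEXT PRIMARY KEY, last_uid INTEGER)")
    conn.execute("INSERT OR IGNORE INTO mailboxes VALUES ('user.jane', 0)")

    # one transaction: the message row and the mailbox index row are
    # committed together, or neither is
    with conn:
        conn.execute("INSERT INTO messages VALUES (?, ?, ?)",
                     ("user.jane", 1042, raw_message))
        conn.execute("UPDATE mailboxes SET last_uid = ? WHERE name = ?",
                     (1042, "user.jane"))
    conn.close()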

David Lang

 On Sun, 20 Oct 2002, David Chait wrote:

> Date: Sun, 20 Oct 2002 14:12:05 -0700
> From: David Chait <davidc at bonair.stanford.edu>
> To: Jared Watkins <jwatkins at snowcrash.homeip.net>
> Cc: info-cyrus at lists.andrew.cmu.edu
> Subject: Re: Distributed File Systems
>
> I see. NBD actually looks a little more risky than I would prefer, though
> going with a replicated structure based on Coda or AFS might be safer. In
> that scenario you are not hacking the kernel to mount drives, but rather
> connecting to file servers via client software which handles this
> replication by design. My main concern with this, though, is stability. I
> know for a fact NFS is a no-no, per the docs, but nothing has been said of
> the other, more developed options out there. If the file store part of the
> equation can be sorted out, the rest (mirroring the front end servers, and
> load balancing) is trivial with the available tools. Both CODA and AFS were
> developed at CMU, and I would be very interested in hearing their thoughts
> as well.
>
> -David
>
>
> ----- Original Message -----
> From: "Jared Watkins" <jwatkins at snowcrash.homeip.net>
> To: "David Chait" <davidc at bonair.stanford.edu>
> Sent: Sunday, October 20, 2002 1:56 PM
> Subject: Re: Distributed File Systems
>
>
> > Network Block Device...   the http://linux-ha.org site has some links to
> > a few different projects that provide that service in slightly different
> > ways. Making a mail server highly redundant and available is not an easy
> > task... and it is bound to be a little messy... but it is possible to
> > roll your own with the software that is already out there.
> >
> > Jared
> >
> >
> >
> > David Chait wrote:
> >
> > >Excuse my apparent ignorance, but.. NBD is a term I haven't run across.
> > >What does it involve?
> > >
> > >----- Original Message -----
> > >From: "Jared Watkins" <jwatkins at snowcrash.homeip.net>
> > >To: "David Chait" <davidc at bonair.stanford.edu>
> > >Sent: Sunday, October 20, 2002 1:39 PM
> > >Subject: Re: Distributed File Systems
> > >
> > >
> > >
> > >
> > >>You could possibly use an NBD along with software RAID 1... used over a
> > >>dedicated pair of GB Ethernet cards this should work... although I have
> > >>not tried anything like that in production.  I have used IPStor software
> > >>from FalconStor to create a virtualized SAN.  It is expensive... but very
> > >>powerful software.  It is possible to duplicate many of the features of
> > >>IPStor using software RAID... NBD... and LVM... including the snapshot
> > >>features.
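> > >>
> > >>Purely as a sketch of what that could look like (I have not run this;
> > >>the peer hostname, port, and device names are made up), the idea is to
> > >>import the other box's disk as an NBD device and mirror onto it with
> > >>md RAID 1:
> > >>
> > >>    # hypothetical: attach the peer's export, then build a RAID-1 mirror
> > >>    # from the local partition and the network block device
> > >>    import subprocess
> > >>    subprocess.run(["nbd-client", "peer-host", "2000", "/dev/nbd0"],
> > >>                   check=True)
> > >>    subprocess.run(["mdadm", "--create", "/dev/md0", "--level=1",
> > >>                    "--raid-devices=2", "/dev/sda2", "/dev/nbd0"],
> > >>                   check=True)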
> > >>
> > >>Jared
> > >>
> > >>
> > >>David Chait wrote:
> > >>
> > >>
> > >>
> > >>>Jeremy,
> > >>>   While that would resolve the front end problem, in the end your mail
> > >>>partition would still be a single point of failure. I'm trying to find a
> > >>>way to do a real time replication of the mail partition between both
> > >>>machines to allow for a complete failover.
> > >>>
> > >>>----- Original Message -----
> > >>>From: "Jeremy Rumpf" <jrumpf at heavyload.net>
> > >>>To: "David Chait" <davidc at bonair.stanford.edu>;
> > >>><info-cyrus at lists.andrew.cmu.edu>
> > >>>Sent: Sunday, October 20, 2002 12:59 PM
> > >>>Subject: Re: Distributed File Systems
> > >>>
> > >>>
> > >>>On Saturday 19 October 2002 02:23 am, David Chait wrote:
> > >>>
> > >>>>Greetings,
> > >>>>   Has anyone here looked into or had experience with Distributed File
> > >>>>Systems (AFS, NFS, CODA, etc.) applied to mail partitions to allow for
> > >>>>clustering or failover capability of Cyrus IMAP machines? I have seen
> > >>>>docs for splitting the accounts between machines, however this doesn't
> > >>>>seem like the best fault-tolerant solution.
> > >>>
> > >>>The easiest way to have fault tolerance would be to match up your IMAP
> > >>>servers in an active/active setup where each IMAP server has another
> > >>>server that's willing to take over if a failure occurs.
> > >>>
> > >>>I currently admin such a setup (iPlanet setup) but a cyrus setup would
> > >>>go like this (I plan on actually building this setup soon):
> > >>>
> > >>>The mail stores and server executables live on a disk partition that's
> > >>>accessible by both machines. This can be accomplished by using either
> > >>>fibre channel disk arrays or multi-initiator SCSI disk arrays. So the
> > >>>entire cyrus installation would look like:
> > >>>
> > >>>/usr/local/cyrus-server1/bin
> > >>>/usr/local/cyrus-server1/etc
> > >>>/usr/local/cyrus-server1/data
> > >>>/usr/local/cyrus-server1/data/conf
> > >>>/usr/local/cyrus-server1/data/partition1
> > >>>/usr/local/cyrus-server1/data/partition2
> > >>>/usr/local/cyrus-server1/data/sieve
> > >>>/usr/local/cyrus-server1/include
> > >>>/usr/local/cyrus-server1/lib
> > >>>/usr/local/cyrus-server1/man
> > >>>/usr/local/cyrus-server1/share
> > >>>
> > >>>Where the executables live in bin.
> > >>>
> > >>>Cyrus.conf and imapd.conf live in etc. A startup script similar to what
> > >>>would go in /etc/init.d also lives in etc.
> > >>>
> > >>>Deliver.db, mailboxes.db, quota, etc. go in the data/conf directory.
> > >>>
> > >>>Sieve scripts live in the data/sieve directory.
> > >>>
> > >>>data/partition1 and data/partition2 are the actual mailbox store
> > >>>partitions as defined in imapd.conf.
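> > >>>
> > >>>For example, the relevant imapd.conf lines for a layout like this might
> > >>>look something like the following (the partition names are just the ones
> > >>>used in this example):
> > >>>
> > >>>    configdirectory: /usr/local/cyrus-server1/data/conf
> > >>>    defaultpartition: partition1
> > >>>    partition-partition1: /usr/local/cyrus-server1/data/partition1
> > >>>    partition-partition2: /usr/local/cyrus-server1/data/partition2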
> > >>>
> > >>>Now, for performance reasons, this whole directory tree may live on
> > >>>more than one [RAID] device, but for the simplicity of example, let's
> > >>>imagine they live on one single disk device. Say that server 1 lives on
> > >>>/dev/sda1.
> > >>>
> > >>>Now we also have an identical setup living under:
> > >>>
> > >>>/usr/local/cyrus-server2
> > >>>
> > >>>Server 2 lives on /dev/sdb1.
> > >>>
> > >>>
> > >>>These two disk devices are directly connected to two servers, via fibre
> > >>>channel or multi-initiator SCSI:
> > >>>
> > >>>+------------+           +------------+
> > >>>| server 1   |           | server 2   |
> > >>>|            |           |            |
> > >>>+------------+           +------------+
> > >>>     |                         |
> > >>>     |      +-------------+    |
> > >>>     |      | Disk subsys |    |
> > >>>     -------|             |----+
> > >>>            +-------------+
> > >>>
> > >>>The idea now is that either server can mount either /dev/sda1 or
> > >>>/dev/sdb1, _but_ only one server can have each device mounted at any
> > >>>single instant. So under normal operating conditions server 1 has
> > >>>/dev/sda1 mounted on /usr/local/cyrus-server1, runs
> > >>>"/usr/local/cyrus-server1/etc/cyrus start", and is off to the races.
> > >>>Server 2 does the same thing with /dev/sdb1 and /usr/local/cyrus-server2.
> > >>>
> > >>>Each server has two IP addresses. The primary address is a static address
> > >>>assigned to the server. The secondary address is a floating IP address
> > >>>that cyrus is configured to bind to at startup (in
> > >>>/usr/local/cyrus-server1/etc/cyrus.conf).
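> > >>>
> > >>>As a rough illustration (the address and the prefork count are invented
> > >>>for the example, not from a real config), the SERVICES entry in
> > >>>cyrus.conf would pin imapd to the floating address:
> > >>>
> > >>>    SERVICES {
> > >>>      imap cmd="imapd" listen="10.0.0.11:imap" prefork=5
> > >>>    }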
> > >>>
> > >>>Each server runs some heartbeat software that keeps track of whether the
> > >>>other server is alive and well. If server 2 detects that server 1 is
> > >>>dead (or vice versa), the following actions may occur:
> > >>>
> > >>>1> Add the floating IP address for server 1 to server 2 (as an alias
> > >>>most likely)
> > >>>
> > >>>2> Server 2 mounts /dev/sda1 on /usr/local/cyrus-server1
> > >>>
> > >>>3> Server 2 runs the startup script:
> > >>>   /usr/local/cyrus-server1/etc/cyrus start
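> > >>>
> > >>>A rough sketch of what such a takeover handler on server 2 could look
> > >>>like (the interface, address, and device names here are illustrative
> > >>>only; in practice the cluster software drives these steps):
> > >>>
> > >>>    # hypothetical takeover handler, run when heartbeat declares
> > >>>    # server 1 dead; names and addresses are made up for the example
> > >>>    import subprocess
> > >>>
> > >>>    def take_over_server1():
> > >>>        # 1> bring up server 1's floating address as an alias
> > >>>        subprocess.run(["ifconfig", "eth0:1", "10.0.0.11",
> > >>>                        "netmask", "255.255.255.0", "up"], check=True)
> > >>>        # 2> mount server 1's shared disk
> > >>>        subprocess.run(["mount", "/dev/sda1",
> > >>>                        "/usr/local/cyrus-server1"], check=True)
> > >>>        # 3> start server 1's cyrus instance
> > >>>        subprocess.run(["/usr/local/cyrus-server1/etc/cyrus", "start"],
> > >>>                       check=True)
> > >>>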
> > >>>Boom, server 2 is now completely acting as if it is server 1. Albeit,
> > >>>server 2 now has to assume twice the load: the load of its users, and
> > >>>the load of server 1's users, but this is better than not having server
> > >>>1 available at all. This is also ideal for maintenance, where server 1
> > >>>could be taken offline at a non-peak hour for hardware upgrades.
> > >>>
> > >>>So in this architecture, the IMAP servers use a sort of buddy system
> > >>>where each server is assigned a buddy. The buddies then keep a watch on
> > >>>each other, each willing to assume the workload of the other if it dies
> > >>>or becomes unresponsive.
> > >>>
> > >>>As far as failover/heartbeat software goes, Red Hat's Advanced Server
> > >>>uses Kimberlite from the Mission Critical Linux project to handle
> > >>>failovers.
> > >>>
> > >>>http://oss.missioncriticallinux.com/projects/kimberlite/
> > >>>
> > >>>One of these days I'll get around to playing with Kimberlite, so I'll be
> > >>>able to comment on it more.
> > >>>
> > >>>The above setup relies on the ability to have shared disk devices and
> > >>>not on NFS, CODA, InterMezzo, etc. Locking and synchronization issues
> > >>>with those would be a real pain.
> > >>>
> > >>>My $.02 anyhow :).
> > >>>
> > >>>Jeremy
>



