Distributed File Systems

David Lang dlang at diginsite.com
Sun Oct 20 17:45:33 EDT 2002


Note that I said I have heard of people doing this; I haven't done it
myself :-)

That being said, I think it would be fairly straightforward to read
from the filesystem and write to the database. It may require that you
block changes to the mailstore for the duration, but any transition from
one mailstore to another (including going from a local drive to an
external RAID array) has a similar problem.
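
To give a feel for it, here is a rough sketch of such a migration in
python (untested; the table schema is hypothetical, and sqlite3 just
stands in for whatever SQL or Oracle driver you actually use). Cyrus
stores one message per numbered "NNN." file under each mailbox
directory, so a plain directory walk is enough:

    import os
    import sqlite3  # stand-in for your real DB-API driver

    SPOOL = "/var/spool/imap/user"  # hypothetical partition root

    db = sqlite3.connect("mailstore.db")
    db.execute("""CREATE TABLE IF NOT EXISTS messages
                  (mailbox TEXT, filename TEXT, rfc822 BLOB)""")

    # walk every mailbox directory; block changes to the mailstore
    # while this runs, as noted above
    for dirpath, dirnames, filenames in os.walk(SPOOL):
        for name in filenames:
            if not name.endswith("."):  # message files are "1.", "2.", ...
                continue
            with open(os.path.join(dirpath, name), "rb") as f:
                db.execute("INSERT INTO messages VALUES (?, ?, ?)",
                           (os.path.relpath(dirpath, SPOOL), name, f.read()))
    db.commit()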

David Lang

On Sun, 20 Oct 2002, David Chait wrote:

> Date: Sun, 20 Oct 2002 14:46:11 -0700
> From: David Chait <davidc at bonair.stanford.edu>
> To: David Lang <david.lang at digitalinsight.com>
> Cc: Jared Watkins <jwatkins at snowcrash.homeip.net>,
>      info-cyrus at lists.andrew.cmu.edu
> Subject: Re: Distributed File Systems
>
> A very interesting idea. We do have licensing for SQL and Oracle;
> however, assuming a production environment, how would you carry over
> existing mail into the database structure?
>
> ----- Original Message -----
> From: "David Lang" <david.lang at digitalinsight.com>
> To: "David Chait" <davidc at bonair.stanford.edu>
> Cc: "Jared Watkins" <jwatkins at snowcrash.homeip.net>;
> <info-cyrus at lists.andrew.cmu.edu>
> Sent: Sunday, October 20, 2002 2:33 PM
> Subject: Re: Distributed File Systems
>
>
> > Another option to consider:
> >
> > I have heard of people hacking cyrus to store its data in a SQL
> > database instead of a raw filesystem. If you do this you can then
> > invoke the full set of SQL replication capabilities (including better
> > transaction support than you can get in a filesystem).
> >
> > David Lang
> >
> >  On Sun, 20 Oct 2002, David Chait wrote:
> >
> > > Date: Sun, 20 Oct 2002 14:12:05 -0700
> > > From: David Chait <davidc at bonair.stanford.edu>
> > > To: Jared Watkins <jwatkins at snowcrash.homeip.net>
> > > Cc: info-cyrus at lists.andrew.cmu.edu
> > > Subject: Re: Distributed File Systems
> > >
> > > I see, NBD actually looks a little more risky than I would prefer,
> > > though going with a replicative structure based on Coda or AFS might
> > > be safer. In that scenario you are not hacking the kernel to mount
> > > drives, but rather connecting to file servers via client software
> > > which handles this replication by design. My main concern with this,
> > > though, is stability: I know for a fact NFS is a no-no, per the docs,
> > > but nothing has been said of the other, more developed options out
> > > there. If the file store part of the equation can be sorted out, the
> > > rest (mirroring the front end servers and load balancing) is trivial
> > > with the available tools. Both Coda and AFS were developed at CMU,
> > > and I would be very interested in hearing their thoughts as well.
> > >
> > > -David
> > >
> > >
> > > ----- Original Message -----
> > > From: "Jared Watkins" <jwatkins at snowcrash.homeip.net>
> > > To: "David Chait" <davidc at bonair.stanford.edu>
> > > Sent: Sunday, October 20, 2002 1:56 PM
> > > Subject: Re: Distributed File Systems
> > >
> > >
> > > > Network Block Device... the http://linux-ha.org site has some
> > > > links to a few different projects that provide that service in
> > > > slightly different ways. Making a mail server highly redundant and
> > > > available is not an easy task... and it is bound to be a little
> > > > messy... but it is possible to roll your own with the software
> > > > that is already out there.
> > > >
> > > > Jared
> > > >
> > > >
> > > >
> > > > David Chait wrote:
> > > >
> > > > >Excuse my apparent ignorance, but... NBD is a term I haven't run
> > > > >across. What does it involve?
> > > > >
> > > > >----- Original Message -----
> > > > >From: "Jared Watkins" <jwatkins at snowcrash.homeip.net>
> > > > >To: "David Chait" <davidc at bonair.stanford.edu>
> > > > >Sent: Sunday, October 20, 2002 1:39 PM
> > > > >Subject: Re: Distributed File Systems
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >>You could possibly use an NBD along with software RAID 1... this,
> > > > >>used over a dedicated set of GB ethernet cards, should work...
> > > > >>although I have not tried anything like that in production. I
> > > > >>have used IPStor software from FalconStor to create a virtualized
> > > > >>SAN. It is expensive... but very powerful software. It is possible
> > > > >>to duplicate many of the features of IPStor using software
> > > > >>RAID... NBD... and LVM... including the snapshot features.
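> > > > >>
> > > > >>A rough sketch of that NBD + RAID-1 idea in python via subprocess
> > > > >>(untested; the device names, port, and peer host are
> > > > >>hypothetical):
> > > > >>
> > > > >>    import subprocess
> > > > >>
> > > > >>    # attach the block device exported by the peer's nbd-server
> > > > >>    subprocess.check_call(
> > > > >>        ["nbd-client", "peer.example.com", "2000", "/dev/nbd0"])
> > > > >>
> > > > >>    # mirror the local partition against the network device
> > > > >>    subprocess.check_call(
> > > > >>        ["mdadm", "--create", "/dev/md0", "--level=1",
> > > > >>         "--raid-devices=2", "/dev/sda1", "/dev/nbd0"])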
> > > > >>
> > > > >>Jared
> > > > >>
> > > > >>
> > > > >>David Chait wrote:
> > > > >>
> > > > >>
> > > > >>
> > > > >>>Jeremy,
> > > > >>>   While that would resolve the front end problem, in the end
> > > > >>>your mail partition would still be a single point of failure.
> > > > >>>I'm trying to find a way to do real-time replication of the mail
> > > > >>>partition between both machines to allow for a complete failover.
> > > > >>>
> > > > >>>----- Original Message -----
> > > > >>>From: "Jeremy Rumpf" <jrumpf at heavyload.net>
> > > > >>>To: "David Chait" <davidc at bonair.stanford.edu>;
> > > > >>><info-cyrus at lists.andrew.cmu.edu>
> > > > >>>Sent: Sunday, October 20, 2002 12:59 PM
> > > > >>>Subject: Re: Distributed File Systems
> > > > >>>
> > > > >>>
> > > > >>>On Saturday 19 October 2002 02:23 am, David Chait wrote:
> > > > >>>
> > > > >>>>Greetings,
> > > > >>>>   Has anyone here looked into or had experience with
> > > > >>>>Distributed File Systems (AFS, NFS, CODA, etc.) applied to mail
> > > > >>>>partitions to allow for clustering or failover capability of
> > > > >>>>Cyrus IMAP machines? I have seen docs for splitting the accounts
> > > > >>>>between machines, however this doesn't seem like the best
> > > > >>>>fault-tolerant solution.
> > > > >>>
> > > > >>>The easiest way to have fault tolerance would be to match up
> > > > >>>your IMAP servers in an active/active setup where each IMAP
> > > > >>>server has another server that's willing to take over if a
> > > > >>>failure occurs.
> > > > >>>
> > > > >>>I currently admin such a setup (an iPlanet setup) but a cyrus
> > > > >>>setup would go like this (I plan on actually building this setup
> > > > >>>soon):
> > > > >>>
> > > > >>>The mail stores and server executables live on a disk partition
> > > > >>>that's accessible by both machines. This can be accomplished by
> > > > >>>using either fibre channel disk arrays or multi-initiator SCSI
> > > > >>>disk arrays. So the entire cyrus installation would look like:
> > > > >>>
> > > > >>>/usr/local/cyrus-server1/bin
> > > > >>>/usr/local/cyrus-server1/etc
> > > > >>>/usr/local/cyrus-server1/data
> > > > >>>/usr/local/cyrus-server1/data/conf
> > > > >>>/usr/local/cyrus-server1/data/partition1
> > > > >>>/usr/local/cyrus-server1/data/partition2
> > > > >>>/usr/local/cyrus-server1/data/sieve
> > > > >>>/usr/local/cyrus-server1/include
> > > > >>>/usr/local/cyrus-server1/lib
> > > > >>>/usr/local/cyrus-server1/man
> > > > >>>/usr/local/cyrus-server1/share
> > > > >>>
> > > > >>>Where the executables live in bin.
> > > > >>>
> > > > >>>Cyrus.conf and imapd.conf live in etc. A startup script similar
> > > > >>>to what would go in /etc/init.d also lives in etc.
> > > > >>>
> > > > >>>Deliver.db, mailboxes.db, quota, etc. go in the data/conf
> > > > >>>directory.
> > > > >>>
> > > > >>>Sieve scripts live in the data/sieve directory.
> > > > >>>
> > > > >>>data/partition1 and data/partition2 are the actual mailbox
> > > > >>>store partitions, as defined in imapd.conf.
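> > > > >>>
> > > > >>>For example, matching the layout above (the "part2" name is just
> > > > >>>illustrative), the relevant imapd.conf lines might look like:
> > > > >>>
> > > > >>>    configdirectory: /usr/local/cyrus-server1/data/conf
> > > > >>>    sievedir: /usr/local/cyrus-server1/data/sieve
> > > > >>>    partition-default: /usr/local/cyrus-server1/data/partition1
> > > > >>>    partition-part2: /usr/local/cyrus-server1/data/partition2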
> > > > >>>
> > > > >>>Now, for performance reasons, this whole directory tree may
> > > > >>>live on more than one [RAID] device, but for simplicity of
> > > > >>>example, let's imagine it all lives on one single disk device.
> > > > >>>Say that server1 lives on /dev/sda1.
> > > > >>>
> > > > >>>Now we also have an identical setup living under:
> > > > >>>
> > > > >>>/usr/local/cyrus-server2
> > > > >>>
> > > > >>>Server 2 lives on /dev/sdb1.
> > > > >>>
> > > > >>>
> > > > >>>These two disk devices are directly connected to the two
> > > > >>>servers, via fibre channel or multi-initiator SCSI:
> > > > >>>
> > > > >>>+------------+           +------------+
> > > > >>>| server 1   |           | server 2   |
> > > > >>>|            |           |            |
> > > > >>>+------------+           +------------+
> > > > >>>     |                         |
> > > > >>>     |      +-------------+    |
> > > > >>>     |      | Disk subsys |    |
> > > > >>>     -------|             |----+
> > > > >>>            +-------------+
> > > > >>>
> > > > >>>The idea now is that either server can mount either /dev/sda1
> > > > >>>or /dev/sdb1, _but_ only one server can have each device mounted
> > > > >>>at any one time. So under normal operating conditions server 1
> > > > >>>has /dev/sda1 mounted on /usr/local/cyrus-server1, runs
> > > > >>>"/usr/local/cyrus-server1/etc/cyrus start", and is off to the
> > > > >>>races. Server 2 does the same thing with /dev/sdb1 and
> > > > >>>/usr/local/cyrus-server2.
> > > > >>>
> > > > >>>Each server has two IP addresses. The primary address is a
> > > > >>>static address assigned to the server. The secondary address is
> > > > >>>a floating IP address that cyrus is configured to bind to at
> > > > >>>startup (in /usr/local/cyrus-server1/etc/cyrus.conf).
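> > > > >>>
> > > > >>>That binding might look something like this in the SERVICES
> > > > >>>section of cyrus.conf (the floating address is hypothetical):
> > > > >>>
> > > > >>>    SERVICES {
> > > > >>>      imap cmd="imapd" listen="192.168.1.101:imap" prefork=5
> > > > >>>    }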
> > > > >>>
> > > > >>>Each server runs some heartbeat software that keeps track of
> > > > >>>whether the other server is alive and well. If server 2 detects
> > > > >>>that server 1 is dead (or vice versa), the following actions may
> > > > >>>occur (sketched below):
> > > > >>>
> > > > >>>1> Add the floating IP address for server 1 to server 2 (as an
> > > > >>>alias, most likely)
> > > > >>>
> > > > >>>2> Server 2 mounts /dev/sda1 on /usr/local/cyrus-server1
> > > > >>>
> > > > >>>3> Server 2 runs the startup script
> > > > >>>"/usr/local/cyrus-server1/etc/cyrus start"
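> > > > >>>
> > > > >>>A rough sketch of those three steps in python (untested; the
> > > > >>>interface name and the floating address are hypothetical):
> > > > >>>
> > > > >>>    import subprocess
> > > > >>>
> > > > >>>    FLOATING_IP = "192.168.1.101/24"  # server 1's floating addr
> > > > >>>
> > > > >>>    def take_over_server1():
> > > > >>>        # 1> alias server 1's floating IP onto our interface
> > > > >>>        subprocess.check_call(
> > > > >>>            ["ip", "addr", "add", FLOATING_IP, "dev", "eth0"])
> > > > >>>        # 2> mount server 1's disk device
> > > > >>>        subprocess.check_call(
> > > > >>>            ["mount", "/dev/sda1", "/usr/local/cyrus-server1"])
> > > > >>>        # 3> run server 1's startup script
> > > > >>>        subprocess.check_call(
> > > > >>>            ["/usr/local/cyrus-server1/etc/cyrus", "start"])
> > > > >>>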
> > > > >>>Boom, server 2 is now completely acting as if it is server 1.
> > > > >>>Granted, server 2 now has to assume twice the load: the load of
> > > > >>>its users and the load of server 1's users. But this is better
> > > > >>>than not having server 1 available at all. This is also ideal
> > > > >>>for maintenance, where server 1 could be taken offline at a
> > > > >>>non-peak hour for hardware upgrades.
> > > > >>>
> > > > >>>So in this architecture, the IMAP servers use a sort of buddy
> > > > >>>system where each server is assigned a buddy. The buddies then
> > > > >>>keep watch on each other, each willing to assume the workload of
> > > > >>>the other if it dies or becomes unresponsive.
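> > > > >>>
> > > > >>>The watch itself could be as simple as a periodic ping of the
> > > > >>>buddy's static address (again just a sketch; real heartbeat
> > > > >>>packages use dedicated links, take_over_server1() is the handler
> > > > >>>sketched above, and the address is hypothetical):
> > > > >>>
> > > > >>>    import subprocess
> > > > >>>    import time
> > > > >>>
> > > > >>>    def buddy_alive():
> > > > >>>        # one ICMP echo with a 2 second timeout
> > > > >>>        return subprocess.call(
> > > > >>>            ["ping", "-c", "1", "-W", "2", "192.168.1.1"]) == 0
> > > > >>>
> > > > >>>    missed = 0
> > > > >>>    while True:
> > > > >>>        missed = 0 if buddy_alive() else missed + 1
> > > > >>>        if missed >= 3:  # three misses: assume the buddy is dead
> > > > >>>            take_over_server1()
> > > > >>>            break
> > > > >>>        time.sleep(10)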
> > > > >>>
> > > > >>>As far as failover/heartbeat software goes, RedHat's Advanced
> > > > >>>Server uses Kimberlite from the Mission Critical Linux project
> > > > >>>to handle failovers.
> > > > >>>
> > > > >>>http://oss.missioncriticallinux.com/projects/kimberlite/
> > > > >>>
> > > > >>>One of these days I'll get around to playing with Kimberlite,
> > > > >>>so I'll be able to comment on it more.
> > > > >>>
> > > > >>>The above setup relies on the ability to have shared disk
> > > > >>>devices, and not on NFS, CODA, InterMezzo, etc. Locking and
> > > > >>>synchronization issues with those would be a real pain.
> > > > >>>
> > > > >>>My $.02 anyhow :).
> > > > >>>
> > > > >>>Jeremy



