Distributed File Systems

David Chait davidc at bonair.stanford.edu
Sun Oct 20 17:46:11 EDT 2002


A very interesting idea. We do have licensing for SQL and Oracle; however,
assuming a production environment, how would you carry over existing mail
into the database structure?

----- Original Message -----
From: "David Lang" <david.lang at digitalinsight.com>
To: "David Chait" <davidc at bonair.stanford.edu>
Cc: "Jared Watkins" <jwatkins at snowcrash.homeip.net>;
<info-cyrus at lists.andrew.cmu.edu>
Sent: Sunday, October 20, 2002 2:33 PM
Subject: Re: Distributed File Systems


> another option to consider.
>
> I have heard of people hacking cyrus to store its data in a SQL database
> instead of a raw filesystem. If you do this you can then invoke the full
> set of SQL replication capabilities (including better transaction
> support than you can get in a filesystem).
>
> David Lang
>
>  On Sun, 20 Oct 2002, David Chait wrote:
>
> > Date: Sun, 20 Oct 2002 14:12:05 -0700
> > From: David Chait <davidc at bonair.stanford.edu>
> > To: Jared Watkins <jwatkins at snowcrash.homeip.net>
> > Cc: info-cyrus at lists.andrew.cmu.edu
> > Subject: Re: Distributed File Systems
> >
> > I see, NBD actually looks a little more risky than I would prefer,
> > though going with a replicated structure based on Coda or AFS might be
> > safer. In that scenario you are not hacking the kernel to mount drives,
> > but rather connecting to file servers via client software that handles
> > this replication by design. My main concern with this, though, is
> > stability. I know for a fact that NFS is a no-no, per the docs, but
> > nothing has been said of the other, more developed options out there.
> > If the file store part of the equation can be sorted out, the rest
> > (mirroring the front-end servers, and load balancing) is trivial with
> > the available tools. Both CODA and AFS were developed at CMU, and I
> > would be very interested in hearing their thoughts as well.
> >
> > -David
> >
> >
> > ----- Original Message -----
> > From: "Jared Watkins" <jwatkins at snowcrash.homeip.net>
> > To: "David Chait" <davidc at bonair.stanford.edu>
> > Sent: Sunday, October 20, 2002 1:56 PM
> > Subject: Re: Distributed File Systems
> >
> >
> > > Network Block Device...   the http://linux-ha.org site has some links
> > > to a few different projects that provide that service in slightly
> > > different ways. Making a mail server highly redundant and available
> > > is not an easy task... and it is bound to be a little messy... but it
> > > is possible to roll your own with the software that is already out
> > > there.
> > >
> > > Jared
> > >
> > >
> > >
> > > David Chait wrote:
> > >
> > > >Excuse my apparent ignorance, but NBD is a term I haven't run
> > > >across. What does it involve?
> > > >
> > > >----- Original Message -----
> > > >From: "Jared Watkins" <jwatkins at snowcrash.homeip.net>
> > > >To: "David Chait" <davidc at bonair.stanford.edu>
> > > >Sent: Sunday, October 20, 2002 1:39 PM
> > > >Subject: Re: Distributed File Systems
> > > >
> > > >
> > > >
> > > >
> > > >>You could possibly use an NBD along with software RAID 1... this,
> > > >>used over a dedicated set of gigabit Ethernet cards, should work...
> > > >>although I have not tried anything like that in production.  I have
> > > >>used IPStor software from FalconStor to create a virtualized SAN.
> > > >>It is expensive... but very powerful software.  It is possible to
> > > >>duplicate many of the features of IPStor using software RAID...
> > > >>NBD... and LVM... including the snapshot features.
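> > > >>
> > > >>Roughly, a sketch of that approach (untested -- the host, port and
> > > >>device names below are made up, and the nbd/mdadm syntax differs a
> > > >>bit between versions):
> > > >>
> > > >>    # on the peer box: export a spare partition over the dedicated
> > > >>    # gigabit link
> > > >>    nbd-server 2000 /dev/sdb1
> > > >>
> > > >>    # on the Cyrus box: attach the remote device, mirror the local
> > > >>    # partition onto it, then put the mail spool on the mirror
> > > >>    nbd-client peer-host 2000 /dev/nbd0
> > > >>    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
> > > >>        /dev/sda1 /dev/nbd0
> > > >>    mkfs -t ext3 /dev/md0
> > > >>    mount /dev/md0 /var/spool/imap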
> > > >>
> > > >>Jared
> > > >>
> > > >>
> > > >>David Chait wrote:
> > > >>
> > > >>
> > > >>
> > > >>>Jeremy,
> > > >>>   While that would resolve the front end problem, in the end your
> > > >>>mail partition would still be a single point of failure. I'm trying
> > > >>>to find a way to do a real-time replication of the mail partition
> > > >>>between both machines to allow for a complete failover.
> > > >>>
> > > >>>----- Original Message -----
> > > >>>From: "Jeremy Rumpf" <jrumpf at heavyload.net>
> > > >>>To: "David Chait" <davidc at bonair.stanford.edu>;
> > > >>><info-cyrus at lists.andrew.cmu.edu>
> > > >>>Sent: Sunday, October 20, 2002 12:59 PM
> > > >>>Subject: Re: Distributed File Systems
> > > >>>
> > > >>>
> > > >>>On Saturday 19 October 2002 02:23 am, David Chait wrote:
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>>Greetings,
> > > >>>>   Has anyone here looked into or had experience with Distributed
> > > >>>>File Systems (AFS, NFS, CODA, etc.) applied to mail partitions to
> > > >>>>allow for clustering or failover capability of Cyrus IMAP machines?
> > > >>>>I have seen docs for splitting the accounts between machines,
> > > >>>>however this doesn't seem like the best fault-tolerant solution.
> > > >>>
> > > >>>The easiest way to have fault tolerance would be to match up your
> > > >>>IMAP servers in an active/active setup where each IMAP server has
> > > >>>another server that's willing to take over if a failure occurs.
> > > >>>
> > > >>>I currently admin such a setup (iPlanet setup) but a cyrus setup
> > > >>>would go like this (I plan on actually building this setup soon):
> > > >>>
> > > >>>The mail stores and server executables live on a disk partition
> > > >>>that's accessible by both machines. This can be accomplished by
> > > >>>using either fibre channel disk arrays or multi-initiator SCSI disk
> > > >>>arrays. So the entire cyrus installation would look like:
> > > >>>
> > > >>>/usr/local/cyrus-server1/bin
> > > >>>/usr/local/cyrus-server1/etc
> > > >>>/usr/local/cyrus-server1/data
> > > >>>/usr/local/cyrus-server1/data/conf
> > > >>>/usr/local/cyrus-server1/data/partition1
> > > >>>/usr/local/cyrus-server1/data/partition2
> > > >>>/usr/local/cyrus-server1/data/sieve
> > > >>>/usr/local/cyrus-server1/include
> > > >>>/usr/local/cyrus-server1/lib
> > > >>>/usr/local/cyrus-server1/man
> > > >>>/usr/local/cyrus-server1/share
> > > >>>
> > > >>>Where the executables live in bin.
> > > >>>
> > > >>>Cyrus.conf and imapd.conf live in etc. A startup script similar to
> > > >>>what would go in /etc/init.d also lives in etc.
> > > >>>
> > > >>>Deliver.db, mailboxes.db, quota, etc. go in the data/conf directory.
> > > >>>
> > > >>>Sieve scripts live in the data/sieve directory.
> > > >>>
> > > >>>data/partition1 and data/partition2 are the actual mailbox store
> > > >>>partitions as defined in imapd.conf.
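> > > >>>
> > > >>>(For this layout, the relevant imapd.conf entries for server1 would
> > > >>>be something along these lines -- my best guess at the option names,
> > > >>>so check imapd.conf(5) against your Cyrus version:)
> > > >>>
> > > >>>    configdirectory: /usr/local/cyrus-server1/data/conf
> > > >>>    sievedir: /usr/local/cyrus-server1/data/sieve
> > > >>>    defaultpartition: partition1
> > > >>>    partition-partition1: /usr/local/cyrus-server1/data/partition1
> > > >>>    partition-partition2: /usr/local/cyrus-server1/data/partition2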
> > > >>>
> > > >>>Now, for performance reasons, this whole directory tree may live on
> > > >>>more than one [RAID] device, but for simplicity of example, let's
> > > >>>imagine it all lives on one single disk device. Say that server1
> > > >>>lives on /dev/sda1.
> > > >>>
> > > >>>Now we also have an identical setup living under:
> > > >>>
> > > >>>/usr/local/cyrus-server2
> > > >>>
> > > >>>Server 2 lives on /dev/sdb1.
> > > >>>
> > > >>>
> > > >>>These two disk devices are directly connected to two servers, via
> > > >>>fibre channel or multi-initiator SCSI:
> > > >>>
> > > >>>+------------+           +------------+
> > > >>>| server 1   |           | server 2   |
> > > >>>|            |           |            |
> > > >>>+------------+           +------------+
> > > >>>     |                         |
> > > >>>     |      +-------------+    |
> > > >>>     |      | Disk subsys |    |
> > > >>>     -------|             |----+
> > > >>>            +-------------+
> > > >>>
> > > >>>The idea now is that either server can mount either /dev/sda1 or
> > > >>>/dev/sdb1, _but_ only one server can have each device mounted at
> > > >>>any single instant. So under normal operating conditions server 1
> > > >>>has /dev/sda1 mounted on /usr/local/cyrus-server1, runs
> > > >>>"/usr/local/cyrus-server1/etc/cyrus start", and is off to the races.
> > > >>>Server 2 does the same thing with /dev/sdb1 and
> > > >>>/usr/local/cyrus-server2.
> > > >>>
> > > >>>Each server has two IP addresses. The primary address is a static
> > > >>>address assigned to the server. The secondary address is a floating
> > > >>>IP address that cyrus is configured to bind to at startup (in
> > > >>>/usr/local/cyrus-server1/etc/cyrus.conf).
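> > > >>>
> > > >>>(Something like the following in the SERVICES section of cyrus.conf,
> > > >>>with a made-up floating address -- check cyrus.conf(5) for the exact
> > > >>>listen syntax in your version:)
> > > >>>
> > > >>>    SERVICES {
> > > >>>      imap  cmd="imapd"  listen="10.0.0.101:imap"  prefork=5
> > > >>>      pop3  cmd="pop3d"  listen="10.0.0.101:pop3"  prefork=1
> > > >>>    }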
> > > >>>
> > > >>>Each server runs some heartbeat software that keeps track of
> > > >>>whether the other server is alive and well. If server 2 detects
> > > >>>that server 1 is dead (or vice versa), the following actions may
> > > >>>occur:
> > > >>>
> > > >>>1> Add the floating IP address for server 1 to server 2 (as an
> > > >>>alias, most likely)
> > > >>>
> > > >>>2> Server 2 mounts /dev/sda1 on /usr/local/cyrus-server1
> > > >>>
> > > >>>3> Server 2 runs the startup script
> > > >>>"/usr/local/cyrus-server1/etc/cyrus start"
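> > > >>>
> > > >>>On server 2, that failover boils down to something like (the
> > > >>>interface, netmask and floating address here are placeholders; this
> > > >>>is the 2.4-era ifconfig style of adding an alias):
> > > >>>
> > > >>>    # bring up server 1's floating address as an alias on server 2
> > > >>>    ifconfig eth0:1 10.0.0.101 netmask 255.255.255.0 up
> > > >>>
> > > >>>    # take over server 1's mail store and start its cyrus instance
> > > >>>    mount /dev/sda1 /usr/local/cyrus-server1
> > > >>>    /usr/local/cyrus-server1/etc/cyrus start
> > > >>>
> > > >>>Failing back is just the reverse: stop cyrus, umount the partition,
> > > >>>and drop the alias.
> > > >>>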
> > > >>>Boom, server 2 is now completely acting as if it is server 1.
> > > >>>Albeit, server 2 now has to assume twice the load (the load of its
> > > >>>users plus the load of server 1's users), but this is better than
> > > >>>not having server 1 available at all. This is also ideal for
> > > >>>maintenance, where server 1 could be taken offline at a non-peak
> > > >>>hour for hardware upgrades.
> > > >>>
> > > >>>So in this architecture, the IMAP servers use a sort of buddy
> > > >>>system where each server is assigned a buddy. The buddies then keep
> > > >>>a watch on each other, willing to assume the workload of the other
> > > >>>if it dies or becomes unresponsive.
> > > >>>
> > > >>>As far as failover/heartbeat software goes, Red Hat's Advanced
> > > >>>Server uses Kimberlite from the Mission Critical Linux project to
> > > >>>handle failovers.
> > > >>>
> > > >>>http://oss.missioncriticallinux.com/projects/kimberlite/
> > > >>>
> > > >>>One of these days I'll get around to playing with Kimberlite, so
> > > >>>I'll be able to comment on it more.
> > > >>>
> > > >>>The above setup relies on the ability to have shared disk devices
> > > >>>and not on NFS, CODA, InterMezzo, etc. Locking and synchronization
> > > >>>issues with those would be a real pain.
> > > >>>
> > > >>>My $.02 anyhow :).
> > > >>>
> > > >>>Jeremy
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >
> > > >
> > > >
> > > >
> > >
> >
>




