Distributed File Systems
David Chait
davidc at bonair.stanford.edu
Sun Oct 20 16:07:26 EDT 2002
Jeremy,
While that would resolve the front-end problem, in the end your mail
partition would still be a single point of failure. I'm trying to find a
way to do real-time replication of the mail partition between both
machines to allow for a complete failover.
----- Original Message -----
From: "Jeremy Rumpf" <jrumpf at heavyload.net>
To: "David Chait" <davidc at bonair.stanford.edu>;
<info-cyrus at lists.andrew.cmu.edu>
Sent: Sunday, October 20, 2002 12:59 PM
Subject: Re: Distributed File Systems
On Saturday 19 October 2002 02:23 am, David Chait wrote:
> Greetings,
> Has anyone here looked into or had experience with Distributed File
> Systems (AFS, NFS, CODA, etc.) applied to mail partitions to allow for
> clustering or failover capability of Cyrus IMAP machines? I have seen
> docs for splitting the accounts between machines, however this doesn't
> seem like the best fault-tolerant solution.
The easiest way to have fault tolerance would be to match up your IMAP
servers in an active/active setup, where each IMAP server has another
server that's willing to take over if a failure occurs.
I currently admin such a setup (iPlanet), but a Cyrus setup would go like
this (I plan on actually building this setup soon):
The mail stores and server executables live on a disk partition that's
accessible by both machines. This can be accomplished by using either
fibre channel disk arrays or multi-initiator SCSI disk arrays. So the
entire Cyrus installation would look like:
/usr/local/cyrus-server1/bin
/usr/local/cyrus-server1/etc
/usr/local/cyrus-server1/data
/usr/local/cyrus-server1/data/conf
/usr/local/cyrus-server1/data/partition1
/usr/local/cyrus-server1/data/partition2
/usr/local/cyrus-server1/data/sieve
/usr/local/cyrus-server1/include
/usr/local/cyrus-server1/lib
/usr/local/cyrus-server1/man
/usr/local/cyrus-server1/share
The executables live in bin.
cyrus.conf and imapd.conf live in etc. A startup script, similar to what
would go in /etc/init.d, also lives in etc.
deliver.db, mailboxes.db, the quota files, etc. go in the data/conf
directory.
Sieve scripts live in the data/sieve directory.
data/partition1 and data/partition2 are the actual mailbox store
partitions, as defined in imapd.conf.
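That layout would translate into imapd.conf roughly like this (a sketch
only; the partition names and paths simply mirror the example directories
above):

```
# /usr/local/cyrus-server1/etc/imapd.conf (sketch)
configdirectory: /usr/local/cyrus-server1/data/conf
sievedir: /usr/local/cyrus-server1/data/sieve
defaultpartition: partition1
partition-partition1: /usr/local/cyrus-server1/data/partition1
partition-partition2: /usr/local/cyrus-server1/data/partition2
```

The key point is that every path lives under the shared-disk mount point,
so whichever server mounts the disk finds a complete, consistent config.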
Now, for performance reasons, this whole directory tree may live on more
than one [RAID] device, but for simplicity of example, let's imagine it
lives on one single disk device. Say that server 1 lives on /dev/sda1.
We also have an identical setup living under:
/usr/local/cyrus-server2
Server 2 lives on /dev/sdb1.
These two disk devices are directly connected to both servers, via fibre
channel or multi-initiator SCSI:

+------------+        +------------+
|  server 1  |        |  server 2  |
|            |        |            |
+------------+        +------------+
      |                     |
      |   +-------------+   |
      +---| Disk subsys |---+
          +-------------+
The idea now is that either server can mount either /dev/sda1 or /dev/sdb1,
_but_ only one server can have each device mounted at any one time. So
under normal operating conditions server 1 has /dev/sda1 mounted on
/usr/local/cyrus-server1, runs "/usr/local/cyrus-server1/etc/cyrus start",
and is off to the races. Server 2 does the same thing with /dev/sdb1 and
/usr/local/cyrus-server2.
Each server has two IP addresses. The primary address is a static address
assigned to the server. The secondary address is a floating IP address that
cyrus is configured to bind to at startup (in
/usr/local/cyrus-server1/etc/cyrus.conf).
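The floating-address binding might look like this in cyrus.conf (a
fragment only; 192.0.2.10 is a placeholder floating IP, and the service
list is illustrative):

```
# /usr/local/cyrus-server1/etc/cyrus.conf (fragment)
SERVICES {
  imap  cmd="imapd"  listen="192.0.2.10:imap"  prefork=5
  pop3  cmd="pop3d"  listen="192.0.2.10:pop3"  prefork=1
}
```

Binding to the floating address (rather than the wildcard) is what lets
both instances run side by side on one box after a failover.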
Each server runs some heartbeat software that keeps track of whether the
other server is alive and well. If server 2 detects that server 1 is dead
(or vice versa), the following actions may occur:
1> Add the floating IP address for server 1 to server 2 (most likely as an alias)
2> Server 2 mounts /dev/sda1 on /usr/local/cyrus-server1
3> Server 2 runs the startup script /usr/local/cyrus-server1/etc/cyrus start
Boom, server 2 is now completely acting as if it is server 1. Granted,
server 2 now has to assume twice the load (its own users plus server 1's
users), but this is better than not having server 1 available at all.
This is also ideal for maintenance, where server 1 could be taken offline
at a non-peak hour for hardware upgrades.
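The three takeover steps could be scripted along these lines (a sketch;
the floating IP, interface name, and device follow the example above and
are placeholders, and a real cluster would also fence server 1 first,
since mounting the device on both nodes at once would corrupt it):

```shell
#!/bin/sh
# Takeover actions server 2 runs when server 1 is declared dead.
FLOAT_IP=192.0.2.10                 # server 1's floating address (placeholder)
DEVICE=/dev/sda1                    # server 1's shared disk
MOUNTPOINT=/usr/local/cyrus-server1

run() {
    # Print each command instead of executing it unless EXECUTE=1 is
    # set, so the sequence can be reviewed as a dry run.
    if [ "${EXECUTE:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

run ifconfig eth0:1 "$FLOAT_IP" up    # 1> alias the floating IP
run mount "$DEVICE" "$MOUNTPOINT"     # 2> mount server 1's disk
run "$MOUNTPOINT/etc/cyrus" start     # 3> start server 1's cyrus
```

Run without EXECUTE=1 it just prints the three commands it would issue,
which is handy for checking the sequence before trusting it to heartbeat
software.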
So in this architecture, the IMAP servers use a sort of buddy system where
each server is assigned a buddy. The buddies then keep a watch on each
other, willing to assume the work load of the other if it dies or becomes
unresponsive.
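At its core the buddy watch is just a loop: probe the other server,
count consecutive misses, and trigger the takeover once enough probes
fail. A minimal sketch (illustrative only; real deployments would use
dedicated heartbeat software, and the command names here are made up):

```shell
#!/bin/sh
# monitor CHECK_CMD TAKEOVER_CMD INTERVAL MAX_MISSES
monitor() {
    check=$1; takeover=$2; interval=$3; max_misses=$4
    misses=0
    while :; do
        if $check >/dev/null 2>&1; then
            misses=0                  # buddy answered; reset the count
        else
            misses=$((misses + 1))
        fi
        # Require several consecutive misses so one momentary network
        # blip doesn't trigger a full failover.
        if [ "$misses" -ge "$max_misses" ]; then
            $takeover
            return 0
        fi
        sleep "$interval"
    done
}

# In production this would be something like:
#   monitor "ping -c 1 -W 2 server1" /usr/local/bin/takeover-server1 5 3
# Here a check that always fails demonstrates the takeover path:
monitor false "echo takeover-triggered" 0 3   # prints takeover-triggered
```

Tuning INTERVAL and MAX_MISSES is the usual trade-off between failover
speed and false alarms.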
As far as failover/heartbeat software goes, RedHat's Advanced Server uses
Kimberlite, from the Mission Critical Linux project, to handle failovers.
http://oss.missioncriticallinux.com/projects/kimberlite/
One of these days I'll get around to playing with Kimberlite, so I'll be
able to comment on it more.
The above setup relies on the ability to have shared disk devices and not on
NFS, CODA, InterMezzo, etc. Locking and synchronization issues with those
would be a real pain.
My $.02 anyhow :).
Jeremy