Distributed File Systems

Jeremy Rumpf jrumpf at heavyload.net
Sun Oct 20 15:59:37 EDT 2002


On Saturday 19 October 2002 02:23 am, David Chait wrote:
> Greetings,
>     Has anyone here looked into or had experience with Distributed File
> Systems (AFS, NFS, CODA, etc) applied to mail partitions to allow for
> clustering or fail over capability of Cyrus IMAP machines? I have seen
> docs for splitting the accounts between machines, however this doesn't seem
> like the best fault tolerant solution.

The easiest way to have fault tolerance would be to match up your IMAP servers 
in an active/active setup where each IMAP server has another server that's 
willing to take over if a failure occurs. 

I currently admin such a setup (with iPlanet), but a cyrus setup would go like 
this (I plan on actually building it soon):

The mail stores and server executables live on a disk partition that's 
accessible by both machines. This can be accomplished by using either fibre 
channel disk arrays or multi-initiator SCSI disk arrays. So the entire cyrus 
installation for server 1 would look like:

/usr/local/cyrus-server1/bin
/usr/local/cyrus-server1/etc
/usr/local/cyrus-server1/data
/usr/local/cyrus-server1/data/conf
/usr/local/cyrus-server1/data/partition1
/usr/local/cyrus-server1/data/partition2
/usr/local/cyrus-server1/data/sieve
/usr/local/cyrus-server1/include
/usr/local/cyrus-server1/lib
/usr/local/cyrus-server1/man
/usr/local/cyrus-server1/share

Where the executables live in bin.

Cyrus.conf and imapd.conf live in etc. A startup script similar to what would 
go in /etc/init.d also lives in etc.
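
A rough sketch of what that start script might look like, assuming a stock 
Cyrus build where master accepts -C and -M to point at alternate imapd.conf 
and cyrus.conf files (the flags and paths here are assumptions, adjust for 
your installation):

#!/bin/sh
# Hypothetical wrapper: /usr/local/cyrus-server1/etc/cyrus
CYRUS=/usr/local/cyrus-server1

case "$1" in
  start)
    # Launch the Cyrus master process with this instance's config files
    $CYRUS/bin/master -d -C $CYRUS/etc/imapd.conf -M $CYRUS/etc/cyrus.conf
    ;;
  stop)
    # Crude shutdown: kill only the master belonging to this instance
    pkill -f $CYRUS/bin/master
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    ;;
esac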

deliver.db, mailboxes.db, the quota data, etc. go in the data/conf directory.

Sieve scripts live in the data/sieve directory.

data/partition1 and data/partition2 are the actual mailbox store partitions as 
defined in imapd.conf.
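
To tie that layout to the config, the relevant imapd.conf entries for server 1 
would look roughly like this (the second partition name is just an example):

configdirectory: /usr/local/cyrus-server1/data/conf
sievedir: /usr/local/cyrus-server1/data/sieve
partition-default: /usr/local/cyrus-server1/data/partition1
partition-part2: /usr/local/cyrus-server1/data/partition2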

Now, for performance reasons, this whole directory tree may live on more than 
one [RAID] device, but for simplicity of example, let's imagine it lives on a 
single disk device. Say that server 1's tree lives on /dev/sda1.

Now we also have an identical setup living under:

/usr/local/cyrus-server2

Server 2's tree lives on /dev/sdb1.


These two disk devices are directly connected to two servers, via fibre 
channel or multi-initiator SCSI:

+------------+           +------------+
| server 1   |           | server 2   |
|            |           |            |
+------------+           +------------+
      |                         |
      |      +-------------+    |
      |      | Disk subsys |    |
      +------|             |----+
             +-------------+

The idea now is that either server can mount either /dev/sda1 or /dev/sdb1, 
_but_ only one server can have each device mounted at any one time. So 
under normal operating conditions server 1 has /dev/sda1 mounted on 
/usr/local/cyrus-server1, runs "/usr/local/cyrus-server1/etc/cyrus start" and 
is off to the races. Server 2 does the same thing with /dev/sdb1 and 
/usr/local/cyrus-server2.
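
So bringing server 1 up under normal conditions amounts to something like the 
following (device and mount point as in the example above):

mount /dev/sda1 /usr/local/cyrus-server1
/usr/local/cyrus-server1/etc/cyrus start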

Each server has two IP addresses. The primary address is a static address 
assigned to the server. The secondary address is a floating IP address that 
cyrus is configured to bind to at startup (in 
/usr/local/cyrus-server1/etc/cyrus.conf).
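
For example, the SERVICES entries in cyrus.conf can be pointed at the floating 
address with the listen parameter; the 10.0.0.101 address below is purely a 
placeholder for server 1's floating IP:

SERVICES {
  imap    cmd="imapd"    listen="10.0.0.101:imap"    prefork=5
  pop3    cmd="pop3d"    listen="10.0.0.101:pop3"    prefork=1
}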

Each server runs some heartbeat software that keeps track of whether the other 
server is alive and well. If server 2 detects that server 1 is dead (or vice 
versa), the following actions may occur (sketched as a shell fragment after 
the list):

1> Add server 1's floating IP address to server 2 (most likely as an 
interface alias)

2> Server 2 mounts /dev/sda1 on /usr/local/cyrus-server1 

3> Server 2 runs the startup script /usr/local/cyrus-server1/etc/cyrus start
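
A rough shell sketch of those three steps, as the heartbeat software might run 
them on server 2 (the floating address 10.0.0.101 and interface eth0 are 
made-up examples):

#!/bin/sh
# Hypothetical takeover script run on server 2 when server 1 is declared dead
# 1> grab server 1's floating IP as an interface alias
ifconfig eth0:1 10.0.0.101 netmask 255.255.255.0 up
# 2> mount server 1's store
mount /dev/sda1 /usr/local/cyrus-server1
# 3> start server 1's cyrus instance
/usr/local/cyrus-server1/etc/cyrus start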

Boom, server 2 is now acting completely as if it were server 1. Admittedly, 
server 2 now has to assume twice the load: the load of its own users and the 
load of server 1's users, but this is better than not having server 1 
available at all. This is also ideal for maintenance, where server 1 could be 
taken offline at a non-peak hour for hardware upgrades.

So in this architecture, the IMAP servers use a sort of buddy system where 
each server is assigned a buddy. The buddies then keep watch on each other, 
each willing to assume the workload of the other if it dies or becomes 
unresponsive.

As far as failover/heartbeat software goes, Red Hat's Advanced Server uses 
Kimberlite, from the Mission Critical Linux project, to handle failovers.

http://oss.missioncriticallinux.com/projects/kimberlite/

One of these days I'll get around to playing with Kimberlite, so I'll be able 
to comment on it more.

The above setup relies on the ability to have shared disk devices and not on 
NFS, CODA, InterMezzo, etc. Locking and synchronization issues with those 
would be a real pain.

My $.02 anyhow :).

Jeremy




