Distributed File Systems

David Lang david.lang at digitalinsight.com
Sun Oct 20 04:55:48 EDT 2002


to redirect clients you have two options (for geographically
separated sites):

1. play DNS games (as you describe below)

2. move the IP addresses (BGP4 routing)

DNS changes only help if the client looks up the IP address again and
isn't behind a DNS server that caches the info longer than you want it to.

moving the IP addresses requires more infrastructure support and is a
different skillset to learn, but has no dependency on the client behaving
properly (other than trying to reconnect after a connection is broken)
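
to make that dependency on client behavior concrete, here is a rough
sketch (Python; the host name and timings are made up) of the
re-resolve-and-reconnect loop the DNS approach relies on. the BGP
approach only needs the outer retry loop:

    import socket, time

    def connect(hostname, port=143):
        # fresh DNS lookup on every call, so a lowered TTL or a
        # changed record is actually noticed by the client
        addr = socket.getaddrinfo(hostname, port, socket.AF_INET,
                                  socket.SOCK_STREAM)[0][4]
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(10)
        s.connect(addr)
        return s

    conn = None
    while conn is None:
        try:
            conn = connect("imap.example.com")   # hypothetical name
        except OSError:
            time.sleep(30)   # connection broken: wait, re-resolve, retry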

David Lang

On Sat, 19 Oct 2002, Michael Fair wrote:

> Date: Sat, 19 Oct 2002 22:51:46 -0700
> From: Michael Fair <michael at daclubhouse.net>
> To: Sebastian Hagedorn <Hagedorn at uni-koeln.de>,
>      David Chait <davidc at bonair.stanford.edu>
> Cc: info-cyrus at lists.andrew.cmu.edu
> Subject: Re: Distributed File Systems
>
> Not to ruffle any feathers, but this approach isn't
> any different philosophically from a DFS.  The only
> difference here is that the DFS is created by both
> servers accessing the same physical RAID array rather
> than by constantly sharing the FS data.
>
> The extremely significant downside to this approach
> is that the systems MUST be physically near each
> other (i.e. only as far apart as the SCSI connection
> allows), and you don't really get that much redundancy.
>
> In my experience (all mileage varies) I end up dealing
> with network outages far more often than I do server
> failures.  So I consider a network failure my number
> one priority.  Behind that are the disk drives, followed
> by the power supplies, then lastly the motherboard and
> its components.  This approach shares an external RAID
> array across two machines.  So in other words, the
> only things you are protecting yourself against are
> failures of the server components inside the box
> (power supply, CPU, motherboard, SCSI controller, and
> network card) and of the disk drives.  These are only
> the second through least likely things to fail.
>
> Further, you've introduced yet another chassis and power
> supply that can fail: the RAID array itself.  While
> I am in favor of external RAID arrays for large drive
> capacity scaling and high I/O throughput, I'm also
> aware that I am usually introducing a single point of
> failure.  You are lost if the array itself goes, or if
> any of the other components connecting it to your
> network fail.
>
>
>
> The power supply in the server itself (the third most
> likely component to fail) can be protected by simply
> getting a dual redundant hot-swappable power supply on
> the server.  I've bought several 2U rackmounts with
> this feature, and if you're really desperate Dell makes
> a 1U with dual redundant power (the only ones I've
> ever seen).  Of course, dual redundant power is most
> effective when you can put each supply on a separate
> circuit; if both are on the same circuit you're only
> protected against supply failure, not power loss, but
> that's a separate issue.  There are also ways to solve
> the power-source issue that I feel are beyond the scope
> of this email, so I'll just stick to failures of the
> supply itself for the rest of this message.
>
>
>
> Redundant network interfaces on the motherboard are
> pretty standard, so let's assume the box we bought with
> the dual power supplies has them, set up so that if one
> network interface fails the other takes over.  A better
> approach is to point the two interfaces at different
> physical network topologies, but that creates other
> routing and failover problems, so again I'll just stick
> to physical interface failure for this email.
>
>
>
> Instead of the external SCSI RAID array the other
> solution proposed, I'm going to use an internal
> hot-swappable RAID 1 array.
>
>
> So let's compare:
>
> - 2 servers w/ dual power, dual CPU, and dual network +
>   external RAID 5 also with dual power.
> appx cost: $10,000
>
> Redundancy level:
> Protected against all failures except RAID array internal
> components failure.
>
>
> - 1 server w/ dual power, dual CPU, and dual network +
>   internal hot-swappable SCSI RAID 1
> appx cost: $3,000 (I've built these with IDE for $2,500)
>
> Redundancy level:
> Protected against all failures except server internal
> components failure.
>
>
> Essentially you end up with the same amount of risk
> for about 30% of the cost.  If anyone sees something
> different and would like to correct me, please do.
> Uptime is something I think we all take seriously and
> I would appreciate being corrected.  There are hot-swap
> PCI technologies and card-slot servers that can push
> the fault tolerance even higher, but I have not had
> the pleasure of working on or pricing them.
>
>
>
> Now I don't know about you, but all the data I've gathered
> says that you are more likely to purchase a faulty CPU,
> motherboard, or SCSI controller than you are to have one
> fail on you.  For most organizations, that's an acceptable
> risk.  For those that can't accept that risk: purchase two
> of the servers I just described and add a third and fourth
> network interface to act as a back channel between the two
> machines, used for real-time drive updates and for
> notification of primary server failure; for $6,000 you've
> then exceeded the redundancy of the original RAID 5 setup.
> I leave setting up the pair for drive updates and fail
> over as an exercise for the reader.
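>
> (As a starting point for the failure-detection half of that
> exercise, here is a bare-bones heartbeat sketch run on the
> standby; the back-channel address, port, and timings are made
> up, it assumes the primary runs some listener on that port,
> and the drive replication itself is left out entirely.)
>
>     import socket, time
>
>     PRIMARY_BACKCHANNEL = ("10.0.0.1", 7777)   # hypothetical back-channel address
>     MISSES_BEFORE_FAILOVER = 3
>
>     def primary_alive():
>         # a TCP connect to the primary's back-channel listener
>         s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>         s.settimeout(5)
>         try:
>             s.connect(PRIMARY_BACKCHANNEL)
>             return True
>         except OSError:
>             return False
>         finally:
>             s.close()
>
>     misses = 0
>     while misses < MISSES_BEFORE_FAILOVER:
>         misses = 0 if primary_alive() else misses + 1
>         time.sleep(10)
>
>     # primary presumed dead: promote the standby (bring up the
>     # service address, start Cyrus, update DNS, ...)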
>
>
>
> Now let's look at how much fault tolerance we've really
> gotten.  The best case scenario is that we've spent the
> $6,000 for the dual-server setup.  That leaves us protected
> against everything except the number one most likely
> component to fail, which is the network.  Now we could
> go spend the rest of that original $10,000 on creating
> a fully redundant network, but I believe a better solution
> is geographic dispersion and hot fail over.
>
>
> The problem with geographic dispersion is making sure
> your hot fail over A) has as much recent data as
> possible, B) can detect that it needs to fail over, and
> C) can be taken advantage of by your end users.
>
> A and B are exactly where distributed file systems
> become useful.
>
> But unfortunately, using a DFS (assuming a suitable
> one can be found) only covers 2/3 of the problem.  The
> other 1/3 is making sure that clients know how to get
> to the new server once a failure occurs.
>
> One less-than-optimal solution is to set your DNS TTL
> to some insanely low number like 5 minutes (or 1 for
> the ultra-paranoid).  Then when the primary server fails,
> you update the DNS and within a few minutes everyone is
> running again.  This, however, doesn't let you take
> advantage of all that redundant hardware you've invested
> in; it simply sits idle until it's called upon.
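>
> (For the record-flipping step itself, something like the sketch
> below would do.  The names, addresses, and key file are all
> hypothetical, and it assumes the zone allows dynamic updates;
> any other DNS update mechanism, from a registrar's web form to
> a hand edit plus reload, amounts to the same thing.)
>
>     import subprocess
>
>     # push the standby's address as the new A record with a
>     # 300-second TTL, using BIND's nsupdate and a TSIG key
>     commands = "\n".join([
>         "server ns1.example.com",
>         "update delete imap.example.com A",
>         "update add imap.example.com 300 A 198.51.100.10",
>         "send",
>         "",
>     ])
>     subprocess.run(["nsupdate", "-k", "/etc/bind/failover.key"],
>                    input=commands, text=True, check=True)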
>
> Ideally what you want is something where you can read
> and write to any server and be guaranteed that all others
> will reflect any changes made.  A DFS can help with this
> problem, since the Cyrus server itself is written in such
> a way as to expect multiple processes trying to access the
> same set of files (the fact that the other processes are on
> a different server is hidden by the DFS).  But it can't
> solve the end-user fail over problem.
>
> So at best what we get is that any end user could be
> tied to any server in the set of available servers.
> But there is no way for the client to automatically
> switch over to a secondary server without user
> intervention.  (The problem is the same whether it is
> a frontend server or a backend server.  While
> theoretically a frontend server could be made smart
> enough to search a set of backend servers, you still
> have the same problem if a frontend server becomes
> inaccessible.)
>
>
> I'm not certain how to solve the end-user client dilemma.
> Ideally I'd like to just give the client N IP addresses
> in response to its DNS query and expect it to choose one
> it can actually get to, failing only if it couldn't
> contact any of them.  One way to simulate this would be
> to make something like perdition smart enough to
> understand a set of possible sources, and then put
> the perdition server as close as possible to the end
> users to maximize the amount of network trouble it can
> absorb.  Another way (though not as easy) would be to put
> software on the client itself and have it provide the
> smarts.  But that seems like an unmanageable solution
> to me, and it certainly won't work for ISPs, which are
> frowned upon when they force their customers to install
> software on their computers.
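>
> (What I mean by "choose one it can actually get to" is nothing
> fancier than the loop sketched below; the host name is made up,
> and this is exactly the logic a perdition-style proxy sitting
> near the users could carry for them.)
>
>     import socket
>
>     def connect_any(hostname, port=143, timeout=10):
>         # try every address the resolver returns; fail only
>         # if none of them answers
>         last_error = None
>         for family, socktype, proto, _, addr in socket.getaddrinfo(
>                 hostname, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
>             s = socket.socket(family, socktype, proto)
>             s.settimeout(timeout)
>             try:
>                 s.connect(addr)
>                 return s                  # first reachable server wins
>             except OSError as e:
>                 last_error = e
>                 s.close()
>         raise last_error or OSError("no usable address for %s" % hostname)
>
>     conn = connect_any("imap.example.com")   # hypothetical name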
>
>
> Assuming the client side can be solved, I see hope in the
> Cyrus Murder project being extended to allow backend
> servers to be mirrors of each other.  Either that, or
> testing and integration with a setup like CODA, which has
> addressed many of these issues in detail.  Unfortunately,
> CODA ultimately relies on user interaction to resolve
> double-write conflicts, though it does support that
> all-important disconnected operation.  Unfortunately
> again, I don't think that IMAP has anything in its
> protocol to ask an end user a question to help it
> resolve a conflict like that.
>
>
>
> So to sum up, I think the single biggest hurdle in this
> fault tolerance game is the end user's ability to be
> redirected when the server they are used to talking to
> fails.  Behind that is ensuring that multiple servers
> have a consistent copy of the mail store.  Then behind
> that is fault tolerance of the server itself, in the
> order of disk drives, power, then motherboard components.
>
>
>
> While I'm sure others may put things in a slightly
> different order than I have, I'm pretty sure I hit
> all the necessary points.  Please speak up if I've
> missed one.
>
>
> -- Michael --
>
> ----- Original Message -----
> From: "Sebastian Hagedorn" <Hagedorn at uni-koeln.de>
> To: "David Chait" <davidc at bonair.stanford.edu>
> Cc: <info-cyrus at lists.andrew.cmu.edu>
> Sent: Saturday, October 19, 2002 8:21 AM
> Subject: Re: Distributed File Systems
>
> -- David Chait <davidc at bonair.stanford.edu> is rumored to have mumbled on
> Friday, 18 October 2002, 23:23 -0700, regarding Distributed File
> Systems:
>
> Hi,
>
> >     Has anyone here looked into or had experience with Distributed File
> > Systems (AFS, NFS, CODA, etc.) applied to mail partitions to allow for
> > clustering or fail over capability of Cyrus IMAP machines? I have seen
> > docs for splitting the accounts between machines, however this doesn't
> > seem like the best fault tolerant solution.
>
> distributed file systems don't work. Look here for a different approach:
>
> <http://asg.web.cmu.edu/archive/message.php?mailbox=archive.info-cyrus&msg=
> 17132>
> --
> Sebastian Hagedorn M.A. - RZKR-R1 (Flachbau), Zi. 18, Robert-Koch-Str. 10
> Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
> Universität zu Köln / Cologne University - Tel. +49-221-478-5587
>
