replication

Michael Menge michael.menge at zdv.uni-tuebingen.de
Tue Nov 16 05:47:02 EST 2010


Quoting Shuvam Misra <shuvam.misra at merceworld.com>:

>> Quoting Bron Gondwana <brong at fastmail.fm>:
>> >
>> > It's getting better, but it's still not 100% reliable to have
>> > master/master replication between two servers with interactions
>> > going to both sides.
>> >
>> > It SHOULD be safe now to have a single master/master setup with
>> > individual users on one side or the other - but note that nobody
>> > is known to be running that setup successfully yet.
>> >
>> > As for what the point is?  I don't know about you, but I run a
>> > 24hr/day shop, and I like to be able to take a server down for
>> > maintenance in about 2 minutes, with users seeing a brief
>> > disconnection and then being able to keep using the service
>> > with minimal disruption.
>> >
>> > Bron.
>>
>> As Bron already mentioned, master/master mode has problems that
>> you can easily live without.
>>
>> We run multiple servers, arranged in pairs. Each server runs one
>> Cyrus instance as master and one as replica, so that the two servers
>> of a pair replicate each other. In case of a crash, one server would
>> run two master instances.
>>
>> You only need a way of splitting the users between the servers.
>> That could be DNS, a proxy, or a Murder setup.
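
For illustration, a minimal sketch of what such a pair looks like in
the config files. The option and service names below are from Cyrus
2.3/2.4; all paths, hostnames and credentials are placeholders, not
our real setup. Each instance is started with its own config, e.g.
"master -C /etc/imapd-master.conf -M /etc/cyrus-master.conf".

    # /etc/imapd-master.conf -- master instance, replicates to partner
    configdirectory: /var/imap-master
    partition-default: /var/spool/imap-master
    sync_log: 1                       # keep a rolling replication log
    sync_host: partner.example.com    # the other server of the pair
    sync_authname: repluser
    sync_password: secret

    # /etc/cyrus-master.conf -- START section of the master instance
    syncclient    cmd="sync_client -r"    # rolling replication client

    # /etc/imapd-replica.conf -- replica instance for the partner's data
    configdirectory: /var/imap-replica
    partition-default: /var/spool/imap-replica

    # /etc/cyrus-replica.conf -- SERVICES section of the replica instance
    syncserver    cmd="sync_server" listen="csync"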
>
> Are you using local storage on each server for spool and metadata?

We have all Cyrus storage on iSCSI systems.

> How good/bad is the idea of using shared storage (an external SAN
> chassis) and letting multiple servers keep their spool areas there? Can
> one set up, say, half a dozen servers in a pool, each using a separate
> LUN for spool+data on a common back-end SAN chassis? Out of the six
> servers, one would be a hot spare, standing by. If any of the five active
> servers failed, the standby would be told to mount the failed server's
> LUN, borrow the failed server's IP address, and start offering services?
>

That would work, but you would still have a single point of failure
if the SAN system crashes or if the file system of one backend gets
corrupted.
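
As a rough sketch of the takeover you describe (device path, mount
point, IP address and interface are invented for the example; a real
deployment would use a cluster manager such as Heartbeat/Pacemaker
with fencing, so the failed node can never mount the same LUN again):

    #!/bin/sh
    # Hypothetical takeover script, run on the standby node.
    set -e
    fsck -y /dev/mapper/mail3-lun               # check the failed node's FS
    mount /dev/mapper/mail3-lun /var/spool/imap
    ip addr add 192.0.2.13/24 dev eth0          # borrow the failed node's IP
    arping -c 3 -U -I eth0 192.0.2.13           # update neighbours' ARP caches
    /usr/lib/cyrus/master -d                    # start Cyrus on the standby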

We have 6 servers and 2 independent iSCSI systems. Each iSCSI system
holds 3 partitions for active servers and 3 partitions for replicas.
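
Schematically, assuming an active partition and its replica are always
kept on different systems (the exact pairing shown is illustrative):

    iSCSI system 1                     iSCSI system 2
      active  partition, server 1        active  partition, server 2
      active  partition, server 3        active  partition, server 4
      active  partition, server 5        active  partition, server 6
      replica partition, server 2        replica partition, server 1
      replica partition, server 4        replica partition, server 3
      replica partition, server 6        replica partition, server 5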

> In this proposed model, each user's account is on one "physical" server
> (i.e. bound to a specific IP address). No load balancing or connection
> spreading is needed when clients connect. If the site chooses to use
> Murder, then the proposed model can apply to the back-end while the
> multiplexer can take care of the front-end.
>
> The only thing I'm not sure about is the file system corruption when a
> node goes down and the time taken for an fsck before the standby node can
> assume the failed node's role. I wonder whether something like ext4
> will help reduce fsck times to acceptable levels.

The fsck time is one thing, but if you lose data in a partition you
have a real problem. Restoring files from a file-based backup is
painful when you have many small files, as Cyrus does.

>
> Is this a good idea for a scalable fault-tolerant Cyrus setup? I've been
> toying with this approach for some time, for a proposed large-system design.
>

We are testing Cyrus Murder to ease the work of switching over to a
replica and back.
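
In a Murder the frontends look up each mailbox's backend from the
mupdate master, so moving a user to the replica pair only means
updating that entry instead of touching DNS. A minimal frontend
imapd.conf sketch (hostnames and credentials are placeholders):

    # frontend imapd.conf -- placeholders throughout
    mupdate_server: mupdate.example.com
    mupdate_authname: mupdateuser
    mupdate_password: secret
    proxy_authname: proxyuser
    proxy_password: secret
    serverlist: mail1.example.com mail2.example.com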



--------------------------------------------------------------------------------
M.Menge                                Tel.: (49) 7071/29-70316
Universität Tübingen                   Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung          mail: michael.menge at zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen