cyrus replication : master to replica and replica to master
Robert Mueller (web)
robm at fastmail.fm
Thu Oct 22 05:18:18 EDT 2009
> What are the particular bits that could conflict and have undesirable
> results? Metadata, messages, entire mailboxes? In this hypothetical
> active/active configuration, what exactly could an IMAP client
> potentially do to create undesirable results?
Simple.
Client A: upload message to Inbox, gets UID 100
At the same time, Client B: upload message to Inbox, gets UID 100
You can't have two messages with the same UID.
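To make the collision concrete, here is a minimal Python sketch. The
Server class and its append method are hypothetical names for
illustration, not Cyrus code; they only model each server handing out
the next UID from its own local counter with no coordination.

```python
class Server:
    """Hypothetical model of one IMAP server with a local UID counter."""

    def __init__(self, name, next_uid):
        self.name = name
        self.next_uid = next_uid
        self.inbox = {}  # UID -> message body

    def append(self, body):
        # Allocate the next UID purely from local state.
        uid = self.next_uid
        self.next_uid += 1
        self.inbox[uid] = body
        return uid


# Assume both servers were in sync up to UID 99 before the uploads arrive.
s1 = Server("s1", next_uid=100)
s2 = Server("s2", next_uid=100)

uid_a = s1.append("message from client A")
uid_b = s2.append("message from client B")

# Same UID on both sides, but it names two different messages.
collision = uid_a == uid_b and s1.inbox[uid_a] != s2.inbox[uid_b]
print(uid_a, uid_b, collision)
```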
There are 3 solutions I can see:
1. MySQL solves this by interleaving IDs across servers (eg. the
auto-increment column on server A uses odd numbers, on server B
even numbers). I guess you could in theory do the same with IMAP
(though I'd have to double check the spec), but it would create
really annoying UID lists because you basically lose the ability to
use ranges like 30:50. One other option would be to alternate in
blocks of 100 or something like that (eg. 1-100 on s1, 101-200 on
s2, etc)
2. Use global locking so anything allocating UIDs takes a cross-server
lock, allocates the UIDs, and keeps the global UID counter somewhere.
This gets tricky when one server goes down, and you need to handle
that case well (eg. the locking server has to know the difference
between "down" and "unreachable" so you don't get split brain)
3. Use some conflict resolution strategy. If some client uploads UID 100
on s1, and another uploads UID 100 on s2, then when the conflict is
noticed, both sides have to delete + expunge the message (because
different IMAP clients might have different ideas on what message UID
100 is) and create new UIDs 101 and 102 with the two messages. This
can be messy because if a POP client is connected, you can't alter
the mailbox at all, since the message list isn't allowed to change
under the POP client's feet. Connected POP clients could therefore
cause nasty locking issues.
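As a rough illustration of options 1 and 3, here is a Python sketch.
Everything in it is an assumption made for the example: the function
names, the dict-based mailboxes, and the choice to give the s1 copy the
lower of the two new UIDs. It is not how Cyrus works, and it does not
settle whether block allocation fits the IMAP spec's rules on UID
ordering, which I'd still have to check.

```python
def next_block_uid(last_uid, server_index, num_servers, block_size=100):
    """Option 1: smallest UID > last_uid inside a block this server owns.

    Block k covers UIDs k*block_size+1 .. (k+1)*block_size and belongs
    to server (k % num_servers). Server indexes start at 0.
    """
    uid = last_uid + 1
    block = (uid - 1) // block_size
    if block % num_servers != server_index:
        # Skip ahead to the first UID of this server's next block.
        block += (server_index - block) % num_servers
        uid = block * block_size + 1
    return uid


def resolve_conflict(s1_box, s2_box, uid):
    """Option 3: drop the contested UID on both sides, re-add under new UIDs.

    Mailboxes are plain dicts of UID -> message. Giving the s1 copy the
    lower new UID is an arbitrary choice for this sketch.
    """
    msg1 = s1_box.pop(uid)  # stands in for delete + expunge on s1
    msg2 = s2_box.pop(uid)  # stands in for delete + expunge on s2
    highest = max(max(s1_box, default=0), max(s2_box, default=0), uid)
    merged = {highest + 1: msg1, highest + 2: msg2}
    s1_box.update(merged)
    s2_box.update(merged)
    return sorted(merged)


# Option 1 with two servers: s1 is index 0, s2 is index 1.
first_on_s1 = next_block_uid(0, 0, 2)        # start of block 1-100
first_on_s2 = next_block_uid(0, 1, 2)        # start of block 101-200
after_full_block = next_block_uid(100, 0, 2)  # s1 jumps over s2's block

# Option 3: both sides independently assigned UID 100.
s1_box = {99: "old", 100: "from A"}
s2_box = {99: "old", 100: "from B"}
new_uids = resolve_conflict(s1_box, s2_box, 100)
print(first_on_s1, first_on_s2, after_full_block, new_uids)
```

Note that the sketch ignores the hard parts mentioned above: noticing
the conflict in the first place, and what to do while a POP client
holds the mailbox.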
> Would it be a huge undertaking to timestamp data that is to be
> replicated to another Cyrus daemon for the receiving Cyrus daemon to
> know which conflicting pieces of data to drop in favor of newer data?
It's certainly not trivial, and getting every edge case and race
condition right is going to be hard.
> Right now I have a client who needs 130 or so users on a private mail
> server and has two cheap 1U Dell servers to work with. Ideally they
> are to be put in physically distanced data-centers for redundancy to
> one another.
That's what replication is for, but you will have to use some manual
failover strategy.
> Combined with the hypothetical replication of timestamped data
> described above, wouldn't setting the fqdn imap.example.com to resolve
> two IP addresses so users' IMAP clients can fall back should an IMAP
> storage server be unavailable (with at least the most recent data
> replication of any kind is able to provide) make for a much simpler
> and more elegant solution than DRBD, clustered filesystems, or
> introducing more machines just for load balancing / resolving to an
> available IMAP daemon? Also, wouldn't timestamps also hypothetically
> resolve the inevitable split-brain situations clients would create?
Wonderful in theory, but hard to implement in practice.
Rob