cyrus replication : master to replica and replica to master

Thu Oct 22 05:07:04 EDT 2009

On Thu, Oct 22, 2009 at 12:56:03AM -0700, Jon . wrote:
> On Wed, Oct 21, 2009 at 9:20 PM, Rob Mueller <robm at fastmail.fm> wrote:
> ...
> 
> > The difference between "in theory this would work" and the practice of
> > actually doing it are huge. Basically it works only if you are 100% sure
> > that only one side is ever being accessed at a time. eg. IMAP/POP/LMTP/etc.

Pretty much.  With appropriate fencing, non-local bind and a service IP
address that's feasible.  But Rob won't let me do it.  Fair enough too,
it's pretty messy.

> ...
> 
> > In other words, DON'T DO THIS.
> >
> > Rob

Yeah, yeah.  I know.  I could have worded it a bit more strongly.  "Nobody's
ever done it because it's really tricky to get right and you'll lose data
for sure if you don't know what you are doing".

> What are the particular bits that could conflict and have undesirable
> results? Metadata, messages, entire mailboxes? In this hypothetical
> active/active configuration, what exactly could an IMAP client
> potentially do to create undesirable results?

Yes.  Those things.  Any and/or all.  Try thinking about a folder rename
at one end and a copy/expunge cycle between folders at the other end and
resolving the resultant mess.

Basically this is tricky stuff that nobody does particularly well.  Generic
sync is a hard problem[tm], and the Cyrus code doesn't even try.

In particular, it doesn't track deltas.  To get even halfway good tracking
of changes, you need three things:

1) current state of A
2) current state of B
3) state last time A and B were in sync

Even better is knowing the changes that were made and resolving them.  But
without even this much information, consider the following:

A: UID 5 is SEEN
B: UID 5 is UNSEEN

What should be the result?
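
To make the SEEN/UNSEEN question concrete, here is a minimal Python sketch of
a three-way merge for a single boolean flag.  It is an illustration only, not
Cyrus code; the function name merge_flag and the use of None for "unknown"
are assumptions made for the example.

```python
from typing import Optional


def merge_flag(a: bool, b: bool, base: Optional[bool]) -> Optional[bool]:
    """Three-way merge of one boolean flag (e.g. SEEN on UID 5).

    a, b  -- current state of the flag on each end
    base  -- state the last time A and B were in sync, or None if unknown
    Returns the merged value, or None when it cannot be decided.
    """
    if a == b:
        return a      # both ends agree, nothing to resolve
    if base is None:
        return None   # no common ancestor: no way to tell who changed it
    # a != b, so exactly one side still matches the base;
    # the other side is the one that made a change, and it wins.
    return b if a == base else a
```

With only the two current states, merge_flag(True, False, None) has no
answer.  Given the last synced state it does: if UID 5 was UNSEEN at the last
sync then A marked it seen, and if it was SEEN then B marked it unseen.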

> Would it be a huge undertaking to timestamp data that is to be
> replicated to another Cyrus daemon for the receiving Cyrus daemon to
> know which conflicting pieces of data to drop in favor of newer data?

Timestamp each piece of metadata individually, yes - it would be a huge
undertaking.
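
For a sense of what per-item timestamps would involve, here is a small
hypothetical Python sketch of "newest wins" for one flag.  Nothing like the
changed_at field exists in the Cyrus code discussed here; StampedFlag and
newer are made-up names, and the sketch assumes both servers' clocks agree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StampedFlag:
    value: bool
    changed_at: float  # hypothetical per-flag modification time (epoch secs)


def newer(a: StampedFlag, b: StampedFlag) -> StampedFlag:
    """Keep whichever copy of the flag was changed most recently."""
    return a if a.changed_at >= b.changed_at else b
```

The comparison itself is trivial.  The cost is that every flag on every
message, and every other piece of metadata, would need its own stored and
replicated timestamp.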

> Right now I have a client who needs 130 or so users on a private mail
> server and has two cheap 1U Dell servers to work with. Ideally they
> are to be put in physically distanced data-centers for redundancy to
> one another.
>
> Combined with the hypothetical replication of timestamped data
> described above, wouldn't setting the fqdn imap.example.com to resolve
> two IP addresses so users' IMAP clients can fall back should an IMAP
> storage server be unavailable (with at least the most recent data
> replication of any kind is able to provide) make for a much simpler
> and more elegant solution than DRBD, clustered filesystems, or
> introducing more machines just for load balancing / resolving to an
> available IMAP daemon? Also, wouldn't timestamps also hypothetically
> resolve the inevitable split-brain situations clients would create?

I assume they don't like losing messages.  If you really, REALLY want
to go down that path I would at least take FastMail's patch that checks
the GUID if the same message exists on both ends and refuses to overwrite
if the message contents differ.  This is only half a solution: you then
need to resolve the conflict.  We LOCALDELETE the original message at both
ends so it doesn't even wind up in the .expunge file, then append BOTH
messages with brand new UIDs and set the flags they used to have on the
master, and finally sync the resulting mailbox again so both messages are
on both ends.  The code for that isn't in our Cyrus patches, though; it's a
standalone script.
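
The resolution steps can be sketched in Python on a toy in-memory model.
This is not the FastMail script or patch: Message, Mailbox and
resolve_uid_conflict are hypothetical names, the GUID is assumed to be a
SHA-1 hash of the message content, and each copy simply keeps the flags it
already had, which is a simplification.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Message:
    body: bytes
    flags: frozenset = frozenset()

    @property
    def guid(self) -> str:
        # Assumption: the GUID is a hash of the message content.
        return hashlib.sha1(self.body).hexdigest()


@dataclass
class Mailbox:
    messages: dict = field(default_factory=dict)  # uid -> Message
    next_uid: int = 1

    def append(self, msg: Message) -> int:
        uid = self.next_uid
        self.messages[uid] = msg
        self.next_uid += 1
        return uid


def resolve_uid_conflict(master: Mailbox, replica: Mailbox, uid: int) -> None:
    """If `uid` holds different content on each end, keep both messages."""
    m, r = master.messages[uid], replica.messages[uid]
    if m.guid == r.guid:
        return  # same content on both ends, safe to sync as usual
    # Drop the conflicting UID at both ends.
    del master.messages[uid]
    del replica.messages[uid]
    # Re-append BOTH messages at both ends under brand new UIDs that are
    # higher than anything either end has handed out so far.
    new_uid = max(master.next_uid, replica.next_uid)
    for msg in (m, r):
        for box in (master, replica):
            box.messages[new_uid] = msg
            box.next_uid = new_uid + 1
        new_uid += 1
```

For example, if both ends assigned UID 1 to different messages during a split
brain, calling resolve_uid_conflict(master, replica, 1) leaves both mailboxes
holding both messages as UIDs 2 and 3, and nothing at UID 1.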

And that's just for the split brain that results when a machine dies for
whatever reason; we don't have multi-directional replication.  (It happened
last night, incidentally: one of our external RAID units had an "episode"
and decided to stop talking to the server.  It looks like a couple of
timeouts on a failing drive tickled a firmware bug and resulted in the
inbuilt OS locking up.  Software, you have to love it.  Embedded at all
layers.  So many firmwares to keep up-to-date!)

Our approach to utilisation handling has been documented here plenty of
times.  Basically we run multiple instances of Cyrus on each machine, so
every server has both masters and replicas.  We can shut down any one
machine just by switching roles (a shutdown and restart of each end with
new configs).

Regards,

Bron.