Today's pop quiz: replication

Mon Aug 3 19:02:46 EDT 2015

(sorry, started this while in the USA and forgot to come back to it)

On Thu, Jul 23, 2015, at 16:14, ellie timoney wrote:
> Here's my understanding, and my understanding is limited and probably
> incorrect, so I'd appreciate corrections from anyone who actually
> knows this stuff.
>
> > Do we have multiple sync_clients because a new one is spawned by a
> > master for each change (and then the process finishes), or is
> > there an actual pool of sync clients which handle each change and
> > persist idle.
>
> > I failed to note where the sync_server fits in. Is it a separate
> > process that lives alongside the master imap server and sits there
> > constantly checking the log files generated by sync_client to be
> > handled?
>
> I'm going to make up some terminology here because there's lots of
> reuses of the word "master" everywhere:

Making up different terminology than everyone else is using is bound to
help reduce confusion... ;)

> - the "primary" server for a given user/mailbox is the server which
>   that user actually interacts with

True.

> - a "replica" server for a given user/mailbox is another server which
>   contains a copy of their data

Now you're overusing the word "server".  Physical machine, or master
instance and associated child processes?

The replica is definitely a separate instance of cyrus, which at least
in FastMail is always on another physical server, but in some testing
layouts might be on the same server - for example I sometimes run a
3-instance testcase (a master and two replicas) on my laptop.

> - there may be multiple replicas for a given user/mailbox, but only
>   one primary

True, at least for now.  There's some multi-master support in Cyrus
replication, but it's for split brains, not day-to-day running.  We now
have mailbox tombstones in mailboxes.db, but we don't use them at all in
the replication process.

> - a single cyrus instance may be the primary server for some users but
>   a replica server for other users

Absolutely not (see above about the split brains).  But - at FastMail...

> - there may even be multiple cyrus instances running on the same
>   physical machine (let's ignore this though)

Or maybe let's not - because this gave you the idea from the previous
line.  We run multiple instances at FastMail on a single server, and
some of them are masters and others are replicas. I've had 70+ instances
on a single machine.

We use the terminology server, store, slot, instance - and master,
replica, sync_client, sync_server.  There's no other meaning of master
than "the instance of Cyrus running on the slot to which user
connections are configured to go".

> A cyrus instance that is to be a replica server for some users needs
> to run the sync_server program*.  This listens for replication
> attempts from a primary server.

Yes.

> A cyrus instance that is to be a primary server for some users needs
> to run the sync_client program.  This generally runs in "rolling"
> mode, whereby it continuously processes the sync log** for changes to
> mailboxes and sends them to the sync_server on the replica.  If a
> primary is replicating to multiple replicas, it will generally run
> multiple sync_clients, one for each.  It's also possible to chain
> replication (like primary -> replica_a -> replica_b) but let's ignore
> that too.

Happy to ignore that one, since I don't know anybody using it.

This is done with sync_channels.  Here's the config from store254
on FastMail:

sloti29t15_sync_host: 10.202.79.15
slotsi1d2t40_sync_host: 10.206.51.80
sync_log: 1
sync_log_channels: sloti29t15 slotsi1d2t40 squatter
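
To make the relationship between those options concrete, here is a
rough Python sketch (not part of Cyrus, and not FastMail tooling) that
reads the four settings above and pairs each channel with its
<channel>_sync_host.  The function names are made up for illustration,
and the parsing is a deliberately naive "key: value" split rather than
the real imapd.conf parser.

```python
# Sketch only: naive "key: value" parsing, not the real imapd.conf parser.
CONFIG = """\
sloti29t15_sync_host: 10.202.79.15
slotsi1d2t40_sync_host: 10.206.51.80
sync_log: 1
sync_log_channels: sloti29t15 slotsi1d2t40 squatter
"""


def parse_config(text):
    """Return a dict of option name -> value for 'key: value' lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        options[key.strip()] = value.strip()
    return options


def channel_hosts(options):
    """Map each channel in sync_log_channels to its sync_host (or None)."""
    channels = options.get("sync_log_channels", "").split()
    return {ch: options.get(ch + "_sync_host") for ch in channels}


options = parse_config(CONFIG)
hosts = channel_hosts(options)
for channel, host in hosts.items():
    print(channel, "->", host or "(no sync_host configured)")
```

Note that the squatter channel has no sync_host of its own: it is a
channel that gets logged to, but nothing is replicated over it.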

> An administrator can also run sync_client manually -- e.g. for
> pre-populating a replica prior to starting rolling replication.

Yes, we have scripts to do this - scripts/cyrus_firstsync.pl is used to
populate new slots, using the list of valid users for that store from
the database, and tracking the state in a table so that we know when the
slot is ready.

Ideally (in my grand unified murder/replication world) this information
would be stored in the mailboxes.db of each machine, so that it could be
automated.

> The sync_server program will periodically be shut down and
> restarted by master (i.e. the process called master) -- I think
> there's some config for specifying how long one should hang around
> for before restarting.  I guess this protects a production service
> against possible memory leaks. I'm not sure if this applies to
> sync_client too.

I'm not sure if this still happens or not.  There was a 'RESTART'
command in early sync_client/sync_server comms, and we still support it,
but I'd have to check the code to see if it still triggers.

> [* or have imapd configured to provide replication services, but let's
> ignore this too]
> [** there may be multiple of these depending on sync channel
> configurations, but let's ignore this too]

You can start sync_client from the START {} block of cyrus.conf, and
that's what we recommend in the documentation, but at FastMail
it's actually started by the init script, and restarted by
scripts/monitorsync.pl if it ever goes away.  Both these things should
be absorbed into master, but we also need some way to add/remove
replicas on the fly.

To do that, we're probably going to need a single logging process which
gets sync_log events on a unix socket or something and writes them out
to all the $confdir/sync/$channel/log files.
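
A minimal sketch of the fan-out part of that idea, in Python.  This is
purely hypothetical: no such logging process exists in the message
above, the function name log_event is invented, the unix socket side is
left out, and a temporary directory stands in for $confdir.  The only
thing taken from the text is the $confdir/sync/$channel/log layout.

```python
import os
import tempfile


def log_event(confdir, channels, keyword, value):
    """Append 'KEYWORD value' to $confdir/sync/$channel/log per channel."""
    for channel in channels:
        channel_dir = os.path.join(confdir, "sync", channel)
        os.makedirs(channel_dir, exist_ok=True)
        with open(os.path.join(channel_dir, "log"), "a") as fh:
            fh.write(f"{keyword} {value}\n")


# A temporary directory stands in for the real $confdir.
confdir = tempfile.mkdtemp()
channels = ["sloti29t15", "slotsi1d2t40", "squatter"]

# user@example.com is a placeholder value, not a real account.
log_event(confdir, channels, "USER", "user@example.com")

for channel in channels:
    with open(os.path.join(confdir, "sync", channel, "log")) as fh:
        print(channel, fh.read().strip())
```

With one writer owning the channel list, adding or removing a replica
would just mean changing the list passed to the writer between events.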

> > Is the file format of the sync log defined anywhere? I assume it
> > correlates with a set of commands. (Not that this is important to a
> > user: it may as well be opaque, but it made me wonder!)
>
> I'm a bit confused about this myself.  Each time I go digging into the
> code my understanding flips back the opposite way.
>
> I think, either:
>
> * the sync log contains all the information needed to reproduce what's
>   happened (e.g. if a message has arrived, the sync log will contain
>   the message itself); OR
> * the sync log contains just enough to identify things that have
>   changed (e.g. if a message has arrived, the sync log contains a
>   message id of some sort), and the sync_client processing the log
>   just uses the log to discover which things to sync, but then uses
>   the actual mailbox to construct the changes to send to the replica.

Only a handful of things can appear there; the easiest place to find
the full set is imap/sync_log.h

> Either way I haven't seen any documentation on the sync log format.  I
> suspect it's either the raw sync protocol or some subset thereof?

It's not anything like that - it's just keyword, value - where value
is either a user or mailbox name, and keyword is the type of thing
which is dirty.

Sync client supports doing a single user with '-u', a single mailbox
with '-m', etc.  Each entry in the sync_log file is a trigger to
replicate with that same value, so a line "USER
brong at brong.net" is the same as running sync_client -u brong at brong.net.
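
As a rough illustration of that mapping, here is a Python sketch that
turns sync log lines into the equivalent sync_client command lines.  It
only builds the argument lists and never runs anything.  Assumptions:
only USER/-u is spelled out above, so the MAILBOX keyword paired with
'-m' is a guess on my part, the sample values are placeholders, and the
real list of keywords is whatever imap/sync_log.h says.

```python
# Assumed keyword -> flag mapping.  USER/-u comes from the text above;
# MAILBOX/-m is an assumption, check imap/sync_log.h for the real set.
FLAG_FOR_KEYWORD = {"USER": "-u", "MAILBOX": "-m"}


def commands_for_log(lines):
    """Build one sync_client argument list per distinct log entry."""
    seen = set()
    commands = []
    for line in lines:
        keyword, _, value = line.strip().partition(" ")
        if not value or (keyword, value) in seen:
            continue  # skip malformed lines and repeated entries
        seen.add((keyword, value))
        flag = FLAG_FOR_KEYWORD.get(keyword)
        if flag is None:
            continue  # keywords this sketch does not know about
        commands.append(["sync_client", flag, value])
    return commands


# Placeholder log content, not taken from a real server.
sample_log = [
    "USER user@example.com",
    "MAILBOX user.example.Sent",
    "USER user@example.com",
    "APPEND user.example",
]

commands = commands_for_log(sample_log)
for argv in commands:
    print(" ".join(argv))
```

Dropping repeated entries is safe here only because a repeated entry
asks for the same replication again; unknown keywords are skipped rather
than treated as errors.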

> > I also have in my wonderful drawing a picture of a number of
> > channels. I assume these are (part of) a config given to the
> > sync_server so it knows where to broadcast all the changes defined
> > in the log files to? Or have I misunderstood what a channel is?
>
> All of the above assumes a single default channel.  You can configure
> as many channels as you want.
>
> Each channel has a sync log.  When actions occur on mailboxes (e.g.
> via imapd, popd, lmtpd, sync_server, etc) the actions are logged to
> the sync log for all the channels.
>
> A single sync_client processes the sync log for a single channel.

Exactly this.  We do separate channels because one replica might be
down, and we don't want replication to ALL the replicas to be frozen
waiting for the others to catch up.  It's like a pipeline or queue -
entries can be logged until the replica is ready to accept them, and
then it syncs all of them.
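
A toy model of that behaviour in Python, in case it helps.  This is an
in-memory sketch, not how Cyrus implements it: the Channel class, its
attributes and the sample entries are all invented for illustration, and
the real queues are the per-channel log files.

```python
from collections import deque


class Channel:
    """One queue of pending sync log entries for one replica."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()
        self.replica_up = True
        self.synced = []

    def drain(self):
        """Send everything pending, but only if the replica is up."""
        if not self.replica_up:
            return 0
        count = 0
        while self.pending:
            self.synced.append(self.pending.popleft())
            count += 1
        return count


def log_to_all(channels, entry):
    """Every action is logged to every channel's queue."""
    for channel in channels:
        channel.pending.append(entry)


replica_a = Channel("replica_a")
replica_b = Channel("replica_b")
replica_b.replica_up = False  # replica_b is down for now

for entry in ["USER alice", "USER bob"]:
    log_to_all([replica_a, replica_b], entry)

replica_a.drain()  # replica_a catches up straight away
replica_b.drain()  # nothing is sent, entries stay queued
queued_while_down = len(replica_b.pending)

replica_b.replica_up = True
replica_b.drain()  # replica_b comes back and syncs the backlog
```

The point is that replica_a is fully synced while replica_b is still
down, and replica_b loses nothing by being down.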

> If you wanted primary to replicate simultaneously to replica_a and
> replica_b, you might set up a channel and corresponding sync_client
> for each replica.  (The other way to do it would be with chained
> replication, i.e. where primary replicates to replica_a, and replica_a
> replicates to replica_b -- but in this case if something went wrong
> with replica_a, replica_b would get stale, which seems unideal.)

Yep.  The only real benefit to chaining is bandwidth use reduction - if
you have two replicas in a different datacentre, you can chain them and
avoid sending all the data over the link twice.  You can always
re-establish replication to the second replica by creating a direct
channel and running sync_client -A to make sure everything is up-to-date.

The nice thing about replication is that it's idempotent.  You can run
the same log multiple times and end up in the same state.

> There's also a program called "squatter", which is used for updating
> search indexes.  It monitors a channel by the same name, and updates
> the search index for things that it sees change.

This is a little bit of a hack really :(  It works though.  It watches
for the "APPEND" sync_log item, which sync_client ignores.  We're only
using sync channels for squatter because they were there, and we don't
have another nice queuing mechanism.

> I've found the doc/install-replication.html document in the repo
> helpful for understanding how this stuff fits together, though it's
> lacking the deeper detail I actually need.

Sorry :(

> I started making a wiki page about this a few days ago but haven't
> updated since, and I've come to understand most of this since then.
> So maybe this email plus whatever corrections arrive from it would
> make good content for it: https://git.cyrus.foundation/w/replication/
>
> Hoping for confirmations/corrections,

Sounds like a plan!

Bron.

-- 
  Bron Gondwana
  brong at fastmail.fm