Making Replication Robust
robm at fastmail.fm
Tue Oct 9 19:50:14 EDT 2007
>> c) MUST have a clean process to "soft-failover" to the
>> replica machine, making sure that all replication
>> events from the ex-master have been synchronised.
> Something more than sync_shutdown_file plus automatic retries on
> recent work files?
I think the problem at the moment is that the process you really want is:
1. Stop new imap/pop/lmtp/sieve/etc connections
2. Finish and close existing connections cleanly but as quickly as possible
3. Finish running any sync log files
4. Fully shutdown
There's currently no clean way to do this. Basically you have to SIGTERM
master which hard kills it and all children, then manually run
sync_client -f on any remaining log files.
We've got a patch which makes master handle SIGQUIT much more nicely.
It appears there was already some infrastructure designed to handle a
cleaner shutdown; look at all the places in the code that call
signals_poll(). It looks like the idea was that you could send child
processes SIGQUIT and they would continue their current action until they
got back to their "main loop", check whether they'd been sent a QUIT, and
then exit cleanly. Unfortunately, if you sent SIGQUIT to master, it would
just SIGTERM all children, not SIGQUIT them.
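To make the "check in the main loop" idea concrete, here is a rough sketch in Python rather than the real C code. The names QuitFlag, install_quit_handler and serve are made up for illustration and are not Cyrus functions; the only point is that the handler just records the request and the loop acts on it between units of work.

```python
import signal


class QuitFlag:
    """Remembers that a quit was requested; the handler does nothing else."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        # Only record the request; the main loop decides when to act on it.
        self.requested = True


def install_quit_handler(flag):
    # Must be called from the main thread of the child process (POSIX only).
    signal.signal(signal.SIGQUIT, flag.handler)


def serve(requests, flag, handle):
    """Handle requests one by one, checking for a pending quit between them."""
    completed = []
    for request in requests:
        if flag.requested:
            break  # exit cleanly at the top of the loop, never mid-request
        completed.append(handle(request))
    return completed
```

A request that is in progress when the quit arrives still completes; only the next one is skipped.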
This patch attempts to fix this, so that sending SIGQUIT to master sends
SIGQUIT to all children, and then waits for them all to exit cleanly.
This solves steps 1 and 2 above, though it doesn't deal with the case of a
"crazy child" that doesn't respond to SIGQUIT. Personally, our init script
sends SIGQUIT, and if the master process is still there after 10 seconds,
it sends SIGTERM to force an exit. In general we find that everything
exits within a couple of seconds of the SIGQUIT.
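The escalation in the init script looks roughly like the sketch below. This is Python for illustration only, not our actual script. The function name stop_gracefully is made up, and it takes a subprocess.Popen handle for simplicity, whereas a real init script would work from the master's PID.

```python
import signal
import subprocess


def stop_gracefully(proc, grace_seconds=10.0):
    """Ask proc to quit with SIGQUIT; escalate to SIGTERM after the grace period.

    Returns the name of the signal that finally stopped the process.
    """
    proc.send_signal(signal.SIGQUIT)
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGQUIT"
    except subprocess.TimeoutExpired:
        proc.terminate()  # sends SIGTERM on POSIX
        proc.wait()
        return "SIGTERM"
```

The 10 second default matches what we use; the right value depends on how long your slowest connection takes to wind down.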
To do step 3, I think the best approach might be a new cyrus.conf section,
a SHUTDOWN section, which lists commands to run on shutdown. After all
children have accepted a SIGQUIT and exited, master would run the SHUTDOWN
section, which would run a final sync_client -r on the sync dir to finish
up any remaining log files.
With all of that in place, you could send a SIGQUIT to a cyrus master
process on a master server, and it would cleanly shut down all children
and ensure that all replication events have been correctly played to the
replica. You could then do the same to the replica, reverse their roles,
bring them both back up, and you've got a safe soft failover.
> At the moment we replace messages (on the "master knows best" principle).
> It would be easy enough to leave message in place and generate warnings
> instead, although this would generate a lot of warnings, one for every bad
> message every time that a given mailbox is updated.
That's what this patch does.
In theory, with clean soft failovers you should NEVER have UIDs with
mismatched UUIDs. After a hard failover you obviously might, but in those
cases just replacing the message means we're almost certainly overwriting a
delivered message and losing it, which is bad. At the least, making it an
option to overwrite or log is, I think, a sane idea.
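As a sketch of what "an option to overwrite or log" could mean, here is a small Python model. It is not the patch itself: MismatchPolicy and reconcile are made-up names, and a mailbox is reduced to a plain {uid: uuid} map.

```python
import logging
from enum import Enum


class MismatchPolicy(Enum):
    OVERWRITE = "overwrite"  # "master knows best": replace the replica's copy
    LOG = "log"              # leave the replica's copy alone and warn


def reconcile(master, replica, policy, log=None):
    """Compare two {uid: uuid} maps for one mailbox.

    Returns (new_replica_map, sorted_list_of_mismatched_uids).
    """
    log = log or logging.getLogger("sync")
    result = dict(replica)
    mismatched = []
    for uid, uuid in master.items():
        if uid not in replica:
            result[uid] = uuid  # new message: always copy it over
        elif replica[uid] != uuid:
            mismatched.append(uid)
            if policy is MismatchPolicy.OVERWRITE:
                result[uid] = uuid
            else:
                log.warning("UID %d: replica has UUID %s, master has %s",
                            uid, replica[uid], uuid)
    return result, sorted(mismatched)
```

With LOG the replica's message survives and an admin gets a warning to chase; with OVERWRITE you get the current behaviour.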
> My nightmare scenario is a replication engine which carries on running in
> the face of mboxlist corruption on the master: you could lose a lot of
> mailboxes on the replica that way.
That would be bad, though hard to detect and stop. I guess that's what
backups are for...
> It would be easy enough to generate multiple replication log files.
> MySQL keeps a single transaction log for multiple replicas, but that file
> contains quite a lot of information about each transaction. In contrast
> the Cyrus sync log is just a list of objects we need to pay attention to:
> the files have much less state, particularly without duplicates.
The other option, rather than using the "rotate log, play it, delete it"
system, is to generate one log file and keep track of "offsets" within the
file to tell you where each replica is up to. That's what MySQL does: you
can have multiple replicas because each replica is "playing" off the same
log files; they're just up to different offsets at any point in time.
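Roughly, the bookkeeping for the offsets approach would look like the sketch below. It is Python with an in-memory list standing in for the log file. SharedSyncLog and its methods are made-up names, and the entries are placeholder strings rather than real sync log lines.

```python
class SharedSyncLog:
    """One append-only log; each replica remembers how far it has played."""

    def __init__(self):
        self.entries = []
        self.offsets = {}  # replica name -> index of next entry to play

    def register(self, replica):
        # A new replica starts from the beginning of the log.
        self.offsets.setdefault(replica, 0)

    def append(self, entry):
        self.entries.append(entry)

    def pending(self, replica):
        return self.entries[self.offsets[replica]:]

    def acknowledge(self, replica, count):
        # Advance only after the replica has applied `count` entries.
        new_offset = self.offsets[replica] + count
        if count < 0 or new_offset > len(self.entries):
            raise ValueError("invalid acknowledgement count")
        self.offsets[replica] = new_offset

    def safe_to_discard(self):
        # Entries before the slowest replica's offset are no longer needed.
        return min(self.offsets.values(), default=0)
```

The catch compared with "rotate, play, delete" is that the log can only be trimmed up to the slowest replica's offset, so a dead replica pins the whole file.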