Old replication logs

Bron Gondwana brong at fastmail.fm
Thu Aug 9 20:15:01 EDT 2007


On Thu, Aug 09, 2007 at 02:08:11PM +0300, Janne Peltonen wrote:
> Hi!
> 
> It appears that if a cyrus system is forcibly shut down, there is a
> replication log left (if the replica system wasn't up at the time). Now,
> is it safe to delete the log? What about the transactions that are in
> the log, is there a way to replay them later?  What if the system has
> been up and running for a while after the crash / forced shutdown? Is
> there a way to extract the mailboxes that have entries in the old
> logfile, to call sync_client by hand to make sure that all the mailboxes
> are up to date? Or would that be needed?
> 
> Whee.

Well, we run the attached perl script every 10 minutes on every machine
with a Cyrus instance on it.  It has hooks into our infrastructure all
over the place, but that's mainly because we run up to about 16 instances
of Cyrus (both masters and replicas) on a single host, so we need lots
of extra logic to figure out (a) what's supposed to be running, and
(b) which processes and log files belong to which instance!

Anyway, the exciting bit is probably this:

if (opendir(my $DH, "$ConfDir/sync")) {
  while (my $item = readdir($DH)) {
    next unless $item =~ m/^log-(\d+)$/;
    my $pid = $1;

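    # The number in the file name is the pid of the sync_client that
    # owns the log.  Signal 0 sends nothing; it only tests whether that
    # pid still exists.  If it does, leave its log alone.
    # (kill also returns false if the process exists but we are not
    # allowed to signal it, so run this as root or as the cyrus user.)
    if (kill(0, $pid)) {
      next;
    }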

    my $res = $Slot->RunCommand('sync_client', '-o', '-r', '-f' => "$ConfDir/sync/$item");

    # failure
    if ($? or $res =~ m/\S/) {
      # figure out what you want to do here...
    }

    # success :)
    else {
      unlink("$ConfDir/sync/$item");
    }
  }
}
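
The liveness check above can also be pulled out into a small helper that
treats "permission denied" as "still running".  What follows is only a
sketch: pid_alive and stale_sync_logs are hypothetical names, not part of
the script quoted above, and it assumes the same log-<pid> naming in the
sync directory.

```perl
use strict;
use warnings;
use Errno qw(EPERM);

# Hypothetical helper: true if a process with this pid exists.
# kill(0, ...) fails with EPERM when the process exists but belongs
# to another user, so that case is counted as "alive" too.
sub pid_alive {
    my ($pid) = @_;
    return 1 if kill(0, $pid);
    return $! == EPERM ? 1 : 0;
}

# Hypothetical helper: names of log-<pid> files in $dir whose pid is gone.
sub stale_sync_logs {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return;
    my @stale;
    for my $item (sort readdir($dh)) {
        next unless $item =~ m/^log-(\d+)$/;
        push @stale, $item unless pid_alive($1);
    }
    closedir($dh);
    return @stale;
}
```

Each name returned by stale_sync_logs("$ConfDir/sync") would then be a
candidate for the sync_client run shown above.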

NOTE: you can probably implement RunCommand directly in 
terms of system().  Ours is a bit complex because it puts:
"sudo -u cyrus /usr/cyrus/bin/"
in front of the command and 
"-C /etc/imapd-$SlotName.conf"
after it before passing it through to the ME::Machine version 
of RunCommand which does a fork, transparently (as much as 
possible) sshes to the correct machine if needed, does optional
per-line handling of responses with a callback function, etc.
Very powerful and easy interface, but very integrated in our
systems.
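
For anyone without that infrastructure, a stripped-down stand-in might
look like the sketch below.  It is not our RunCommand: run_command and
its option names (sudo_user, bin_dir, config) are made up for
illustration, it only handles the local case, and it captures stdout
only, so anything the command writes to stderr is not part of the
returned string.  It places -C directly after the command name, which is
one reading of "after it".

```perl
use strict;
use warnings;

# Hypothetical local-only stand-in for RunCommand.
# Returns the command's stdout as one string and leaves the exit
# status in $?, so the "$? or $res =~ m/\S/" test still works.
sub run_command {
    my ($opts, $command, @args) = @_;

    my @cmd;
    push @cmd, 'sudo', '-u', $opts->{sudo_user} if defined $opts->{sudo_user};
    push @cmd, defined $opts->{bin_dir} ? "$opts->{bin_dir}/$command" : $command;
    push @cmd, '-C', $opts->{config} if defined $opts->{config};
    push @cmd, @args;

    # List-form pipe open runs the command without going through a shell.
    open(my $fh, '-|', @cmd) or die "cannot run @cmd: $!";
    local $/;                 # slurp all output at once
    my $out = <$fh>;
    close($fh);               # waits for the child and sets $?
    return defined $out ? $out : '';
}
```

A call matching the one in the loop above would then be something like:

    my $res = run_command(
        { sudo_user => 'cyrus', bin_dir => '/usr/cyrus/bin',
          config => "/etc/imapd-$SlotName.conf" },
        'sync_client', '-o', '-r', '-f' => "$ConfDir/sync/$item");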

We also run "checkreplication" which actually makes a pair of
imap connections and enumerates through the mailboxes comparing
stuff, and another task which does a "du -s" on the sync
directory every 2 minutes and logs it to a database allowing our
status tools to inform us of any replications which are falling
behind (as well as emailed notifications).  Our failover script
also checks that DB value for both freshness and lowness before
it tries to fail over, and after shutting down the Cyrus master
it attempts to run all remaining logs and bails out if it can't.
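
The "freshness and lowness" test can be sketched roughly as follows.
This is an illustration only: sync_backlog_bytes and safe_to_fail_over
are hypothetical names, the thresholds are whatever you choose, and
summing file sizes is just an approximation of what "du -s" reports
(du counts disk blocks, not bytes).

```perl
use strict;
use warnings;

# Hypothetical: total size in bytes of the regular files directly
# inside $dir, as a rough substitute for "du -s" on the sync directory.
sub sync_backlog_bytes {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    my $total = 0;
    for my $item (readdir($dh)) {
        my $path = "$dir/$item";
        $total += (-s $path) || 0 if -f $path;
    }
    closedir($dh);
    return $total;
}

# Hypothetical: fail over only if the measurement is recent enough
# (freshness) and the backlog is small enough (lowness).
sub safe_to_fail_over {
    my (%arg) = @_;
    my $age = $arg{now} - $arg{measured_at};
    return 0 if $age > $arg{max_age};            # measurement is stale
    return 0 if $arg{bytes} > $arg{max_bytes};   # replication is behind
    return 1;
}
```

For example, safe_to_fail_over(now => time, measured_at => $stamp,
bytes => $size, max_age => 300, max_bytes => 4096) would refuse a
failover when the last measurement is more than five minutes old or
more than 4 KB of log is outstanding; both limits are arbitrary here.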

Bron.

