FastMail.FM GUID upgrade process

Bron Gondwana brong at
Mon Oct 29 00:23:06 EDT 2007

Ok - we're doing our GUID upgrades across the board now.  Here's
the process we're using:

a) wrote a tool that tied a "DB_File" called "cyrus.sha1s" in each
   meta directory on the replicas, parsed the index files looking
   for records with nothing but zeros in the last 16 characters of
   the GUID, and calculated the sha1 of the underlying message file
   for each of them.  This took about 5 days to finish running, but
   makes the next part a lot quicker!
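
A minimal sketch of the precalculation in (a), in Python rather than the
Perl the real tool used.  It assumes the GUID is handled as a 40-character
hex string, that the cache is keyed by UID, and that records arrive as
(uid, guid) pairs; none of those details are stated above, and
precompute_sha1s and read_message are hypothetical names.  A plain dict
stands in for the tied DB_File.

```python
import hashlib

ZERO_TAIL = "0" * 16  # old-style GUIDs end in 16 zero characters


def is_old_style(guid_hex):
    """True when the last 16 characters of the GUID are all zeros."""
    return guid_hex.endswith(ZERO_TAIL)


def precompute_sha1s(records, read_message, cache):
    """Fill cache (uid -> sha1 hex) for every record with an old-style GUID.

    records:      iterable of (uid, guid_hex) pairs standing in for parsed
                  index records (assumed shape, not the real index format)
    read_message: callable returning the raw message bytes for a uid
    cache:        dict-like store, standing in for cyrus.sha1s
    """
    for uid, guid_hex in records:
        if is_old_style(guid_hex) and uid not in cache:
            cache[uid] = hashlib.sha1(read_message(uid)).hexdigest()
    return cache


# Usage: record 1 has an old-style GUID, record 2 already has a sha1 GUID.
messages = {1: b"Subject: a\r\n\r\nhello\r\n", 2: b"Subject: b\r\n\r\nworld\r\n"}
records = [
    (1, "ab" * 12 + ZERO_TAIL),
    (2, hashlib.sha1(messages[2]).hexdigest()),
]
cache = precompute_sha1s(records, messages.__getitem__, {})
```

Only record 1 ends up in the cache, so the later upgrade pass can skip
re-reading that message file.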

b) wrote a daemon which runs on each host and allows the following
   4 commands:

   *) LOCK
     - lock cyrus.header, cyrus.index, cyrus.expunge in that order
       using fcntl (our cyrus is built with it).
       Also if cyrus.sha1s is missing, attempt to fetch it from the
       replica (but it's OK if this fails, it just means all old GUIDs
       will cause a re-calculation)
   *) UPGRADE
     - parse each of cyrus.index and cyrus.expunge.  If any old-style
       GUIDs are found, look first in cyrus.sha1s and finally
       just re-calculate from the underlying message files.
     - if any index records need new GUIDs or the old index has not
       yet been upgraded to version 10, stream the index file through
       a Cyrus::IndexFile->stream_copy, altering the necessary GUIDs
       and forcing the output format to version 10 (this module can
       also be used to downgrade if we ever need to!)
     - leave the new file in cyrus.$item.NEW, but mark internally
       that the file has been upgraded.
   *) ROLLBACK
     - if any file has been upgraded, unlink() the .NEW file.
     - unlock expunge, index, header (in that order)
   *) COMMIT
     - if any file has been upgraded, rename() the .NEW to the
       base filename.
     - unlock expunge, index, header (in that order)
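
The per-mailbox lifecycle in (b) can be sketched roughly as follows, again
in Python and not taken from the real daemon.  MailboxSession and the
rewrite callable are hypothetical: rewrite(src, dst) stands in for the
Cyrus::IndexFile stream copy and returns True when it wrote dst.  Skipping
a missing file while locking is an assumption, and fetching cyrus.sha1s
from the replica is left out.

```python
import fcntl
import os

LOCK_ORDER = ["cyrus.header", "cyrus.index", "cyrus.expunge"]
UPGRADABLE = ["cyrus.index", "cyrus.expunge"]


class MailboxSession:
    """One mailbox's LOCK, then UPGRADE, then ROLLBACK or COMMIT."""

    def __init__(self, directory, rewrite):
        self.directory = directory
        self.rewrite = rewrite  # hypothetical: rewrite(src, dst) -> bool
        self.handles = []       # locked file handles, in lock order
        self.upgraded = []      # base names with a pending .NEW file

    def _path(self, name):
        return os.path.join(self.directory, name)

    def lock(self):
        for name in LOCK_ORDER:
            if not os.path.exists(self._path(name)):
                continue  # assumption: a missing file is simply skipped
            # An exclusive fcntl lock needs a handle opened for writing.
            fh = open(self._path(name), "r+b")
            self.handles.append(fh)
            fcntl.lockf(fh, fcntl.LOCK_EX)

    def upgrade(self):
        for name in UPGRADABLE:
            src = self._path(name)
            if os.path.exists(src) and self.rewrite(src, src + ".NEW"):
                self.upgraded.append(name)

    def rollback(self):
        # Discard any pending .NEW files, then release the locks.
        for name in self.upgraded:
            os.unlink(self._path(name) + ".NEW")
        self._unlock()

    def commit(self):
        # Move each pending .NEW file over its base filename.
        for name in self.upgraded:
            os.rename(self._path(name) + ".NEW", self._path(name))
        self._unlock()

    def _unlock(self):
        # Release in reverse order: expunge, index, header.
        for fh in reversed(self.handles):
            fcntl.lockf(fh, fcntl.LOCK_UN)
            fh.close()
        self.handles = []
        self.upgraded = []
```

Because the new data only ever lives in the .NEW file until COMMIT, a
ROLLBACK at any earlier point leaves the original index untouched.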

c) wrote a controller script which reads the mailbox listing from the
   master and opens connections to both the master and replica slotd,
   sending the following commands:

   1) master LOCK mailbox (or die)
   2) replica LOCK mailbox (or master ROLLBACK; die)
   3) master UPGRADE (or replica ROLLBACK; master ROLLBACK; die)
   4) replica UPGRADE (or replica ROLLBACK; master ROLLBACK; die)
   5) replica COMMIT (or replica ROLLBACK; master ROLLBACK; die)
   6) master COMMIT (or master ROLLBACK; die NOISILY!!!)

   The only danger point is (6), where you could wind up with an
   upgraded replica without the associated upgraded master.  You
   can go ahead and fix them by hand though, assuming you read the
   NOISILY bit.
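
The six-step ordering in (c) can be sketched like this.  It is Python, not
the real controller; upgrade_mailbox, FakeSlot and the two exception names
are hypothetical, and the "LOCK <mailbox>" command string is an assumed
wire format.  Each slot's send() is assumed to return True on success.

```python
class UpgradeFailed(Exception):
    """Steps 1-5 failed; everything that was locked has been rolled back."""


class MasterCommitFailed(Exception):
    """Step 6 failed: the replica is upgraded but the master is not."""


def upgrade_mailbox(master, replica, mailbox):
    def undo(*slots):
        for slot in slots:
            slot.send("ROLLBACK")

    if not master.send("LOCK " + mailbox):           # step 1
        raise UpgradeFailed("master LOCK")
    if not replica.send("LOCK " + mailbox):          # step 2
        undo(master)
        raise UpgradeFailed("replica LOCK")
    steps = [                                        # steps 3 to 5
        (master, "master", "UPGRADE"),
        (replica, "replica", "UPGRADE"),
        (replica, "replica", "COMMIT"),
    ]
    for slot, name, command in steps:
        if not slot.send(command):
            undo(replica, master)
            raise UpgradeFailed(name + " " + command)
    if not master.send("COMMIT"):                    # step 6
        undo(master)
        raise MasterCommitFailed(mailbox)            # the NOISILY case


class FakeSlot:
    """Stand-in for a daemon connection; fails on one chosen command."""

    def __init__(self, fail_on=None):
        self.fail_on = fail_on
        self.log = []

    def send(self, command):
        self.log.append(command)
        return command.split()[0] != self.fail_on


# Usage: the replica fails its UPGRADE, so both sides get a ROLLBACK.
master, replica = FakeSlot(), FakeSlot(fail_on="UPGRADE")
try:
    upgrade_mailbox(master, replica, "user.example")
except UpgradeFailed as err:
    failure = str(err)
```

Giving step 6 its own exception type makes it hard for a caller to treat
the one dangerous failure like an ordinary, fully rolled-back one.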

d) I think the slashdot crowd would put "Profit!!!" here.  With
   only a short lock time on each index (most sha1s precalculated)
   and no need to multi-rewrite any index file, this will run much
   faster than the alternatives.  I guess I should go clean up the
   cyrus.sha1s files once it's all finished.


More information about the Cyrus-devel mailing list