painful mupdate syncs between front-ends and database server

Mon Oct 19 21:54:45 EDT 2009

On Mon, 19 Oct 2009 16:38 -0400, "Michael Bacon" <baconm at email.unc.edu> wrote:
> When we spec'ed out our servers, we didn't put much I/O capacity into the 
> front-end servers -- just a pair of mirrored 10k disks doing the OS, the 
> logging, the mailboxes.db, and all the webmail action going on in another 
> solaris zone on the same hardware.  We thought this was sufficient given 
> the fact that no real permanent data lives on these servers, but it turns 
> out that while most of the time it's fine, if the mupdate processes ever 
> decide they need to re-sync with the master, we've got 6 minutes of trouble 
> ahead while it downloads and stores the 800k entries in the mailboxes.db.

Have you checked whether it's actually I/O limited?  Reading the code, it appears
to do the entire sync in a single transaction, which is bad because it keeps
the whole mailboxes.db locked until the sync finishes.

> During these sync periods, we see two negative impacts.  The first is 
> lockup on the mailboxes.db on the front-end servers, which slows down both 
> accepting new IMAP/POP connections and the reception of incoming messages. 
> (The front-ends also accept LMTP connections from a separate pair of 
> queueing hosts, then proxy those to the back-ends.)  The second is that, 
> because the front-ends go into a

Lost you there - I'm assuming it causes a nasty load spike when it finishes
too.  Makes sense.

> I suppose this is why Fastmail and others ripped out the proxyd's and 
> replaced them with nginx or perdition.  Currently we still support GSSAPI 
> as an auth mechanism, which kept me from going that direction, but given 
> the problems we're seeing, I'd be open to architectural suggestions on 
> either how to tie perdition or nginx to the MUPDATE master (because we 
> don't have the back-ends split along any discernable lines at this point), 
> or suggestions on how to make the master-to-frontend propagation faster or 
> less painful.

We didn't ever go with murder.  All our backends are totally independent.

> Sorry for the long message, but it's not a simple problem we're fighting.

No - it's not!  I wonder if a better approach would be to batch the mailboxes.db
updates into groups of no more than (say) 256.

Arrgh - stupid, stupid, stupid.  Layers of abstraction mean we have a nice
fast "foreach" going on, but we throw away the data and dataptr fields it
hands us and then fetch the data field all over again.  It's very inefficient.
I wonder what percentage of the time is spent just reading stuff from the
mailboxes.db?
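
To make the complaint concrete, here's a rough sketch of the pattern in
Python rather than the real C.  Everything in it (CountingDb,
collect_wasteful, collect_direct) is a made-up name for illustration, not
anything from the Cyrus source: the iterator already has the data in hand,
the callback ignores it, and then we look the same key up again.

```python
class CountingDb:
    """Hypothetical key/value store that counts explicit fetches."""

    def __init__(self, records):
        self._records = dict(records)
        self.fetches = 0

    def foreach(self, callback):
        # The iterator already holds both key and data for each record.
        for key, data in list(self._records.items()):
            callback(key, data)

    def fetch(self, key):
        # A separate lookup by key; counted so the waste is visible.
        self.fetches += 1
        return self._records[key]


def collect_wasteful(db):
    """Ignore the data foreach hands over, then fetch it again by key."""
    out = []

    def callback(key, _data):
        out.append((key, db.fetch(key)))

    db.foreach(callback)
    return out


def collect_direct(db):
    """Use the data foreach already supplied; no extra lookups."""
    out = []
    db.foreach(lambda key, data: out.append((key, data)))
    return out
```

Both functions return the same list, but the wasteful one does one extra
fetch per record, which on 800k entries would add up.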

Anyway - the bit that's actually going to be blocking you will be the mailboxes.db
transactions.  I've attached a patch.  Advance warning - I don't use murder, so I
haven't done more than compile test it!  It SHOULD be safe though, it just
commits to the mailboxes.db every 256 changes and then closes the transaction,
which means that things that were queued waiting for the lock should get a chance
to run before you update the next 256 records.
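
For anyone who doesn't want to read the diff, the idea is roughly the
following.  This is a toy Python sketch, not the patch itself: ToyMailboxesDb,
apply_batch and sync_in_batches are hypothetical names, and a plain
threading.Lock stands in for the mailboxes.db transaction lock.

```python
import threading

BATCH_SIZE = 256  # the batch size suggested above


class ToyMailboxesDb:
    """Hypothetical stand-in for mailboxes.db: a dict guarded by one lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.records = {}
        self.commits = 0

    def apply_batch(self, batch):
        # One "transaction": the lock is held only while this batch is written.
        with self.lock:
            for name, entry in batch:
                self.records[name] = entry
            self.commits += 1


def sync_in_batches(db, updates, batch_size=BATCH_SIZE):
    """Apply updates in groups, releasing the lock between groups."""
    batch = []
    for item in updates:
        batch.append(item)
        if len(batch) >= batch_size:
            db.apply_batch(batch)
            batch = []
    if batch:
        # Commit whatever is left over after the last full batch.
        db.apply_batch(batch)
    return db.commits
```

Between calls to apply_batch the lock is free, so anything waiting on it
gets a turn.  The tradeoff is that the sync is no longer atomic: readers can
see a partly updated list while it runs.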

The patch is against current CVS (well, against my git clone of current CVS anyway).

Bron.
-- 
  Bron Gondwana
  brong at fastmail.fm

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mupdate_transctions.diff
Type: application/octet-stream
Size: 2253 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/info-cyrus/attachments/20091020/de380d87/attachment.obj