painful mupdate syncs between front-ends and database server
Michael Bacon
baconm at email.unc.edu
Fri Oct 30 15:24:25 EDT 2009
I apologize for not responding sooner. I've had my head down in the
code and doing some tests, including playing with Bron's patch.
I haven't had the guts to roll the patched CVS version into production as
our primary mupdate server, but I did put it on a test machine in replica
mode. My test was on a clean server (no pre-existing mailboxes.db), and it
didn't appear noticeably faster. I don't have hard numbers, but it still
took well over 10 minutes to complete the sync and write it out to disk.
The odd thing is that we see major performance differences depending on
which disks the client's mailboxes.db lives on. For instance, if we put the
mailboxes.db (and the whole metapartition) on superfast Hitachi disks over
a 4 Gb SAN connection, the sync will finish in just under three minutes.
Still, even with that big a difference, we don't see any sign of I/O
contention in the iostat output. The k/sec figures are well within what
the drives should be able to handle, and the % blocking stays in the low
single digits most of the time, peaking into the 15-25 range from time to
time but not staying there. It does make me wonder whether what we're
seeing is related to I/O latency rather than throughput.
I haven't delved deep into the skiplist code, but I almost wonder if at
least some of the slowness is the foreach iteration on the mupdate master
in read mode. On all systems in the murder, we'll see instances where the
mupdate process goes into a spin where, in truss, it's an endless repeat of
fcntl, stat, fstat, fcntl, thousands of times over. Each call executes
extremely quickly, but I do wonder if we're treating something that takes
very little time per call as if it takes no time at all, when the total
becomes significant on an 800k-entry mailboxes database.
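A rough back-of-envelope, with a guessed per-call cost rather than a
measured one: if each fcntl/stat/fstat/fcntl cycle is on the order of 50
microseconds, then
    800,000 records x ~50 us = ~40 seconds
for a single pass over the database, and anything that makes several
passes, or re-reads each record along the way, multiplies that directly.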
Finally, as to how we get into this situation in the first place: in our
environment and configuration, the mupdate master can handle up to three
replicas connected to it before it goes into a bad state under high load.
I've never caught it at the point of actually going downhill, but my
impression is that so many processes start demanding responses from the
mupdate server that the persistent connections the slave mupdates hold to
the master time out and disconnect, then reconnect and try to re-sync. (At
least that's what it looks like in the logs.) Incoming IMAP connections
won't do it, but lmtpproxy connections seem to have a knack for it, since
for whatever reason they appear to generate "kicks" at a pretty high rate.
Still looking, but open to suggestions here.
Michael Bacon
UNC Chapel Hill
--On October 20, 2009 12:54:45 PM +1100 Bron Gondwana <brong at fastmail.fm>
wrote:
>
>
> On Mon, 19 Oct 2009 16:38 -0400, "Michael Bacon" <baconm at email.unc.edu>
> wrote:
>> When we spec'ed out our servers, we didn't put much I/O capacity into
>> the front-end servers -- just a pair of mirrored 10k disks doing the
>> OS, the logging, the mailboxes.db, and all the webmail action going on
>> in another solaris zone on the same hardware. We thought this was
>> sufficient given the fact that no real permanent data lives on these
>> servers, but it turns out that while most of the time it's fine, if
>> the mupdate processes ever decide they need to re-sync with the master,
>> we've got 6 minutes of trouble
>> ahead while it downloads and stores the 800k entries in the mailboxes.db.
>
> Have you checked if it's actually IO limited? Reading the code, it
> appears to do the entire sync in a single transaction, which is bad
> because it locks the entire mailboxes.db for the entire time.
>
>> During these sync periods, we see two negative impacts. The first is
>> lockup on the mailboxes.db on the front-end servers, which slows down
>> both
>> accepting new IMAP/POP connections and the reception of incoming
>> messages.
>> (The front-ends also accept LMTP connections from a separate pair of
>> queueing hosts, then proxy those to the back-ends.) The second is that,
>> because the front-ends go into a
>
> Lost you there - I'm assuming it causes a nasty load spike when it
> finishes too. Makes sense.
>
>> I suppose this is why Fastmail and others ripped out the proxyds and
>> replaced
>> them with nginx or perdition. Currently we still support GSSAPI as an
>> auth
>> mechanism, which kept me from going that direction, but given the
>> problems
>> we're seeing, I'd be open to architectural suggestions on either how to
>> tie
>> perdition or nginx to the MUPDATE master (because we don't have the
>> back-ends split along any discernible lines at this point), or
>> suggestions
>> on how to make the master-to-frontend propagation faster or less painful.
>
> We didn't ever go with murder. All our backends are totally independent.
>
>> Sorry for the long message, but it's not a simple problem we're fighting.
>
> No - it's not! I wonder if a better approach would be to batch the
> mailboxes.db updates into groups of no more than (say) 256.
>
> Arrgh - stupid, stupid, stupid. Layers of abstraction mean we have a nice
> fast "foreach" going on, and then throw away the data and dataptr fields,
> only to fetch the data field again right afterwards. It's very inefficient.
> I wonder what percentage of the time is just reading stuff from the
> mailboxes.db?
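>
> Roughly, the shape of the problem (illustrative C only - this isn't the
> exact cyrusdb callback signature):
>
>     #include <stddef.h>
>
>     /* stand-in for whatever actually sends/updates one mailbox entry */
>     int resync_one_mailbox(void *rock, const char *key, size_t keylen);
>
>     /* the foreach already hands the callback each record's data... */
>     int resync_cb(void *rock, const char *key, size_t keylen,
>                   const char *data, size_t datalen)
>     {
>         (void)data; (void)datalen;   /* ...but it gets thrown away here */
>
>         /* and the same record is then fetched again by key - a second
>          * read of every one of the 800k entries */
>         return resync_one_mailbox(rock, key, keylen);
>     }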
>
> Anyway - the bit that's actually going to be blocking you will be the
> mailboxes.db transactions. I've attached a patch. Advance warning - I
> don't use murder, so I haven't done more than compile test it! It SHOULD
> be safe though, it just commits to the mailboxes.db every 256 changes and
> then closes the transaction, which means that things that were queued
> waiting for the lock should get a chance to run before you update the
> next 256 records.
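>
> The shape of the change, in rough C (illustrative names only - this is not
> the actual patch and not the real cyrusdb API):
>
>     #include <stddef.h>
>
>     #define BATCH_SIZE 256
>
>     struct db;     /* opaque database handle (stand-in) */
>     struct txn;    /* open transaction (stand-in) */
>     struct update { const char *key; const char *data; };  /* simplified */
>
>     int db_store(struct db *db, const char *key, const char *data,
>                  struct txn **tid);
>     int db_commit(struct db *db, struct txn *tid);
>     int db_abort(struct db *db, struct txn *tid);
>
>     /* Apply the pending updates in batches of BATCH_SIZE, committing and
>      * releasing the mailboxes.db lock between batches so that anything
>      * queued up waiting for the lock gets a chance to run. */
>     int apply_updates(struct db *db, struct update *u, size_t count)
>     {
>         struct txn *tid = NULL;
>         size_t i, in_batch = 0;
>         int r = 0;
>
>         for (i = 0; i < count; i++) {
>             /* joins the open transaction, or starts a new one */
>             r = db_store(db, u[i].key, u[i].data, &tid);
>             if (r) break;
>
>             if (++in_batch == BATCH_SIZE) {
>                 r = db_commit(db, tid);    /* drops the write lock here */
>                 tid = NULL;
>                 in_batch = 0;
>                 if (r) return r;
>             }
>         }
>
>         if (r) {
>             if (tid) db_abort(db, tid);    /* roll back the partial batch */
>             return r;
>         }
>         return tid ? db_commit(db, tid) : 0;  /* commit the final short batch */
>     }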
>
> The patch is against current CVS (well, against my git clone of current
> CVS anyway).
>
> Bron.
> --
> Bron Gondwana
> brong at fastmail.fm
>