painful mupdate syncs between front-ends and database server

Cyril Servant elfejoyeux at gmail.com
Tue Oct 20 06:13:05 EDT 2009


Hello,

2009/10/19 Michael Bacon <baconm at email.unc.edu>:
> Hello, list,
>
> Today we're enjoying our first full work day of independence from the old
> monolithic cyrus server installed in 1999 (Sun 6800 -- it's had new CPU
> boards since then, but that's it), and on our new shiny cluster of T5220's
> that are mostly happily operating as a murder.
>
> I say mostly because while most of the time the thing handles our 80,000
> users and 14,000+ simultaneous connections like a champ, some of the time,
> we get some extreme pain, mostly due to syncs between the MUPDATE master
> and the front-end servers.
>
> When we spec'ed out our servers, we didn't put much I/O capacity into the
> front-end servers -- just a pair of mirrored 10k disks doing the OS, the
> logging, the mailboxes.db, and all the webmail action going on in another
> solaris zone on the same hardware.  We thought this was sufficient given
> the fact that no real permanent data lives on these servers, but it turns
> out that while most of the time it's fine, if the mupdate processes ever
> decide they need to re-sync with the master, we've got 6 minutes of trouble
> ahead while it downloads and stores the 800k entries in the mailboxes.db.
>
> During these sync periods, we see two negative impacts.  The first is
> lockup on the mailboxes.db on the front-end servers, which slows down both
> accepting new IMAP/POP connections and the reception of incoming messages.
> (The front-ends also accept LMTP connections from a separate pair of
> queueing hosts, then proxy those to the back-ends.)  The second is that,
> because the front-ends go into a
>
> It's awfully frustrating to have a system that, as my boss says, performs
> like a Camaro most of the time until you hit a little rock in the road, and
> then suddenly turns into a Pinto.  It's also frustrating that this seems like
> one of the less complicated aspects of the system -- publishing replicas of
> a read-only database to a few worker boxes.
>
> I suppose this is why Fastmail and others ripped out the proxyd's and replaced
> them with nginx or perdition.  Currently we still support GSSAPI as an auth
> mechanism, which kept me from going that direction, but given the problems
> we're seeing, I'd be open to architectural suggestions on either how to tie
> perdition or nginx to the MUPDATE master (because we don't have the
> back-ends split along any discernable lines at this point), or suggestions
> on how to make the master-to-frontend propagation faster or less painful.
>
> Sorry for the long message, but it's not a simple problem we're fighting.
>
> Michael Bacon
> UNC Chapel Hill

Here we had a similar situation: more than a million mailboxes, and
each MUPDATE sync was veeeeery long (when it succeeded). We now
sidestep the problem: we got rid of MUPDATE (and the skiplist
mailboxes.db) and use a homemade MySQL backend for mailboxes instead.
We added write and read filters to this backend so that front-end and
back-end servers get the right value from MySQL.

With this configuration, we're no longer in murder mode; we just use
front-end Cyrus (proxies), back-end Cyrus, and MySQL. We don't need
MUPDATE any more, so we have no sync problems. Cyrus restarts are
fast.
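
For illustration only, the general idea of a shared mailbox directory
(every front-end asks one database which back-end holds a mailbox,
instead of syncing a local copy) might be sketched as below. This is
not the code described above: the table layout, the function names
(open_directory, set_location, lookup_backend) and the host names are
all hypothetical, and SQLite stands in for MySQL so that the sketch
runs with the Python standard library alone.

```python
import sqlite3


def open_directory(path=":memory:"):
    """Open the (hypothetical) shared mailbox directory.

    SQLite is only a stand-in here; the setup described in the
    message uses MySQL with a schema that is not given.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mailboxes ("
        " name TEXT PRIMARY KEY,"   # e.g. 'user.alice'
        " server TEXT NOT NULL)"    # back-end host holding the mailbox
    )
    return conn


def set_location(conn, name, server):
    """Write path: record (or move) the back-end for a mailbox."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO mailboxes (name, server) VALUES (?, ?)",
            (name, server),
        )


def lookup_backend(conn, name):
    """Read path: which back-end should a proxy connect to?"""
    row = conn.execute(
        "SELECT server FROM mailboxes WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


conn = open_directory()
set_location(conn, "user.alice", "backend1.example.org")
set_location(conn, "user.bob", "backend2.example.org")
print(lookup_backend(conn, "user.alice"))
```

Because each lookup is a single indexed query against the shared
database, a restarted front-end has nothing to download first, which
is the property the message attributes to dropping MUPDATE.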

-- 
Cyril

