painful mupdate syncs between front-ends and database server

Mon Oct 19 17:37:24 EDT 2009

--On October 19, 2009 2:13:03 PM -0700 Andrew Morgan <morgan at orst.edu> 
wrote:

> What is causing a (re)sync of the frontends?  Normally this should only
> happen when you start Cyrus on a frontend, right?

I am not entirely sure.  I think what may be happening is that the slave 
mupdate requests get some kind of timeout, and end up disconnecting.  As 
soon as they reconnect, they want to re-sync.  I've upped the 
"mupdate_retry_timeout" to 10 minutes, so most of the time, they'll only 
time out once, and the next retry will be successful.  This solved a 
constant re-sync issue we had early on, but apparently hasn't solved the 
problem entirely.

> On Mon, 19 Oct 2009, Michael Bacon wrote:

<snip>

>> During these sync periods, we see two negative impacts.  The first is
>> lockup on the mailboxes.db on the front-end servers, which slows down
>> both accepting new IMAP/POP connections and the reception of incoming
>> messages. (The front-ends also accept LMTP connections from a separate
>> pair of queueing hosts, then proxy those to the back-ends.)  The second
>> is that, because the front-ends go into a
>
> A part of this paragraph was chopped off.  What else did you have to say?

Sorry, must have blanked on that.  The front-ends go into a sync cycle, 
which ties up the MUPDATE server while they download the database (which 
can take over two minutes).  This causes a similar halt on anything that 
was responding to a mupdate "kick" on the clients, which appears to stop up 
a decent amount of inbound mail.

>
> I ran some tests back in 2006:
>
>> However, I just performed an interesting test comparing skiplist versus
>> berkeley.
>>
>> skiplist - approx 20-25mins
>> berkeley - 3mins
>>
>> Those are the times it took to push the entire mailbox list from our test
>> server to the mupdate master (146382 mailboxes).
>
> This was a test run populating the mupdate master mailboxes.db from a
> single backend server.  However, I think it illustrates the differences
> between skiplist and berkeley database formats.  In our case, we still
> went with skiplist, but it might be something to consider.

Interesting.  We're running skiplist everywhere, after some nasty 
experiences I've had with bdb, but that's a pretty astonishing performance 
difference.
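
For a rough sense of scale, here is a back-of-the-envelope calculation using 
only the figures quoted above (146382 mailboxes, about 3 minutes for berkeley, 
20-25 minutes for skiplist).  The function and variable names are just for 
illustration, and the result is an average rate over the whole run, not a 
measurement of our own setup:

```python
# Figures quoted in the thread: a 2006 test pushing one backend's
# mailbox list to the mupdate master.
MAILBOXES = 146_382


def rate_per_second(mailboxes: int, minutes: float) -> float:
    """Average number of mailboxes pushed per second over the run."""
    return mailboxes / (minutes * 60)


berkeley = rate_per_second(MAILBOXES, 3)
skiplist_fast = rate_per_second(MAILBOXES, 20)  # best case quoted
skiplist_slow = rate_per_second(MAILBOXES, 25)  # worst case quoted

print(f"berkeley: about {berkeley:.0f} mailboxes/s")
print(f"skiplist: about {skiplist_slow:.0f}-{skiplist_fast:.0f} mailboxes/s")
print(
    f"berkeley speedup: {berkeley / skiplist_fast:.1f}x "
    f"to {berkeley / skiplist_slow:.1f}x"
)
```

So the quoted numbers amount to roughly a 7-8x difference in insert rate for 
that particular test.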

I'm pretty sure we can solve the problem by adding additional I/O capacity 
to the mailboxes.db on the front-ends, but it's kind of frustrating that we 
have to.  I've considered putting those in a swap-mounted file system, but 
that makes me a bit nervous.

-Michael