painful mupdate syncs between front-ends and database server
dpc22 at cam.ac.uk
Sat Oct 31 06:02:13 EDT 2009
On Fri, 30 Oct 2009, Michael Bacon wrote:
> On all systems in the murder, we'll see instances where the mupdate
> process goes into a spin where, in truss, it's an endless repeat of
> fcntl, stat, fstat, fcntl, thousands of times over. These execute
> extremely quickly, but I do wonder if we're assuming that something that
> takes very little time takes an insignificant amount of time, when the
> time involved becomes significant on an 800k mailboxes database.
I agree that latency is probably your problem here.
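To make the accumulation argument concrete: a cost that is negligible per call grows linearly with the size of the database. The per-operation costs below are hypothetical round numbers, not measurements from this thread. Only the 800k record count comes from the report above.

```python
# Illustrative only: the per-operation costs are hypothetical, not measured.
RECORDS = 800_000  # mailboxes.db size from the quoted report


def total_seconds(per_op_seconds: float, records: int = RECORDS) -> float:
    """Total time if every record pays the same fixed per-operation cost."""
    return per_op_seconds * records


for label, cost in [("10 us", 10e-6), ("100 us", 100e-6), ("1 ms", 1e-3)]:
    print(f"{label:>6} per record -> {total_seconds(cost):6.1f} s in total")
```

Moving from microseconds to a millisecond per record takes the total from about 8 s to about 800 s, i.e. well over ten minutes.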
I'm wondering whether fsync() latency on the frontends might be a factor,
given that you report little disk I/O on the mupdate master (IOPS matter
much more than KB/s here, but I'm sure that you already know that). The
update process will only be as fast as its weakest link, and you stated earlier:
> When we spec'ed out our servers, we didn't put much I/O capacity into
> the front-end servers -- just a pair of mirrored 10k disks doing the OS,
> the logging, the mailboxes.db, and all the webmail action going on in
> another Solaris zone on the same hardware.
No mention of a battery-backed write cache there, which tends to be fairly
critical for anything involving fsync(). There is an easy way to find out,
using an imapd.conf option whose documentation reads:
If enabled, this option forces the skiplist cyrusdb backend to not
sync writes to the disk. Enabling this option is NOT RECOMMENDED.
You can ignore the scary warning (at least for test purposes) on murder
frontends, given that their mailboxes.db is just a read-only replica of
the one on the mupdate master.
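Why the test is telling: without a write cache that can acknowledge writes early, each small synchronous write to a spinning disk waits, on average, roughly half a revolution for the target sector to come under the head. A back-of-the-envelope sketch for the 10k RPM disks mentioned above, assuming 4 KiB writes and ignoring seek time and filesystem journal writes (both of which only lower the ceiling further):

```python
RPM = 10_000
WRITE_SIZE_BYTES = 4096  # assumed size of each small synchronous write

revolution_s = 60.0 / RPM                     # 0.006 s per revolution
avg_rotational_latency_s = revolution_s / 2   # wait half a turn on average
max_sync_writes_per_s = 1.0 / avg_rotational_latency_s  # IOPS ceiling
throughput_kib_s = max_sync_writes_per_s * WRITE_SIZE_BYTES / 1024

print(f"average rotational latency: {avg_rotational_latency_s * 1000:.1f} ms")
print(f"ceiling: {max_sync_writes_per_s:.0f} sync writes/s, "
      f"about {throughput_kib_s:.0f} KiB/s")
```

Under these assumptions the disk is saturated at around 333 synchronous writes per second while moving only about 1.3 MiB/s, which is why IOPS rather than throughput is the number to watch.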
I hope that this isn't a complete red herring. It just struck me that it
would be a really easy test to make.
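A complementary check, separate from the option above: time fsync() directly on the filesystem that holds a frontend's mailboxes.db. This is a minimal hypothetical script (the function name fsync_latencies is made up for illustration). It rewrites one 4 KiB block repeatedly and times only the fsync() calls. That is a much simpler access pattern than the skiplist backend's, so treat the result as a rough indication of per-sync cost, not a model of Cyrus.

```python
import os
import statistics
import sys
import tempfile
import time
from typing import List, Optional


def fsync_latencies(iterations: int = 200, size: int = 4096,
                    directory: Optional[str] = None) -> List[float]:
    """Rewrite one block `iterations` times, timing only each fsync() call."""
    payload = b"\0" * size
    latencies: List[float] = []
    fd, path = tempfile.mkstemp(dir=directory)  # temp file on the target filesystem
    try:
        for _ in range(iterations):
            os.lseek(fd, 0, os.SEEK_SET)  # overwrite the same block each round
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # ask the kernel to flush the written block to the device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else None  # None: default temp dir
    samples = fsync_latencies(directory=target)
    print(f"median fsync: {statistics.median(samples) * 1000:.2f} ms, "
          f"worst: {max(samples) * 1000:.2f} ms")
```

Pass a directory on the same filesystem as mailboxes.db as the first argument (the path is installation-specific). Medians of several milliseconds would be consistent with no write cache; sub-millisecond figures suggest writes are being acknowledged from cache.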
David Carter Email: David.Carter at ucs.cam.ac.uk
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.
More information about the Info-cyrus