High-Availability IMAP server
David Carter
dpc22 at cam.ac.uk
Wed Sep 28 04:02:43 EDT 2005
On Tue, 27 Sep 2005, Patrick Radtke wrote:
> We made great use of it Monday morning when one of our backend machines
> failed. Switching to the replica was quite simple and relatively fast
> (maybe 5 to 10 minutes from deciding to switch to the replica before
> replica was fully in action)
We use the replication engine all the time to move users back and forth
between systems so that we can patch and upgrade operating systems and/or
Cyrus without any user-visible downtime.
There have also been a number of forced failovers because of hardware
problems, specifically some dodgy RAID controller firmware that we were
running for a few months until we got a fix. It's worked very nicely for
us, but it is important that people don't just trust the software blindly.
We maintain and constantly regenerate a database of MD5 checksums for all
of the messages and cache entries across the cluster. It's been a long time
now since this has turned up errors, but I still check it religiously.
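
For anyone who wants to run a similar check, here is a minimal sketch in
Python of comparing MD5 checksums between a master and a replica copy of a
mail spool. It is not the tooling described above: the function names and
the approach of walking two directory trees are assumptions made purely for
illustration, and a real deployment would probably keep the digests in a
database and cover cache entries as well.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sums[os.path.relpath(full, root)] = md5_of_file(full)
    return sums


def compare_checksums(master, replica):
    """Return (missing_on_replica, extra_on_replica, mismatched) as sorted lists."""
    missing = sorted(set(master) - set(replica))
    extra = sorted(set(replica) - set(master))
    mismatched = sorted(
        key for key in master.keys() & replica.keys()
        if master[key] != replica[key]
    )
    return missing, extra, mismatched
```

Calling compare_checksums(checksum_tree(master_root),
checksum_tree(replica_root)), where both roots are hypothetical spool
paths, would list the files that are missing, unexpected, or different on
the replica.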
> I consider the code to be stable, though on occasion strange things happen
Which is not really my definition of stable :).
> (e.g. when user renames user.INBOX to user.saved.INBOX) and you have to
> restart the replication process (no downtime to Cyrus involved).
This one is odd behaviour on the part of mboxlist_renamemailbox(): it does
special magic when running as a non-admin user. There's actually a more
serious underlying bug in Cyrus here which I believe Ken is working on.
Again, we don't see this one, partly because our replication engine doesn't
run as an admin user (I'm afraid you don't have that option), and partly
because of overenthusiastic hacking on my part in other parts of the Cyrus
code.
--
David Carter Email: David.Carter at ucs.cam.ac.uk
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.