Cyrus 2.4 replication bug, plus a couple of questions
dpc22 at cam.ac.uk
Sun Nov 28 13:04:20 EST 2010
I've set up a little Cyrus-2.4 testrig and moved my own email to it to see
what breaks. So far so good, apart from a little bug in the 2.4
replication code, as follows:
sync_server.c: do_mailbox() gets thoroughly confused if a mailbox on the
master has index_records which do not appear in the replica mailbox.
An unwanted recno++ means that the two ends get out of sync and
sync_server starts to compare the wrong messages.
Trivial patch attached. I can report this in bugzilla as well if that is
the desired procedure, but this looks like the obvious fix.
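The failure mode can be illustrated with a small standalone sketch. This is not the code from sync_server.c: the struct, the function name count_matched() and the skip_bug switch are all hypothetical, and the real do_mailbox() works on full index records rather than bare UIDs. It only models the idea that advancing the replica cursor for a master-only record makes later comparisons line up against the wrong messages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an index record: only the UID matters here. */
struct rec { unsigned uid; };

/*
 * Walk master and replica records (both sorted by UID) and count how many
 * master records are paired with a replica record carrying the same UID.
 * If skip_bug is non-zero, the replica cursor is also advanced when the
 * master record has no counterpart, imitating the unwanted recno++.
 */
static size_t count_matched(const struct rec *master, size_t nm,
                            const struct rec *replica, size_t nr,
                            int skip_bug)
{
    size_t recno = 0;   /* cursor into the replica records */
    size_t matched = 0;

    assert(master != NULL || nm == 0);
    assert(replica != NULL || nr == 0);

    for (size_t i = 0; i < nm; i++) {
        /* Step over replica-only records with a lower UID. */
        while (recno < nr && replica[recno].uid < master[i].uid)
            recno++;

        if (recno < nr && replica[recno].uid == master[i].uid) {
            matched++;
            recno++;            /* both ends consumed one record */
        } else if (skip_bug && recno < nr) {
            recno++;            /* the bug: replica record is skipped unseen */
        }
        /* Correct behaviour: a master-only record leaves recno untouched. */
    }
    return matched;
}
```

With master UIDs 1 to 5 and a replica holding only UIDs 1, 4 and 5, the correct walk pairs three records, while the skip_bug variant pairs only the first one because the cursor runs past UIDs 4 and 5 before the master reaches them.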
I imagine that the common case where this would happen is where someone
had been running replication in 2.3 with different sets of expunged
messages at the two ends, either because only one end had been running
with delayed expunge enabled or because of different cyr_expire
configurations. Or, in my case, a freshly generated 2.3 replica without
any expunged messages.
Curiously, while the first attempt at "sync_client -u" bailed out with an
IMAP_MAILBOX_CRC, a second attempt worked after a bit of ping pong with
highestmodseq and the modseqs on individual messages. This led to a
partial upload list without any of the expunged messages from the master.
Quick Question about cyr_expire in 2.4
The replication engine in 2.4 tries to track expunged messages. Does this
mean that it is no longer safe to run cyr_expire on replica servers?
I'm a bit concerned that this would stop us from running master/master,
with a subset of accounts running on each server in a pair.
cyrus-cvs at lists.andrew.cmu.edu dried up a few months back. Is there an
equivalent list for commits into git://git.cyrusimap.org/cyrus-imapd/?
This might be a naive question: I only have limited experience of git.
cyrus-cvs, but no replacement list.
A thank you to Bron
The mailbox abstraction code in 2.4 is much nicer to work with than the
low level hackery which used to go on in earlier versions of Cyrus.
I managed to port my own (2003 era) replication code and other local
changes in about 3 days. Plus one day to work out what was going on with
the new FLAG_SEEN stuff in 2.4 and a few hours today working out why the
normal 2.4 replication code was having a fit (see above).
mailbox_rewrite_index_record() and mailbox_append_index_record() probably
generate a little more disk I/O than what was there before as they can
only work with a single message at a time. But if Fastmail don't care,
then I would be very surprised if I notice.
I'm a little concerned that the new replication engine doesn't appear to
be able to cope with messages reappearing in the middle of a mailbox, but
That Should Never Happen. The simplicity of 2.4 has a lot going for it
(says the guy who wrote sync_combine_commit() plus sync_append_commit(),
and probably quite a lot of the other really nasty convolved code in 2.3).
The replication code in 2.4 is much nicer and looks like something that I
can use in production, which was never the case with the code in 2.3. The
giant central lock was always a show stopper for me. The only thing that I
have interfered with is to run the replication protocol over SSH links so
that I can use SSH key based authentication rather than adding sync_host,
sync_authname and sync_password (ugh!) into configuration files. That's a
relatively modest bit of replumbing of the various prot channels: would
there be any interest in turning this into a more general patch?
David Carter Email: David.Carter at ucs.cam.ac.uk
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 847 bytes
Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20101128/d18968b8/attachment.bin