Replication is broken with modseq issue in 2.3.6

Bron Gondwana brong at fastmail.fm
Wed Jul 5 02:54:18 EDT 2006


If I sound a little bitter, it's because I was up until 5:30am the other
night after a hardware failure left us with corrupted filesystems on our
master server, and fetching old messages from the replica returned
blank responses.  We eventually discovered that reconstruct could fix
it, and 36(!!) hours later, our couple of terabytes of mailboxes have
all finished reconstructing and our users have largely stopped yelling
at us.

Unsurprisingly to those of us who lived through 2.3.4 and 2.3.5, it was
related to the modseq field of the index file.  Yet more proof that
mixing unfinished new features with security updates in the 'stable'
line is bad for your blood pressure.

If you're running replication on any post-2.3.3 Cyrus, your replica
contains indexes with '0' as the modseq value.  This means that your
messages will not be fetchable until you either fix that value or
patch Cyrus.  Running reconstruct from 2.3.6 or later is one option
for fixing it.

Otherwise ...

I've spent a fair bit of today tracking down and testing the location
in the code where that issue was being caused.  See the attached
'cyrus-modseqrepl-cvs.diff' file.

CAVEATS:
a) it won't fix any already replicated messages
b) if you're using CONDSTORE, this won't replicate the actual modseq,
   it just sets it to '1' on the replica.  I considered trying to
   replicate the actual modseq value, but it looks like it requires
   changing the replication wire protocol, and that's a can of worms
   I'm really not interested in diving into.  Someone who's actually
   using CONDSTORE is in a much better position to see what's needed
   and actually test it!
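
As a rough illustration of caveat (b), the sketch below stores a
placeholder modseq of 1 in a record buffer as two 32-bit big-endian
words.  The offsets, the record size and the helper names are all
hypothetical and chosen only for this example.  The high/low split
across the two words is an assumption based on the patch zeroing
OFFSET_MODSEQ_64 and writing 1 to OFFSET_MODSEQ, and the byte order is
assumed from the htonl() calls in the surrounding code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record layout for illustration; the real offsets are
 * defined in the Cyrus headers and are not reproduced here. */
enum {
    SKETCH_OFFSET_MODSEQ_64 = 0,  /* assumed: high 32 bits of the modseq */
    SKETCH_OFFSET_MODSEQ    = 4,  /* assumed: low 32 bits of the modseq */
    SKETCH_RECORD_SIZE      = 8
};

/* Store a 32-bit value in big-endian (network) byte order. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Load a 32-bit big-endian value. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 64-bit modseq as two 32-bit words, high word first. */
static void put_modseq(unsigned char *rec, uint64_t modseq)
{
    put_be32(rec + SKETCH_OFFSET_MODSEQ_64, (uint32_t)(modseq >> 32));
    put_be32(rec + SKETCH_OFFSET_MODSEQ, (uint32_t)(modseq & 0xffffffffu));
}

/* Read the 64-bit modseq back out of the record. */
static uint64_t get_modseq(const unsigned char *rec)
{
    return ((uint64_t)get_be32(rec + SKETCH_OFFSET_MODSEQ_64) << 32) |
           get_be32(rec + SKETCH_OFFSET_MODSEQ);
}

/* A zeroed record reads back as modseq 0; the placeholder makes it 1. */
static void check_placeholder_modseq(void)
{
    unsigned char rec[SKETCH_RECORD_SIZE] = { 0 };

    assert(get_modseq(rec) == 0);
    put_modseq(rec, 1);
    assert(get_modseq(rec) == 1);
    assert(rec[SKETCH_OFFSET_MODSEQ + 3] == 1);
}
```

Replicating the real value would mean passing the master's modseq to
something like put_modseq() instead of the constant 1, which is the
part that needs the wire protocol change.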

That still sort of sucks, doesn't it?  Mainly due to (a).


Then inspiration hit, and I wish I'd thought of this back when I wrote
the reconstruct and COPY patches that went into 2.3.6...

See the attached 'cyrus-modseqfetch-cvs.diff' file.  This implements the
correct behaviour in all cases, all PRO no CON - and it's only one line!

If fetchargs->changedsince is '0' then that means you want all messages
regardless of the value of modseq.

The other huge advantage of this approach is that it means that you
don't have to be aware of modseq unless you're using it.  The default
'0' value of a freshly zeroed struct is no longer an accident waiting
to happen.
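
To make the difference concrete, here is a minimal standalone sketch
of the check before and after the one-line change.  The type and
function names are hypothetical stand-ins invented for this example,
not the real Cyrus definitions; only the two comparison expressions
mirror the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration only; not the real Cyrus types. */
typedef unsigned long long sketch_modseq_t;

struct sketch_fetchargs {
    sketch_modseq_t changedsince;  /* 0 = no CHANGEDSINCE filter requested */
};

/* Pre-patch test: returns 1 when the message should be skipped. */
static int skip_before_patch(sketch_modseq_t modseq,
                             const struct sketch_fetchargs *fa)
{
    return modseq <= fa->changedsince;
}

/* Patched test: only filter on modseq when changedsince is non-zero. */
static int skip_after_patch(sketch_modseq_t modseq,
                            const struct sketch_fetchargs *fa)
{
    return fa->changedsince && modseq <= fa->changedsince;
}

/* Walks through the cases discussed above. */
static void check_fetch_cases(void)
{
    struct sketch_fetchargs plain = { 0 };   /* freshly zeroed args */
    struct sketch_fetchargs since5 = { 5 };  /* changes after modseq 5 */

    /* Replica record with modseq 0: hidden before the patch, visible after. */
    assert(skip_before_patch(0, &plain) == 1);
    assert(skip_after_patch(0, &plain) == 0);

    /* Normal record, no filter: fetched either way. */
    assert(skip_before_patch(3, &plain) == 0);
    assert(skip_after_patch(3, &plain) == 0);

    /* Real CHANGEDSINCE filter: behaviour is unchanged by the patch. */
    assert(skip_after_patch(3, &since5) == 1);
    assert(skip_after_patch(9, &since5) == 0);
}
```

The only case that changes is the zero-modseq record with no filter,
which is exactly the case the broken replicas are in.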

Even folder indexes "corrupted" by 2.3.4-5 will work just fine with
this patch applied.  All your replicated messages will magically
appear again.

Replication with CONDSTORE is still broken - but then it was never
unbroken (queries on the replica that use modseq won't return the
same results as the same queries on the master).


Ken - please consider cyrus-modseqfetch-cvs.diff for immediate
inclusion and prompt release of a 2.3.7.

I imagine you're going to want to do more work so modseq values actually
replicate rather than using cyrus-modseqrepl-cvs.diff as is - though it
certainly doesn't hurt any, and it means newly replicated messages will
be readable by older Cyrus versions back to 2.3.4 (the index format is
too new for anything earlier than that).


Regards,

Bron.

-------------- next part --------------
diff -ur cyrus-imapd-cvs/imap/sync_commit.c cyrus-imapd-cvs.new/imap/sync_commit.c
--- cyrus-imapd-cvs/imap/sync_commit.c	2006-06-13 13:24:40.000000000 -0400
+++ cyrus-imapd-cvs.new/imap/sync_commit.c	2006-07-04 22:36:13.000000000 -0400
@@ -177,6 +177,8 @@
     for (n = mailbox->start_offset; n < INDEX_HEADER_SIZE; n++) {
         if (n == OFFSET_UIDVALIDITY+3) {
             putc(1, newindex);
+        } else if (n == OFFSET_HIGHESTMODSEQ+3) {
+            putc(1, newindex);
         } else {
             putc(0, newindex);
         }
@@ -233,6 +235,8 @@
 		= htonl(message->cache_version);
 
             message_uuid_pack(&item->uuid, buf+OFFSET_MESSAGE_UUID);
+            *((bit32 *)(buf+OFFSET_MODSEQ_64)) = 0;
+            *((bit32 *)(buf+OFFSET_MODSEQ)) = 1;
             quota_add  += message->msg_size;
 
             if (item->flags.system_flags & FLAG_ANSWERED) numansweredflag++;
-------------- next part --------------
diff -ur cyrus-imapd-cvs.orig/imap/index.c cyrus-imapd-cvs/imap/index.c
--- cyrus-imapd-cvs.orig/imap/index.c	2006-05-30 15:38:39.000000000 -0400
+++ cyrus-imapd-cvs/imap/index.c	2006-07-05 00:17:12.000000000 -0400
@@ -2427,7 +2427,7 @@
     int r = 0;
 
     /* Check the modseq against changedsince */
-    if (MODSEQ(msgno) <= fetchargs->changedsince) return 0;
+    if (fetchargs->changedsince && MODSEQ(msgno) <= fetchargs->changedsince) return 0;
 
     /* Open the message file if we're going to need it */
     if ((fetchitems & (FETCH_HEADER|FETCH_TEXT|FETCH_RFC822)) ||

