[BUG] Replication synchronisation fails when message size on disk doesn't match index data

Bron Gondwana brong at fastmail.fm
Sat Nov 25 19:04:40 EST 2006


Hi Ken,

Attached is a patch I developed after our regular "CheckReplication"
task discovered mismatched message sizes in some mailboxes.  On
examining the files on disk I found that the two copies were identical
up to the length of the shorter message.  Beyond that point, the longer
copy sometimes contained the rest of the email, and sometimes contained
a bunch of extra "junk" that looked remarkably like the index file
entries for other messages in the folder.

I tracked it down to the following code:

sync_client.c:1174:        r = mailbox_map_message(mailbox, record->uid, &msg_base, &msg_size);
..
sync_client.c:1190:        prot_printf(toserver, "{%lu+}\r\n", record->size);
sync_client.c:1191:        prot_write(toserver, (char *)msg_base, record->size);

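As you can see, it assumes msg_size and record->size are identical without
checking.  If there is corruption on the data partition and something has
gone wrong with the message file size, the data sent no longer matches the
file: if the file is longer than record->size, the end of the message is
silently dropped, and if it is shorter, whatever happens to lie in memory
beyond the end of the mapped file is sent in its place.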
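Both symptoms described at the start follow from that unchecked write.  A
rough sketch of the arithmetic, assuming only the two sizes are known
(describe_mismatch and struct mismatch are hypothetical names for
illustration, not Cyrus code):

```c
#include <stdio.h>

/* Hypothetical helper: given the length of the mapped message file
 * (msg_size) and the length stored in the index (record_size), work out
 * what an unchecked prot_write(..., record_size) would actually send. */
struct mismatch {
    unsigned long dropped;   /* bytes of the file that are never sent      */
    unsigned long overread;  /* bytes sent from beyond the end of the file */
};

static struct mismatch describe_mismatch(unsigned long msg_size,
                                         unsigned long record_size)
{
    struct mismatch m = {0, 0};
    if (msg_size > record_size)
        m.dropped = msg_size - record_size;   /* tail of the email is lost */
    else
        m.overread = record_size - msg_size;  /* "junk" follows the email  */
    return m;
}
```

For example, a 100-byte file with an index entry of 80 loses its last 20
bytes, while an 80-byte file with an index entry of 100 is followed by 20
bytes of whatever was next in memory.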

The attached patch logs an "IOERROR:" message to syslog and returns an error
code instead of sending any data for the affected message to sync_server,
alerting the admin to the problem so that it can be resolved.
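A minimal sketch of that strict policy outside of Cyrus, for illustration
only: struct message, send_message_strict and the -1 return (standing in
for IMAP_IOERROR) are hypothetical, and a plain FILE * stands in for the
toserver protstream.

```c
#include <stdio.h>
#include <syslog.h>

/* Hypothetical stand-in for one message: the mapped bytes plus the
 * size that the index record claims. */
struct message {
    unsigned long uid;
    const char *base;          /* mapped message file           */
    unsigned long msg_size;    /* actual length of the mapping  */
    unsigned long record_size; /* length stored in the index    */
};

/* Strict policy: if the on-disk size disagrees with the index, log an
 * "IOERROR:" line and fail before writing anything for this message. */
static int send_message_strict(FILE *out, const struct message *m,
                               const char *mboxname)
{
    if (m->msg_size != m->record_size) {
        syslog(LOG_ERR, "IOERROR: message size mismatch for %lu of %s (%lu <> %lu)",
               m->uid, mboxname, m->msg_size, m->record_size);
        return -1;             /* stands in for IMAP_IOERROR */
    }
    /* Sizes agree, so record_size bytes lie entirely inside the mapping. */
    fprintf(out, "{%lu+}\r\n", m->record_size);
    fwrite(m->base, 1, m->record_size, out);
    return 0;
}
```

Because the check happens before the literal header is written, nothing
half-written for the bad message reaches the other end.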

An alternative would be to replicate the bogus message file by using msg_size
rather than record->size in the last two lines above.  This has the advantage
of neither breaking replication for later messages nor causing even weirder
corrupted files on the destination, but on the downside it doesn't inform the
sysadmin.
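Sketched the same way (a hypothetical struct message, with a FILE * in place
of toserver), the msg_size alternative only changes which length is used
for the literal and the write; send_message_as_on_disk is a made-up name.

```c
#include <stdio.h>

/* Hypothetical stand-in for one message, as in the strict sketch. */
struct message {
    unsigned long uid;
    const char *base;          /* mapped message file           */
    unsigned long msg_size;    /* actual length of the mapping  */
    unsigned long record_size; /* length stored in the index    */
};

/* Replicate whatever is on disk: the literal length always matches the
 * bytes that follow it, so the stream stays in sync for later messages,
 * but the mismatch with the index is never reported. */
static void send_message_as_on_disk(FILE *out, const struct message *m)
{
    fprintf(out, "{%lu+}\r\n", m->msg_size);
    fwrite(m->base, 1, m->msg_size, out);
}
```

A corrupted file is copied faithfully to the replica, which is exactly the
trade-off described above.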

I guess doing that along with a syslog message is another sane approach, since
you'd still know of the issue but replication would continue.  In our case we're
happy to have replication fail since we have monitoring scripts that will scream
at us when that happens and we'll get in and fix things pronto.
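Combining the two might look like the following hypothetical sketch, which
walks a whole folder, logs each mismatch with syslog(3) and keeps going;
send_folder_lenient and its return value (the number of mismatches seen)
are invented for illustration.

```c
#include <stdio.h>
#include <syslog.h>

/* Hypothetical stand-in for one message, as in the earlier sketches. */
struct message {
    unsigned long uid;
    const char *base;          /* mapped message file           */
    unsigned long msg_size;    /* actual length of the mapping  */
    unsigned long record_size; /* length stored in the index    */
};

/* Log-and-continue: every message is sent as it exists on disk, each
 * size mismatch is logged so monitoring still sees it, and replication
 * carries on with the next message. Returns the number of mismatches. */
static int send_folder_lenient(FILE *out, const struct message *msgs,
                               size_t n, const char *mboxname)
{
    int mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        const struct message *m = &msgs[i];
        if (m->msg_size != m->record_size) {
            syslog(LOG_ERR, "IOERROR: message size mismatch for %lu of %s (%lu <> %lu)",
                   m->uid, mboxname, m->msg_size, m->record_size);
            mismatches++;
        }
        /* Always use the real file size so the literal matches the data. */
        fprintf(out, "{%lu+}\r\n", m->msg_size);
        fwrite(m->base, 1, m->msg_size, out);
    }
    return mismatches;
}
```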

Regards,

Bron.
-- 
  Bron Gondwana
  brong at fastmail.fm

-------------- next part --------------
diff -ur --new-file cyrus-imapd-cvs.orig/imap/sync_client.c cyrus-imapd-cvs/imap/sync_client.c
--- cyrus-imapd-cvs.orig/imap/sync_client.c	2006-07-26 20:03:15.000000000 -0400
+++ cyrus-imapd-cvs/imap/sync_client.c	2006-11-25 01:45:24.000000000 -0500
@@ -1178,6 +1178,12 @@
                    record->uid, mailbox->name);
             return(IMAP_IOERROR);
         }
+        if (msg_size != record->size) {
+            syslog(LOG_ERR,
+                   "IOERROR: message size mismatch for %lu of %s (%lu <> %lu)",
+                   record->uid, mailbox->name, msg_size, record->size);
+            return(IMAP_IOERROR);
+        }
 
         prot_printf(toserver, " %lu %lu %lu {%lu+}\r\n",
 		    record->header_size, record->content_lines,


More information about the Info-cyrus mailing list