Index Upgrade Bug

Thu Aug 27 01:08:47 EDT 2009

Doh!

>From the end of "mailbox_read_index_header".

    if (!mailbox_doing_reconstruct &&
        (mailbox->minor_version < MAILBOX_MINOR_VERSION)) {
        return IMAP_MAILBOX_BADFORMAT;
    }

Now - this is unreachable because earlier in the same scope we have:

    if ((mailbox->start_offset < OFFSET_HEADER_SIZE) ||
        (mailbox->record_size < INDEX_RECORD_SIZE) ||
        (mailbox->minor_version < MAILBOX_MINOR_VERSION)) {
        if (mailbox_upgrade_index(mailbox))
            return IMAP_IOERROR;

        syslog(LOG_INFO, "Index upgrade: %s (%d -> %d)", mailbox->name,
                         mailbox->minor_version, MAILBOX_MINOR_VERSION);

        /* things might have been changed out from under us. reread */
        return mailbox_open_index(mailbox);
    }

(the syslog line is added by one of my patches -it's not in upstream)

It's also bogus, because it doesn't stop us operating on future version
mailboxes.  I believe the logic is reversed, and it should be:

    if (!mailbox_doing_reconstruct &&
        (mailbox->minor_version > MAILBOX_MINOR_VERSION)) {
        return IMAP_MAILBOX_BADFORMAT;
    }

or even:

    if (!mailbox_doing_reconstruct &&
        (mailbox->minor_version != MAILBOX_MINOR_VERSION)) {
        return IMAP_MAILBOX_BADFORMAT;
    }

Ken - comments?  Does this make sense to you?  I can see this would have
broken over the past few index changes - resized uuid => guid, modseq
being added, etc.  In general you can't just can't accurately update a
mailbox of a future version and ensure all the fields are in sync.

Symptom here (we hit this upgrading all our servers) was that there were
checksum errors galore.  Older imapds that were running across the upgrade
rewrote parts of the index record without updating the checksums.

THANKFULLY (more to the point, good design) there are no crash bugs or
unaligned writes.  The code is careful about using the index_header
field for record size when seeking to the correct offset, but the
INDEX_RECORD_SIZE constant for reading and writing buffers.

Still - this is going to be an issue for sites upgrading to the new CRC32
protected indexes.  It is necessary to stop cyrus before upgrading and then
start it again afterwards so you never had processes with two different
index minor versions coded into them running at the same time.

Bron.