sync_client errors out after 2.3.16 -> 2.5.9 upgrade

ellie timoney ellie at fastmail.com
Mon Aug 1 19:55:56 EDT 2016


Thanks for passing these reports on!

Initial impression, I don't think this one is as straightforward as the
last one was, unfortunately. :(

Here's the chunk of code that produces those "guid mismatch" SYNCERRORs:
 https://git.io/v6JWK
(wait a moment for it to load and it will jump to the lines I've
highlighted)

We can see from the order in which those mismatching guids are reported
that, for these messages, the "master" server (i.e. the server on which
sync_client is running) has a guid with a value, whereas the "replica"
server (i.e. the other end) has a guid with a null value.  And then the
assertion fails in sync_client -- i.e. the "master" server.  So this is
the split-brain codepath detecting the guid mismatch, understanding that
as "these are two distinct messages that have accidentally wound up with
the same uid at each end", and so rewriting both messages with whole new
uids -- which then fails, because one of the guids is zero. (Note that
the side that *has guids* is trying to write a message without a guid.)

One possible solution that comes to mind is to not treat it as a
split-brain situation if one of the mismatching guids is null -- but I
don't understand the ramifications of that.  I have not yet looked at
the master branch to see if it has improvements in this area.
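
To make that idea concrete, here is a minimal sketch of the check, not
the actual sync code.  The `struct guid` type, `guid_isnull()` and
`is_split_brain()` are hypothetical names made up for illustration; the
only detail taken from the logs is that a guid is 20 bytes (40 hex
digits) and that the "null" guid is all zeros.

```c
#include <stdbool.h>
#include <string.h>

#define GUID_SIZE 20  /* 40 hex digits in the log lines = 20 bytes */

/* Hypothetical stand-in for the real guid type. */
struct guid {
    unsigned char value[GUID_SIZE];
};

/* A guid is "null" when every byte is zero. */
static bool guid_isnull(const struct guid *g)
{
    static const unsigned char zero[GUID_SIZE];
    return memcmp(g->value, zero, GUID_SIZE) == 0;
}

/* Hypothetical policy: only call it split-brain when both ends
 * have a real guid and the two values differ. */
static bool is_split_brain(const struct guid *master,
                           const struct guid *replica)
{
    if (guid_isnull(master) || guid_isnull(replica))
        return false;  /* one side never recorded a guid */
    return memcmp(master->value, replica->value, GUID_SIZE) != 0;
}
```

With this policy the five messages in the log would not be renumbered,
because the replica's guid is null in each case.  Whether that is
actually safe is exactly the part I don't understand yet.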

The other aspect: here's the code that reads the guid from the index
record in the first place: https://git.io/v6JlR

If I'm reading that correctly, we expect version 10 mailboxes to have a
guid in them.  I got the impression from the last problem we looked at
that your mailboxes were version 10, but maybe I was wrong and they're
even older?  Can you check the version of the cyrus.index for
user.robot, specifically on the server you were trying to replicate to? 
There was a thread about how to check that on this list the other day.

I'm not sure whether the assert in mailbox_append_index_record is overly
aggressive or not.  We do expect new records, constructed on the current
version, to have a guid in them.  Maybe, rather than asserting for that,
it should try to recalculate the guid if the provided one is null?  Or
maybe something further upstream should have done this?
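
A rough sketch of the "recalculate instead of assert" option, just to
show the shape of it.  Everything here is hypothetical: the cut-down
`struct record`, `ensure_guid()`, the `hash_fn` callback that stands in
for however the guid would really be derived from the message file, and
the `toy_hash()` placeholder.  It is not the real mailbox.c code.

```c
#include <stddef.h>
#include <string.h>

#define GUID_SIZE 20

/* Hypothetical, cut-down index record. */
struct record {
    unsigned char guid[GUID_SIZE];
    const unsigned char *data;  /* message bytes */
    size_t size;
};

/* Hypothetical callback: fills out[GUID_SIZE] from the message bytes. */
typedef void (*hash_fn)(const unsigned char *data, size_t size,
                        unsigned char *out);

static int guid_isnull(const unsigned char *guid)
{
    static const unsigned char zero[GUID_SIZE];
    return memcmp(guid, zero, GUID_SIZE) == 0;
}

/* Placeholder hash for demonstration only (NOT a real digest): sets
 * every guid byte to (XOR of the input bytes) | 1, so the result is
 * never all-zero. */
static void toy_hash(const unsigned char *data, size_t size,
                     unsigned char *out)
{
    unsigned char x = 0;
    for (size_t i = 0; i < size; i++)
        x ^= data[i];
    memset(out, x | 1, GUID_SIZE);
}

/* Return 0 if the record ends up with a usable guid, -1 otherwise.
 * A caller would append only on 0 rather than aborting on an assert. */
static int ensure_guid(struct record *rec, hash_fn hash)
{
    if (!guid_isnull(rec->guid))
        return 0;               /* already set, leave it alone */
    if (!rec->data || !rec->size || !hash)
        return -1;              /* nothing to derive a guid from */
    hash(rec->data, rec->size, rec->guid);
    return guid_isnull(rec->guid) ? -1 : 0;
}
```

The point of returning an error code is that the caller can fail that
one record instead of the whole sync_client run.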

mailbox_append_index_record eventually calls
mailbox_index_record_to_buf, which is responsible for producing a
correct output for the index version it's actually writing to.  It's
here that the outward version detection exists, and it looks consistent
with the inward code linked above: the guid is omitted if the mailbox
version is < 10. (But it still expects it to have been set in the record
it was handed -- it just sometimes doesn't use it.)
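
The outward side can be pictured like this.  It's a simplified sketch
under made-up assumptions: the struct, the field offsets and
`record_to_buf()` are hypothetical and don't match the real on-disk
layout.  The only part taken from the code discussed above is that the
guid is written for version 10 and later and skipped for older versions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUID_SIZE 20
#define GUID_MIN_VERSION 10   /* guid is only stored from version 10 on */

/* Hypothetical, cut-down index record. */
struct record {
    uint32_t uid;
    unsigned char guid[GUID_SIZE];
};

/* Serialise rec into buf for the given index version and return the
 * number of bytes written.  Offsets are invented for illustration. */
static size_t record_to_buf(const struct record *rec, int version,
                            unsigned char *buf)
{
    size_t len = 0;

    /* uid as 4 bytes, big-endian */
    buf[len++] = (unsigned char)(rec->uid >> 24);
    buf[len++] = (unsigned char)(rec->uid >> 16);
    buf[len++] = (unsigned char)(rec->uid >> 8);
    buf[len++] = (unsigned char)(rec->uid);

    /* the record always carries a guid, but old formats have no
     * slot for it, so it is simply not written */
    if (version >= GUID_MIN_VERSION) {
        memcpy(buf + len, rec->guid, GUID_SIZE);
        len += GUID_SIZE;
    }
    return len;
}
```

So a record written to an older-format mailbox loses its guid on disk,
which would explain a null guid turning up when it is read back.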

Bron, your input would be really appreciated here :)

ellie

On Tue, Aug 2, 2016, at 07:56 AM, Kenneth Marshall via Info-cyrus wrote:
> Hi Cyrus Developers,
>
> Thank you for your patch to address the folder move problem between
> un-reconstructed mailboxes after the 2.3.16 -> 2.5.9 upgrade. I am
> not sure, but it looks like there may be another overly aggressive
> check. I keep getting these fatal errors from sync_client:
>
> Aug  1 16:24:16 cyrus1a imap/sync_client[14886]: SYNCERROR: guid mismatch
> user.robot 1696 (56412de8678bfb53f6cdb5fe4502031af5484056
> 0000000000000000000000000000000000000000)
> Aug  1 16:24:16 cyrus1a imap/sync_client[14886]: SYNCERROR: guid mismatch
> user.robot 1697 (1b0024218a4419973b83ae3e84ac7133a4ab7d40
> 0000000000000000000000000000000000000000)
> Aug  1 16:24:16 cyrus1a imap/sync_client[14886]: SYNCERROR: guid mismatch
> user.robot 1698 (f17084425d83bccb28a4dfa195846c7ef88c7567
> 0000000000000000000000000000000000000000)
> Aug  1 16:24:16 cyrus1a imap/sync_client[14886]: SYNCERROR: guid mismatch
> user.robot 1699 (7a751e41e1d3a58e541298ab724be4c29d96e49d
> 0000000000000000000000000000000000000000)
> Aug  1 16:24:16 cyrus1a imap/sync_client[14886]: SYNCERROR: guid mismatch
> user.robot 1700 (724a013d0ae97d27a1da33832487df1719681659
> 0000000000000000000000000000000000000000)
> Aug  1 16:24:16 cyrus1a imap/sync_client[14886]: Fatal error: Internal
> error: assertion failed: imap/mailbox.c: 2850:
> !message_guid_isnull(&record->guid)
>
> And then the sync_client has to be run manually and if lucky, it will
> process the full log successfully. I was looking in imap/mailbox.c
> and it looks like the assert at line 2850 may need a similar override
> for non-upgraded folders:
>
> -----------imap/mailbox.c---------------
> /* append a single message to a mailbox - also updates everything
>  * automatically.  These two functions are the ONLY way to modify
>  * the contents or tracking fields of a message */
> EXPORTED int mailbox_append_index_record(struct mailbox *mailbox,
>                                 struct index_record *record)
> {
>     indexbuffer_t ibuf;
>     unsigned char *buf = ibuf.buf;
>     size_t offset;
>     int r;
>     int n;
>     struct utimbuf settime;
>     uint32_t recno;
>
>     assert(mailbox_index_islocked(mailbox, 1));
>
>     /* Append MUST be a higher UID than any we've yet seen */
>     assert(record->uid > mailbox->i.last_uid);
>
>     /* Append MUST have a message with data */
>     assert(record->size);
>
> =====>    /* GUID must not be null */
> =====>    assert(!message_guid_isnull(&record->guid));
>
>     /* belt AND suspenders - check the previous record too */
>     if (mailbox->i.num_records) {
>         struct index_record prev;
>
> -----------imap/mailbox.c---------------
>
> What do you think?
>
> Regards,
> Ken
> ----
> Cyrus Home Page: http://www.cyrusimap.org/
> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
> To Unsubscribe:
> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus

