Large Mailbox Append Fix

Bron Gondwana brong at fastmail.fm
Mon Sep 24 22:36:56 EDT 2007


On Mon, Sep 24, 2007 at 09:43:52PM -0400, Ken Murchison wrote:
> Bron Gondwana wrote:
>> On Mon, 24 Sep 2007 18:45:28 -0400, "Ken Murchison" <murch at andrew.cmu.edu> 
>> said:
>>> Bron Gondwana wrote:
>>>> On Mon, 24 Sep 2007 13:53:28 -0400, "Ken Murchison" 
>>>> <murch at andrew.cmu.edu> said:
>>>>> Bron Gondwana wrote:
>>>>>> On Sat, 22 Sep 2007 10:44:55 +0100 (BST), "David Carter" 
>>>>>> <dpc22 at cam.ac.uk> said:
>>>>>>> On Sat, 22 Sep 2007, Bron Gondwana wrote:
>>>>>>>
>>>>>>>> We still need to sync the final last_uid, which may be greater than
>>>>>>>> the UID of the maximum message.
>>>>>>> Oops, yes. Replacing last_uid for the last batch should be good 
>>>>>>> enough.
>>>>>>>
>>>>>>>> (though I think there's a separate check later that does that if 
>>>>>>>> they
>>>>>>>> still don't match... I really should test-case this!)
>>>>>>> I don't think so. There is a separate test for UIDLAST if 
>>>>>>> do_mailbox_work() finds that it has no messages to UPLOAD.
>>>>>> Yeah, you're right.  I've got a simpler rewrite of the patch here that
>>>>>> does update_uidlast(mailbox) at the end if there's nothing in the
>>>>>> index_list.
>>>>> Attached is my rewrite against CVS.  I removed duplicate code that 
>>>>> resulted from having one for() loop with a special case if() inside (It 
>>>>> not uses a for() loop nested inside a do() loop).  The patch doesn't 
>>>>> look simpler because my version of diff appears to be braindead.
>>>> This still doesn't guarantee to update UIDLAST in the pathological case
>>>> where there are exactly max_count messages to be uploaded, not including
>>>> the final message in the list due to a UUID/GUID matchup - at least not
>>>> according to my reading of it.
>>> Hmm.  I could be wrong, but UIDLAST always gets set in the last batch, 
>>> even if the number of messages is a multiple of max_count.  It may be 
>>> hard to tell from the diff, since its kind of chopped up.
>> Wow, and it's a pain to apply (don't suppose you could generate patches
>> with -p1 easily could you?  Quilt likes them that way).  Even applying by
>> hand though, it's breaking all over the place!  I'll try and hand import
>> it and see what it looks like.
>> I still read it as possible failing.
>> Can you please consider the following cases:
>> Replica last_uid = 2000
>> Master last_uid = 3005 (say, with the top couple of messages either 
>> deleted
>> or just happen to already exist on the other end for some reason like they
>> were copied in from another folder)
>> max_count = 1000
>> a) There are 999 messages which don't exist on the replica, the uid of the
>>    highest is 3002
>> b) There are 1000 messages which don't exist on the replica, the uid of 
>> the
>>    highest is 3003
>> c) There are 1001 messages which don't exist on the replica, the uid of 
>> the
>>    highest is 3004
>> In each case, what commands are sent?  I see the following:
>> a) UPLOAD 3005 <date> [999 messages]\r\n
>> b) UPLOAD 3003 <date> [1000 messages]\r\n
>
> I'm pretty sure that base c) will come out:
>
> UPLOAD 3005 <date> [1000 messages]\r\n
>
> Any time we exit the for() loop and msgno > exists, we set 
> index_list->last_uid to mailbox->last_uid, even if index_list->count > 
> max_count.

surely this generates:

c) UPLOAD 3005 <date> [1001 messages]\r\n 

???

NOTE - we really don't want the following case:

UPLOAD 3005 <date> [1000 messages]\r\n
UPLOAD 3005 <date> [1 message with uid 3003]\r\n

Because this causes a sync_combine_commit on the replica end,
with all the horrid index and cache file rewrite IO hit that
we're trying to avoid in the first place.

Bron.


More information about the Cyrus-devel mailing list