[RFC PATCH] Fix index lock breakage in mailbox_release_resources()

Bron Gondwana brong at fastmail.fm
Thu Aug 6 17:55:00 EDT 2015


On Fri, Aug 7, 2015, at 00:31, Thomas Jarosch wrote:
> Hi Bron,
> 
> On Sunday, 19. July 2015 10:05:12 you wrote:
> > > @@ -4370,7 +4371,7 @@ static int mailbox_reconstruct_create(const char
> > > *name, struct mailbox **mbptr)> 
> > >      /* Attempt to open index */
> > >      r = mailbox_open_index(mailbox);
> > > 
> > > -    if (!r) r = mailbox_read_index_header(mailbox);
> > > +    if (!r) r = mailbox_lock_index(mailbox, LOCK_EXCLUSIVE);
> > > 
> > >      if (r) {
> > >  	
> > >  	printf("%s: failed to read index header\n", mailbox->name);
> > >  	syslog(LOG_ERR, "failed to read index header for %s", mailbox->name);
> > 
> > I'm not convinced by this one - the whole point with reconstruct is that
> > you're working with a potentially broken mailbox, and you want to check
> > each step more simply to see if it works.  I've left this one in place.
> > 
> > (we do need some more tests of reconstruct on mailboxes with errors in
> > them to make sure it corrects them - anyone feel like doing some
> > Cassandane work? :) )
> 
> we should check again on this topic. Right now the code path
> in reconstruct is still broken.
> 
> I think I tested the "reconstruct" code path with something like this:
> 
> diff --git a/imap/mailbox.c b/imap/mailbox.c
> index 3828053..6a9c5da 100644
> --- a/imap/mailbox.c
> +++ b/imap/mailbox.c
> @@ -6154,7 +6154,9 @@ EXPORTED int mailbox_reconstruct(const char *name, int flags)
>          syslog(LOG_NOTICE, "reconstructing %s", name);
>      }
>  
> -    r = mailbox_open_iwl(name, &mailbox);
> +    /* DEBUG hack only */
> +    // r = mailbox_open_iwl(name, &mailbox);
> +    r = 1;
>      if (r) {
>          if (!make_changes) return r;
>          /* returns a locktype == LOCK_EXCLUSIVE mailbox */

So the other codepath is broken too - it sends a single sync_log_mailbox too early
before the sync_log_mailbox_double call - meaning that the destination mailbox gets
synced to the replica again rather than triggering a rename.

This is really bad, because there are then TWO mailboxes with the same uniqueid on
the replica, and the sync code doesn't deal with that sanely - it either tries the rename
again (which is bogus, and aborts with a "mailbox already exists" error, which becomes
"Unknown error" on the wire, also bogus and what got me paged at 3:30am last night.

Or, it doesn't notice the issue at all, because it sees the matching name folder first, in
which case there are now two folders on the replica with same uniqueid, and later if
the same mailbox name as the original source gets created, you can have a really
messy situation where it's trying to replicate to the wrong folder.  Nasty breakage
either way.

Obviously I _must_ fix sync_client to notice that there are two mailboxes with the
same uniqueid on the server and either bail or resolve the issue somehow.

But we also need to fix the locking handling so that we don't log the source name
without the destination name.  One option is to just log BOTH names really early,
but it does mean that replication backs up behind the rename, because it will
latch onto the locked mailbox and wait.

Bron.

-- 
  Bron Gondwana
  brong at fastmail.fm


More information about the Cyrus-devel mailing list