[RFC PATCH] Fix index lock breakage in mailbox_release_resources()
Thomas Jarosch
thomas.jarosch at intra2net.com
Mon Jul 20 03:28:08 EDT 2015
Hi Bron,
On Sunday, 19. July 2015 10:05:12 you wrote:
> On Sun, Jul 19, 2015, at 07:06 AM, Thomas Jarosch wrote:
> > mailbox_release_resources() closes the file descriptor
> > to the index. This implicitly releases any lock on the file.
> >
> > Unfortunately the function does not reset the index_locktype state.
> > Therefore we didn't notice that a mailbox rename
> > operated on an unlocked index of the new mailbox. Yikes.
>
> How did you discover this? I'm surprised we haven't run into this
> already.
currently cyrus-imapd 2.4.18 deadlocks about once a week.
Similar symptoms as described here:
http://lists.andrew.cmu.edu/pipermail/info-cyrus/2013-May/036951.html
So I'm now going through the locking code. Of course I can't reproduce
it on my development machine and I tried really hard.
The only pattern I can see is that it happens one machines
with very large mailboxes. They run "quota -f" nightly,
ipurge to delete trashed mails older than 30 days
and the cyr_getdeleted tool I posted recently.
-> the more cyrus tools we run in parallel, the worse
the problem seems.
> > Note: mailbox_index_lock() implies a mailbox_read_index_header().
> > But it also does more work than just read the index header,
> > so there might be side effects.
>
> I changed to mailbox_index_lock_internal() since we don't want to abort
> on a DELETED mailbox.
ok
> > @@ -4370,7 +4371,7 @@ static int mailbox_reconstruct_create(const char
> > *name, struct mailbox **mbptr)>
> > /* Attempt to open index */
> > r = mailbox_open_index(mailbox);
> >
> > - if (!r) r = mailbox_read_index_header(mailbox);
> > + if (!r) r = mailbox_lock_index(mailbox, LOCK_EXCLUSIVE);
> >
> > if (r) {
> >
> > printf("%s: failed to read index header\n", mailbox->name);
> > syslog(LOG_ERR, "failed to read index header for %s", mailbox-
>name);
>
> I'm not convinced by this one - the whole point with reconstruct is that
> you're working with a potentially broken mailbox, and you want to check
> each step more simply to see if it works. I've left this one in place.
if you back out that change, it won't work anymore:
mailbox_read_index_header() will bail out because the index is not locked.
I've tested that code path by faking the return code of mailbox_open() so it
executes the reconstruct_create() code path.
Right now I've interfaced the locking code with helgrind's lock validation
mechanism and annotated all lock_xx() calls. That works more or less.
Since kernel 3.14.x, there's a userspace version of "lockdep" called
liblockdep. I'll switch to that soon as it supports "tryLock()" style calls
as we do with lock_nonblocking().
The tricky part is that cyrus is a multi-process tool instead of a multi-
threaded one, so the current lock validation tools are more or less useless.
I came up with a workaround: Dump the lock state information in lock_xxx()
to a file, probably JSON. A "daemon" tool will later on read that "lock
transation file" and feed the information in the lock validation tool /
library. Hopefully this works out, I have a feeling we have a classic
ABBA deadlock somewhere in the code.
Cheers,
Thomas
More information about the Cyrus-devel
mailing list