Multiple skiplist bugs found, patches attached
Bron Gondwana
brong at fastmail.fm
Mon Nov 12 09:00:00 EST 2007
On Mon, Nov 12, 2007 at 12:34:34AM +1100, Bron Gondwana wrote:
> Anyway - here it is. A "recovery()" that copes if the logstart
> parameter in the database header is wrong. No, I don't have a
> clue how that happened unless lseek() lied. Maybe it sometimes
> lies, I don't know. I'll be writing a test case for that soon
> too!
I have some more suspicions now, but I wrote it all up in the
patch header, so here's the bugfixes only patch, a "robustness"
extras patch and the tool I used for testing.
Ken, I know you've done some other work on the file changing
types. I'd like to be even more agressive and convert just
about everything to bit32 and also rename some variables, but
I restricted myself in this to only fixing the most ugly case:
offset = htonl(offset).
These patches are all against 2.3.10 (in this order), and may
need some fuzz fixing to apply against your latest CVS thanks
to those changes - sorry I haven't done that, but it's getting
on 1am for me, and I've just finished doing a lot of testing
and paring these down to simple and clear patches that don't
touch more than they need to fix the issues.
cyrus-skiplist-bugfixes-2.3.10.diff:
Ken, please review this patch and consider it for pushing out
in the next release, preferably soon. There really are a lot
of issues I found reviewing the code, and even more so with
the attached tool that can be used to "hammer" a file with
all sorts of interesting requests. There are some really
nasty skiplist corruptions available if even one process
is ever killed or segfaults half way through a transaction
and the first operation to touch said file after this is
a write.
cyrus-skiplist-robustify-2.3.10.diff:
If you have been running skiplist on your systems and suspect
that you may have corrupted databases, you probably want this.
It adds extra robustness and fixupability to recovery(). It's
still not going to be crash proof reading line-noise, but it
detects and fixes all the corruptions I have personally seen.
(I say this having tested it on all the bogus DBs I had, eg:)
DBERROR: skiplist recovery /tmp/mb2/mailboxes.db.1194508562:
-> 8E6DD8 should be ADD or DELETE
became:
skiplist recovery /tmp/mb2/mailboxes.db.1194508562:
-> skipped 136 bytes of zeros at 8E6DD8
skiplist: recovered /tmp/mb2/mailboxes.db.1194508562
-> (44594 records, 9547108 bytes) in 1 second
skiplist: checkpointed /tmp/mb2/mailboxes.db.1194508562
-> (44594 records, 5350556 bytes) in 1 second
and:
skiplist recovery /tmp/mailboxes.db.fail:
-> 288C00 should be ADD or DELETE
became:
skiplist recovery /tmp/mailboxes.db.fail:
-> incorrect logstart 288BD8 changed to 356F94
skiplist: recovered /tmp/mailboxes.db.fail
-> (28811 records, 4109664 bytes) in 1 second
In both cases a second run of the same command (I used
cyr_dbtool 'show') came up clean - no issues remaining
and no log entries.
cyrus-hammer-skiplist-2.3.10.diff:
I used this command to hammer a skiplist database like so:
sudo -u cyrus ./hammer_skiplist -n /tmp/hammer.db &
sudo -u cyrus ./hammer_skiplist -n /tmp/hammer.db &
sudo -u cyrus ./hammer_skiplist -n /tmp/hammer.db &
sudo -u cyrus ./hammer_skiplist -n /tmp/hammer.db &
sudo -u cyrus ./hammer_skiplist -n /tmp/hammer.db &
sudo -u cyrus ./hammer_skiplist -n /tmp/hammer.db &
sudo -u cyrus ./hammer_skiplist -n /tmp/hammer.db &
...
I've turned down the "forget about this transaction" option
to a lot less common than my original tests. It should still
fire a couple of times per hammer, but it creates log entries
even on the fully patched code (rolling back incomplete txn),
so I didn't want to spam the logs.
Enjoy,
Bron.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cyrus-skiplist-bugfixes-2.3.10.diff
Type: text/x-diff
Size: 5980 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/info-cyrus/attachments/20071113/54a4d8df/attachment-0003.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cyrus-skiplist-robustify-2.3.10.diff
Type: text/x-diff
Size: 2322 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/info-cyrus/attachments/20071113/54a4d8df/attachment-0004.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cyrus-hammer-skiplist.2.3.10.diff
Type: text/x-diff
Size: 7877 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/info-cyrus/attachments/20071113/54a4d8df/attachment-0005.bin
More information about the Info-cyrus
mailing list