DBERROR: skiplist recovery errors

Lawrence Greenfield leg+ at andrew.cmu.edu
Thu Dec 12 18:16:03 EST 2002


   From: "Rob Mueller" <robm at fastmail.fm>
   Date: Thu, 12 Dec 2002 13:01:48 +1100

   Looks like you've got corrupted skiplist files. Delete the seen state
   databases with the problem and it will automatically rebuild them.

That's what we do.

   It's scarier when you see this on the mailboxes DB.

You've had this problem on your mailboxes db? Yuck.

   Which reminds me. The skiplist recovery code barfs if it comes across this
   error and gives up. However, if you truncate the length of the file to just
   smaller than the problem location, then it happily recovers, even though
   there's an incomplete record. I really don't think recovery should ever just
   'give up'. Making a backup of the bad DB, removing the offending records,
   warning the user and continuing would be much nicer.

That's a good idea; could you file it as a bug?

Doing this could destroy most of the database and could be even more
confusing to system administrators. I guess reasonable syslog()
records would help with that.
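
For illustration only, here is a minimal sketch of the "back up, truncate,
log" idea, assuming recovery has already worked out the offset of the last
good record. The helper name salvage_truncate(), the ".corrupt" suffix and
the log wording are all hypothetical and not part of Cyrus; only standard
POSIX calls (open, read, write, fsync, ftruncate, syslog) are used.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <syslog.h>
#include <unistd.h>

/* Hypothetical helper, not Cyrus code: save a copy of the damaged file as
 * "<path>.corrupt", cut the original back to good_len bytes, and leave a
 * syslog record so the administrator knows data was dropped.
 * Returns 0 on success, -1 on any failure (the original is then untouched). */
static int salvage_truncate(const char *path, off_t good_len)
{
    char bak[1024], buf[4096];
    ssize_t n;
    int in, out, len, r = -1;

    len = snprintf(bak, sizeof bak, "%s.corrupt", path);
    if (len < 0 || (size_t)len >= sizeof bak) return -1;

    if ((in = open(path, O_RDWR)) < 0) return -1;
    /* O_EXCL: refuse to overwrite a backup from an earlier incident. */
    if ((out = open(bak, O_WRONLY | O_CREAT | O_EXCL, 0600)) < 0) {
        close(in);
        return -1;
    }

    /* Copy the whole damaged file; a short write is treated as failure. */
    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)n) != n) { n = -1; break; }
    }

    /* Truncate only once the backup is completely written and synced. */
    if (n == 0 && fsync(out) == 0 && ftruncate(in, good_len) == 0) {
        syslog(LOG_WARNING,
               "DBERROR: %s truncated to %ld bytes; original saved as %s",
               path, (long)good_len, bak);
        r = 0;
    }
    close(out);
    close(in);
    return r;
}
```

The ordering is the point of the sketch: the original is only shortened
after the backup is safely on disk, so a failure part way through leaves
the damaged file exactly as it was found.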

   Also there's a bit of code that looks like this:
	   for (;;) {
	       p += RECSIZE(p);
	       if (p >= q) break;
	       if (TYPE(p) == COMMIT) break;
	   }

   I had a corrupted DB once where RECSIZE(p) == 0. This just went into an
   infinite loop. Also not a good idea when trying to recover a database.

Probably verifying that RECSIZE > 0 would be good. Though it looks
like RECSIZE == 0 implies that the TYPE of the record isn't valid,
either.
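
A rough sketch of what a guarded version of that loop might look like. The
record layout here (a 4-byte type followed by a 4-byte total size, both in
host byte order) and the names rec_type(), rec_size() and next_commit() are
assumptions made up for the example; they stand in for the real TYPE() and
RECSIZE() macros and are not the actual skiplist format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout, NOT the real skiplist on-disk format. */
enum { REC_ADD = 1, REC_COMMIT = 2 };
enum { SCAN_CORRUPT = -1, SCAN_END = 0, SCAN_COMMIT = 1 };
#define HDRSIZE 8u

static uint32_t rec_type(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static uint32_t rec_size(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p + 4, sizeof v);
    return v;
}

/* Skip forward from the record at p (p < q) to the next COMMIT record.
 * q is the end of the mapped file.
 *   SCAN_COMMIT:  *where points at the COMMIT record.
 *   SCAN_END:     the records ended exactly at q; *where == q.
 *   SCAN_CORRUPT: *where points at the first record that cannot be
 *                 trusted (truncated header, size too small to make
 *                 progress, or size running past q). */
static int next_commit(const unsigned char *p, const unsigned char *q,
                       const unsigned char **where)
{
    for (;;) {
        size_t left = (size_t)(q - p);
        uint32_t size;

        *where = p;
        if (left < HDRSIZE) return SCAN_CORRUPT;
        size = rec_size(p);
        /* A size below HDRSIZE includes the RECSIZE == 0 case. */
        if (size < HDRSIZE || size > left) return SCAN_CORRUPT;

        p += size;
        *where = p;
        if (p == q) return SCAN_END;
        if ((size_t)(q - p) >= HDRSIZE && rec_type(p) == REC_COMMIT)
            return SCAN_COMMIT;
    }
}
```

Because every accepted record advances p by at least HDRSIZE bytes, the
loop always terminates, and on SCAN_CORRUPT the caller gets the offset at
which the file could be cut.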

Hmm.

Larry




