DBERROR: skiplist recovery errors

John A. Tamplin jtampli at sph.emory.edu
Thu Dec 12 20:00:52 EST 2002


Quoting Lawrence Greenfield <leg+ at andrew.cmu.edu>:

> [Sidebar: skiplist makes an assumption that while a single byte write
> might be in an indeterminate state after a crash, that write will not
> affect nearby data that is already known to be on stable storage. On
> reflection I suspect the loop-forever problem could be caused by a
> "normal" failure, and in such a case the "truncate file" option is
> actually the right thing to do.]

Disks don't write one byte at a time, so a system crash during a write can
leave the entire block in an indeterminate state.  It gets worse when you go
through the filesystem rather than raw access to the disk: data important to
your file could share a physical disk block with other data and be updated
without your knowledge or control, not to mention the out-of-order writes
problem.  I haven't looked into the skiplist implementation, but fixing that
problem isn't easy without a pre-image log and some sort of timestamp/sequence
number at both ends of the page.  Once you head down that road, you get very
close to building a full database system, and then we are back to the SQL
backend discussed earlier.
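
To make the "sequence number at both ends of the page" idea concrete, here is a
rough sketch in Python.  It is not taken from the skiplist code; the page size,
the 8-byte stamp, and every function name are assumptions made up for
illustration.  Each page carries the same sequence number in its first and last
bytes, and a page whose two stamps disagree is treated as torn.

```python
import struct

PAGE_SIZE = 4096                      # assumed page size, not a Cyrus value
SEQ = struct.Struct(">Q")             # assumed 8-byte big-endian sequence stamp
PAYLOAD_SIZE = PAGE_SIZE - 2 * SEQ.size


def build_page(seq, payload):
    """Return a full page with the same sequence stamp at both ends."""
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload does not fit in one page")
    stamp = SEQ.pack(seq)
    return stamp + payload.ljust(PAYLOAD_SIZE, b"\0") + stamp


def page_is_torn(page):
    """A page is suspect if it is short or its two stamps disagree."""
    if len(page) != PAGE_SIZE:
        return True
    return page[:SEQ.size] != page[-SEQ.size:]


def simulate_torn_write(old_page, new_page, bytes_written):
    """Model a crash after only the first bytes_written bytes reached disk."""
    return new_page[:bytes_written] + old_page[bytes_written:]


if __name__ == "__main__":
    old = build_page(1, b"old record")
    new = build_page(2, b"new record")
    half = simulate_torn_write(old, new, 512)
    print(page_is_torn(old), page_is_torn(half))
```

This only detects a torn page; recovering from one still needs the pre-image
(the old copy of the page) saved somewhere before the overwrite.  It also
assumes the page is written front to back, so it would miss a tear where both
end sectors landed but a middle one did not, which is why real systems tend to
add a per-page checksum as well.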

-- 
John A. Tamplin
Unix System Administrator



