[Cyrus-CVS] src/cyrus/lib by murch

Fri Nov 16 07:15:58 EST 2007

Update of /afs/andrew/system/cvs/src/cyrus/lib
In directory unix33.andrew.cmu.edu:/var/tmp/cvs-serv8934

Modified Files:
	cyrusdb_skiplist.c 
Log Message:
SKIPLIST bugfixes (Bron Gondwana <brong at fastmail.fm>)

In the past we have had issues with bugs in skiplist on seen
files, and we truncated files at the offset with the issue
since they were only seen data.

Lately, we have had more tools updating mailboxes.db more
often, and have lost multiple mailboxes.db files.

There are two detectable issues:

1) incorrect header "logstart" values causing recovery to
   fail with either unexpected ADD/DELETE records or
   unexpected INORDER records depending which side of the
   correct location the logstart value is wrong.
2) a bunch of zero bytes between transactions in the log
   section.

The attached patch fixes the following issues:

a) recovery failed to update db->map_base if it truncated
   a partial transaction.  This reliably recreated the
   zero bytes issues above by having the next store command
   lseek to a location past the new end of the file, and
   hence fill the remainder with blanks.

b) the logic in the "delete" handler for detecting "no
   record exists" (ptr == db->map_base) was backwards,
   meaning that a delete on a record which didn't exist
   caused reads of PTR(db->map_base, i), which is bogus
   and nasty.  This is the suspect for logstart breakage
   though I haven't proven this yet.

c) unsure if this is a real risk, but added a ftruncate
   to checkpoint to ensure new file really is empty,
   since we don't open it with O_EXCL.

d.1) when abort is called it needs to update_lock() to
     ensure that the records it's about to rollback are
     actually locked.  This fix stopped segfaults in
     my testing.

d.2) delete didn't check retry_write for success, and
     also suffered from the same problem as:

d.3) if retry_writev in store failed, then it called
     abort, but without the records actually written,
     abort had no way of knowing which offsets needed
     to be switched back, meaning bogus pointers could
     be left in the file until the next recovery().
     Changed it to write the records first, then only
     once that succeeded update the pointers.  This way
     abort will do the right thing regardless.

--- links to diffs follow ---
http://bugzilla.andrew.cmu.edu/cgi-bin/cvsweb.cgi/src/cyrus/lib/cyrusdb_skiplist.c.diff?r1=1.53&r2=1.54