CYRUS_SKIPLIST_UNSAFE

Thu Oct 24 01:13:07 EDT 2002

> But "checkpoints" don't happen every N seconds---they happen when the
> skiplist file has reached a certain size (due to so much write volume)
> and merely serve to keep the size of the file down (skiplist files can
> grow to twice the size they "should" be).
>
> Recovery isn't guaranteed to succeed (or necessarily be sane) when the
> fsync()s are off. The fsync()s in commit() force an ordering---they
> make sure the data is on disk before a 4-byte COMMIT record is
> written.
>
> I believe the 2nd fsync() could be omitted and retain ACI properties
> (just losing durability---definitely ok for seen state). I'd have to
> think more carefully before guaranteeing it.
>
> Losing the first fsync() would compromise consistency and integrity
> since, on a crash, the 4-byte COMMIT record might be written before
> the data, causing recovery to include the now-bogus data in the live
> file.
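
To make sure I'm reading the ordering right, here is a minimal sketch of it
in Python rather than the real C code. The names commit_sketch and write_all
and the COMMIT marker value are made up for illustration; they are not taken
from the skiplist source.

```python
import os

# Placeholder 4-byte marker; the real on-disk COMMIT value is not assumed here.
COMMIT = b"\xff\xff\xff\xff"

def write_all(fd, buf):
    """Write the whole buffer, looping in case os.write() is partial."""
    view = memoryview(buf)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]

def commit_sketch(fd, data, durable=True):
    """Append data, then the COMMIT marker, in the order described above."""
    write_all(fd, data)
    os.fsync(fd)          # 1st fsync: data is on disk before the marker exists
    write_all(fd, COMMIT)
    if durable:
        os.fsync(fd)      # 2nd fsync: skipping it only risks losing the commit
```

With durable=False a crash can lose the last transaction, but the marker can
never reach the disk ahead of the data it covers. Dropping the first fsync()
is the case that lets the marker land first.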

Hmmm, I'm wondering if it would be possible to include a checksum value
with each record. The idea is that recovery could spot any bogus record
and give up immediately at that point. At least you'd be able to roll back
to the previous checkpoint in that case, and if you added the ability to
checkpoint at various time intervals...
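
Roughly what I have in mind, as a sketch only. The record layout (4-byte
length, 4-byte CRC32, then the payload) and the names encode_record and
recover are my own invention for illustration, not the existing skiplist
format.

```python
import struct
import zlib

# Hypothetical header: payload length and CRC32 of the payload, big-endian.
HEADER = struct.Struct(">II")

def encode_record(payload):
    """Prefix a payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def recover(log):
    """Return (valid records, offset where scanning stopped)."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        end = start + length
        if length == 0:
            break  # reject empty records: an all-zero header would pass the CRC
        if end > len(log):
            break  # record runs past end of file: truncated write
        payload = log[start:end]
        if zlib.crc32(payload) != crc:
            break  # checksum mismatch: give up at the first bogus record
        records.append(payload)
        offset = end
    return records, offset
```

Everything before the returned offset checked out, so recovery could truncate
the file there instead of trusting whatever follows.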

The main reason I'm mentioning all this is that from our experience, and
what I gather from others, I/O is the biggest bottleneck in Cyrus by a long
way, and I'm guessing that all those fsync() and fdatasync() calls don't
help with the OS's caching ability...

Rob