Repeat recovers on databases

Bron Gondwana brong at fastmail.fm
Thu Jun 18 19:47:53 EDT 2009


On Thu, Jun 18, 2009 at 05:44:19PM -0400, Michael Bacon wrote:
> Another one stomped here.  This time, it's a 32/64 bit issue.  myinit in  
> cyrusdb_skiplist.c assumes that time_t is 4 bytes long, and writes out 
> that many from the current timestamp when creating $confdir/db/skipstamp. 
>  On 64-bit Solaris, time_t is 8 bytes (it's typedef'ed as a long).  I'm  
> forgetting my Who's Who of big and little endian chips, but my guess is  
> that on x86 systems, the first four bytes are the ones with the real data 
> in them, so there's actually meaningful data that gets written out.  On  
> Sparc, though, no such luck.

Er, yeah.  Ouch.  Damn.

I want to make it an 8 byte value, but that would be an incompatible format
change to skiplists.  At which time I would do a bunch of other stuff too.
I do have a cyrusdb_skiplist2.c file floating around somewhere that does
it (checksums for one thing).
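
For the immediate issue, here is a minimal sketch of the idea. It is not
the code in cyrusdb_skiplist.c: stamp_encode, stamp_decode and
first4_bigendian64 are made-up names, and the most-significant-byte-first
order is an assumption. The point is to always write the low 32 bits of
the timestamp as four explicit bytes, rather than copying the first four
bytes of whatever size time_t happens to be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: store a timestamp as exactly 4 bytes, most
 * significant byte first, regardless of sizeof(time_t). */
void stamp_encode(time_t now, unsigned char out[4])
{
    uint32_t v = (uint32_t) now;    /* keep the low 32 bits explicitly */

    assert(out != NULL);
    out[0] = (unsigned char) (v >> 24);
    out[1] = (unsigned char) (v >> 16);
    out[2] = (unsigned char) (v >> 8);
    out[3] = (unsigned char) v;
}

/* Hypothetical helper: read the 4-byte stamp back into a time_t. */
time_t stamp_decode(const unsigned char in[4])
{
    uint32_t v;

    assert(in != NULL);
    v = ((uint32_t) in[0] << 24) | ((uint32_t) in[1] << 16) |
        ((uint32_t) in[2] << 8)  |  (uint32_t) in[3];
    return (time_t) v;
}

/* What the first 4 bytes of an 8-byte big-endian value hold: the
 * high-order half, which is zero for any timestamp below 2^32. */
uint32_t first4_bigendian64(uint64_t now)
{
    return (uint32_t) (now >> 32);
}
```

The last function is only there to show why a big-endian 64-bit machine
ends up writing zeros: the meaningful bytes are the last four, not the
first four.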

I was even thinking of doing something really evil with ordering on
checkpoint, but I never got around to running the numbers to see if it
made sense.

Basically instead of:

level:  1   2   1   3   2   1   1   2
key  : aaa bbb ccc ddd eee fff ggg hhh

It would lay the records out like this:

level:  3   2   2   2   1   1   1   1
key  : ddd bbb eee hhh aaa ccc fff ggg

The advantage being that for a lookup, the "next record" at the same
level would be directly after the current one, so readahead would be
more likely to hit the next node for the search case.  It would be a
fair bit more random for enumerating though, so I don't know if it's
really sane (and of course as you make changes, it all gets more
random until the next checkpoint anyway).
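
The reordering itself is simple enough. A rough sketch, assuming a
made-up struct rec and function name (nothing here comes from the
skiplist code, and levels are assumed to start at 1): walk the levels
from highest to lowest and emit the records of each level in key order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, for illustration only. */
struct rec {
    int level;
    const char *key;
};

/* Copy the n records of in[] (already in key order) into out[],
 * grouped by level with the highest level first, keeping key order
 * within each level.  Records with level < 1 are skipped.
 * Returns the number of records written. */
size_t checkpoint_order(const struct rec *in, size_t n, struct rec *out)
{
    int maxlevel = 0;
    int lvl;
    size_t i, w = 0;

    assert(in != NULL && out != NULL);

    /* find the highest level in use */
    for (i = 0; i < n; i++)
        if (in[i].level > maxlevel)
            maxlevel = in[i].level;

    /* one pass per level, top down */
    for (lvl = maxlevel; lvl >= 1; lvl--)
        for (i = 0; i < n; i++)
            if (in[i].level == lvl)
                out[w++] = in[i];

    return w;
}
```

Fed the eight records from the first table, this writes them out as
ddd bbb eee hhh aaa ccc fff ggg, matching the second table.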

So anyway, will definitely fix the immediate issue!

Thanks,

Bron.


More information about the Cyrus-devel mailing list