skiplist2 file format - and a question

Henrique de Moraes Holschuh hmh at
Sat Oct 1 13:03:32 EDT 2011

On Sat, 01 Oct 2011, Bron Gondwana wrote:
> Do you see a case where the alignment of the "const char *" for
> value is significant?

Yes, it is.  How much depends on the arch: some will just be slowed down in
some specific cases (x86-64), while others will actually bus-fault, so the
compiler has to turn one unaligned access into two aligned accesses plus
magic, which translates to even slower access to unaligned data.

I know at least PPC and SPARC64 "dislike" unaligned access.

Anyway, there's an easy way to tell: look at the code generated by gcc for
unaligned access on the various arches.  If it does anything weird to align
the access...
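
A minimal sketch of something to feed to "gcc -O2 -S" on each arch when
doing that comparison.  The helper names are made up for illustration and
are not from the skiplist code; the alignment check assumes the mmap()'d
base address is itself page-aligned, so only the file offset matters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from any byte offset.
 * memcpy() lets the compiler pick whatever access sequence is legal on
 * the target, so the generated code shows the real cost of unaligned
 * data on that arch. */
static uint32_t load_u32_any(const unsigned char *base, size_t offset)
{
    uint32_t v;
    memcpy(&v, base + offset, sizeof v);
    return v;
}

/* Hypothetical helper: is a file offset a multiple of `alignment`?
 * `alignment` must be a power of two. */
static int offset_is_aligned(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset & (alignment - 1)) == 0;
}
```

The value comes back in host byte order; any byte swapping the file format
needs is left out here.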

However, anything high-performance (x86-64 included) does memory access on
cacheline-sized units anyway, so you should align data structures inside the
mmap()'d file based not only on field-type alignment (to avoid giving the
compiler even more reason to output crappy code), but also align groups of
fields to cacheline boundaries.

I'd recommend using 64 bytes as the cacheline size to optimize for.
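
To make the 64-byte suggestion concrete, here is a rough sketch of the
offset arithmetic.  CACHELINE_SIZE, round_up() and crosses_cacheline() are
names invented for this example, not anything in the existing code.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_SIZE 64u  /* the size suggested above */

/* Round `offset` up to the next multiple of `alignment`.
 * `alignment` must be a power of two. */
static size_t round_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Return 1 if the byte range [offset, offset + len) spans more than one
 * cacheline, 0 otherwise.  `len` must be non-zero. */
static int crosses_cacheline(size_t offset, size_t len)
{
    assert(len > 0);
    return (offset / CACHELINE_SIZE) != ((offset + len - 1) / CACHELINE_SIZE);
}
```

A group of hot fields that starts at round_up(offset, CACHELINE_SIZE) and
is no longer than 64 bytes never makes crosses_cacheline() return 1.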

> Still only space for 32 bits worth of records, so we're limited at 4
> billion records per skiplist.  I can live with that.

If you have space left before the next record once cacheline alignment is
taken into account, maybe you could add an extra 32 bits of _zero_ padding
so that the field is actually ready to be used as a 64-bit field...
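
A sketch of what I mean, under one big assumption: that the on-disk
integers are stored big-endian (network byte order).  In that case the zero
padding has to sit in front of the 32-bit value, so that the same 8 bytes
read back correctly as a 64-bit number later.  The function names are
invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit count in an 8-byte slot: four zero bytes of padding
 * followed by the count in big-endian order (assumed on-disk order). */
static void put_count32_in_slot64(unsigned char slot[8], uint32_t count)
{
    assert(slot != NULL);
    slot[0] = slot[1] = slot[2] = slot[3] = 0;   /* the zero padding */
    slot[4] = (unsigned char)(count >> 24);
    slot[5] = (unsigned char)(count >> 16);
    slot[6] = (unsigned char)(count >> 8);
    slot[7] = (unsigned char)(count);
}

/* A future reader that treats the whole slot as one big-endian 64-bit
 * field sees the same number, because the padding was zero. */
static uint64_t get_count64(const unsigned char slot[8])
{
    uint64_t v = 0;
    assert(slot != NULL);
    for (int i = 0; i < 8; i++)
        v = (v << 8) | slot[i];
    return v;
}
```

With little-endian storage the padding would go after the value instead.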

> Can anyone see any glaring stupidities in this?

I found none.

Still, can you make a synthetic workload simulator for this?  You'd be able
to easily check the effect of various record and field padding strategies,
as well as use hardware profiling to ask the CPU about cache misses, etc.
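
As a very rough starting point, the layout half of such a simulator could
look like the sketch below.  It only does the arithmetic (file size and how
many records straddle a 64-byte line for a given padding strategy); it does
not touch memory, so real cache-miss numbers would still have to come from
something like "perf stat -e cache-misses" on a real access loop.  All the
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct layout_stats {
    size_t total_bytes;  /* file size after laying out all records */
    size_t straddling;   /* records that span two or more cachelines */
};

/* Lay out `n` records of the given lengths back to back, padding the
 * start of each one to `align` bytes (a power of two), and count how
 * many end up crossing a 64-byte cacheline boundary. */
static struct layout_stats simulate_layout(const size_t *lens, size_t n,
                                           size_t align)
{
    struct layout_stats st = {0, 0};
    size_t off = 0;

    assert(align != 0 && (align & (align - 1)) == 0);
    for (size_t i = 0; i < n; i++) {
        assert(lens[i] > 0);
        off = (off + align - 1) & ~(align - 1);   /* pad record start */
        if (off / 64 != (off + lens[i] - 1) / 64)
            st.straddling++;
        off += lens[i];
    }
    st.total_bytes = off;
    return st;
}
```

Running it over the same record lengths with align set to 8 and then to 64
shows the trade-off between file size and straddling records.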

> Oh yeah - "dummy" is just a record like everything else, it's the
> first one.  The "level" counter for the database is of course now
> just the "level" field in the dummy record.

Fill some of the dummy record with constant data so that it can be the
magic and file header, maybe?
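
Something along these lines, as a sketch only.  The magic bytes and the
function name are placeholders I made up; the real constant and its
position inside the dummy record would be yours to pick.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder magic, NOT the real skiplist2 header value. */
static const unsigned char EXAMPLE_MAGIC[8] = {
    's', 'k', 'i', 'p', 'd', 'e', 'm', 'o'
};

/* Return 1 if the mapped file starts with the expected constant data,
 * 0 if it is too short or the bytes differ. */
static int has_magic(const unsigned char *file, size_t file_len)
{
    assert(file != NULL);
    if (file_len < sizeof EXAMPLE_MAGIC)
        return 0;
    return memcmp(file, EXAMPLE_MAGIC, sizeof EXAMPLE_MAGIC) == 0;
}
```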

But _do_ make sure the important part of the file will align well with the
cacheline boundaries, or performance will tank.

> And a commit record is, on disk, 8 bytes:  \0\0\0\0\0\0\0\1 -
> unlikely to occur randomly in data!

What is the scenario you're trying to protect from?

  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

More information about the Cyrus-devel mailing list