FastMail Cyrus Patches for upstream
brong at fastmail.fm
Tue Aug 25 04:42:23 EDT 2009
UP-FRONT notice. I'd particulary love feedback on the index
format change. Here's the executive overview:
index header: replace SPARE4 with HEADER_CRC - a CRC32 of
the rest of the header.
index record: add two additional 32bit values, CACHE_CRC
and RECORD_CRC. CACHE_CRC is a crc32 of the entire cache
record, and RECORD_CRC is a crc32 of the entire index
record (including CACHE_CRC) - providing integrity checking
all the way through.
Total additional cost: 8 bytes per message plus some CPU time
creating and checking the CRCs. Benefit - immediate index
corruption detection. I think this is a good thing - in
theory the underlying layers should be providing perfect
abstractions, but in practice a memory error, disk error or
even eratic cable can cause transient failures - and if we
write those incorrect values back to the file they last
Ok - onto the main show!
I've got a small pile of patches for upstream... some quite
old and heavily tested, and a couple because I want to grab
dibs on index minor_version 11 before someone else claims it
and makes a total mess of our patch management!
OK - here we go ( sorry about the long URLS - you can just go
to http://cyrus.brong.fastmail.fm/ and follow the links, or of
course hit github at http://github.com/brong/cyrus-imapd/ )
Use a struct of individual cache items rather than macros,
allows sanity checks on the cache record to detect corruption
and avoid crashing!
NOTE: I'd love to do this with index records as well, but it's
an awful lot of work. I'll be doing that slowly as time permits.
NOTE - this is a huge patchset, and it's had an enormous amount of
work done on it! This completely changes the charset encoding
pathways within Cyrus. It gives unicode 5.1 support, a bunch of
new charactersets, and full utf-8 support in sieve scripts. It
also allows search with whitespace to work by compressing
whitespace to a single space rather than removing it entirely.
This is the new one! It's not entirely complete in its behaviour yet,
it only syslogs for issues, and it's not syslogging on all paths that
read the index and cache records yet. I'll work on adding those over
time. I _believe_ it's creating crc32s on all paths that modify the
record, which is the important thing!
I've chosen to only implement crc32 where zlib is available,
putting stubs that return 0 otherwise. It would be easy
enough to copy in the public domain crc32 code that's out there
if we want to support everyone.
I've also chosen to ignore a cache_crc of zero, so that we can upgrade
indexes without a huge IO hit as we read the entire cache to find the
initial values. It means one in 2^32 records won't have integrity
protection. I can live with that.
Comments please. Once I've dumped this stuff in CVS I'd love to cut
another release with all the cool new features so other people can use
them. The charset support in particular is a nice user-visible thing
that fixes a bunch of bugzilla bugs and makes fixing others a lot
More information about the Cyrus-devel