Cyrus with a NFS storage. random DBERROR
Rob Mueller
robm at fastmail.fm
Sat Jun 9 00:14:45 EDT 2007
> I don't have something to consume make_md5 data, yet, either. My
> plan is to note the difference between the replica and the primary.
> On a subsequent run, if those differences aren't gone, then they
> would be included in a report.
Rather than make_md5, take a look at the MD5 UUIDs patch below. Using it, we
have a script that regularly checks both sides of a master/replica pair to
confirm that the UUID and the computed MD5 are consistent everywhere. It was
this that let us discover the rare "didn't unlink old files" bug reported
about 3 months back.
---
http://cyrus.brong.fastmail.fm/
One problem we've had is that there is no easy way to check that the files on
disk still match what was originally delivered. That matters when a disk
problem or some other bug has left us unsure of our data integrity and we
want to check for cyrus data corruption.
I wanted to calculate a digest and store it somewhere in the index file, but
messing with the file format and fixing sync to still work, etc... it all
sounded too painful.
So - added is a new option "uuidmode" in imapd.conf. Set it to md5 and you
will get UUIDs of the form: 02(first 11 bytes of the MD5 value for the
message) which takes up the same space, but allows pretty good integrity
checking.
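As a rough illustration of the UUID layout described above, here is a minimal Python sketch. It is not code from the patch: the function name md5_uuid is made up, and it assumes the digest is taken over the raw bytes of the message file as stored on disk.

```python
import hashlib


def md5_uuid(message_bytes: bytes) -> str:
    """Build a 24 character hex UUID: "02" + first 11 bytes of the MD5.

    Hypothetical helper; assumes the MD5 covers the whole message file.
    """
    digest = hashlib.md5(message_bytes).digest()  # 16 raw bytes
    # 1 byte of prefix + 11 bytes of digest = 12 bytes = 24 hex characters
    return "02" + digest[:11].hex()


if __name__ == "__main__":
    print(md5_uuid(b"From: a@example.com\r\n\r\nhello\r\n"))
```

The prefix plus the 11 digest bytes adds up to 12 bytes, which is why the result takes the same space as before.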
Is it safe? - we calculated that with one billion messages you have a one in
1 billion chance of a birthday collision (two random messages with the same
UUID). They then have to get in the same MAILBOXES collection to sync_client
to affect each other anyway. The namespace available for generated UUIDs is
much smaller than this, since they have no collision risk - but if you had
that many delivering you would hit the limits and start getting blank UUIDs
anyway.
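For anyone who wants to redo the arithmetic, the usual birthday approximation can be sketched in Python as below. This assumes the 11 MD5 bytes (88 bits) behave like uniformly random values; the function name is only for illustration. With one billion messages it gives a probability on the order of one in a billion, in line with the estimate above.

```python
import math


def birthday_collision_probability(n_messages: int, bits: int = 88) -> float:
    """Approximate chance that at least two of n random values collide.

    Uses p ~= 1 - exp(-n(n-1) / (2 * 2**bits)), assuming uniform values.
    """
    pairs = n_messages * (n_messages - 1) / 2
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x
    return -math.expm1(-pairs / 2 ** bits)


if __name__ == "__main__":
    p = birthday_collision_probability(10 ** 9)
    print(f"collision probability for 1e9 messages: {p:.3e}")
```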
Mitigating even the above risk: you could alter sync_client to not use UUID
for copying. It's not like it's been working anyway (see our other UUID
related patch). As an integrity check it's much more useful.
The attached patch adds the md5 method, a "random" method which I've never
tested and is almost certainly bogus, but is there for educational
value[tm], and the following FETCH responses in imapd:
FETCH UUID => 24 character hex string (02 + first 11 bytes of MD5)
FETCH RFC822.MD5 => 32 character hex string (16 bytes of MD5)
FETCH RFC822.FILESIZE => size of actual file on disk (via stat or mmap)
Totally non-standard of course, but way useful for our replication checking
scripts. Embrace and extend 'r' us.
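To give an idea of the kind of check these items make possible, here is a small Python sketch. It is not our actual script: the function names and the dict layout are hypothetical, and it assumes you have already collected (UUID, RFC822.MD5, RFC822.FILESIZE) per UID from each side by whatever means.

```python
def uuid_matches_md5(uuid_hex: str, md5_hex: str) -> bool:
    """True if the UUID is "02" + the first 11 bytes (22 hex chars) of the MD5."""
    return uuid_hex.lower() == "02" + md5_hex.lower()[:22]


def compare_sides(master: dict, replica: dict) -> list:
    """Compare per-UID records from a master/replica pair.

    Each dict maps uid -> (uuid_hex, md5_hex, filesize); this layout is
    an assumption made for the sketch. Returns (uid, problem) tuples.
    """
    problems = []
    for uid in sorted(set(master) | set(replica)):
        m, r = master.get(uid), replica.get(uid)
        if m is None or r is None:
            side = "master" if m is None else "replica"
            problems.append((uid, "missing on " + side))
            continue
        # Check each side internally, then check the two sides agree
        for side, rec in (("master", m), ("replica", r)):
            if not uuid_matches_md5(rec[0], rec[1]):
                problems.append((uid, "UUID/MD5 mismatch on " + side))
        if m != r:
            problems.append((uid, "master and replica differ"))
    return problems
```

An empty list means the two sides agree and every UUID matches its computed MD5.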
Anyone feel like writing an RFC for fetching the digest of a message via
IMAP? If the server calculated it on delivery and cached it then you'd have
a great way to clean up after a UIDVALIDITY change or other destabilising
event without having to fetch every message again.
---
Rob