cyrus replication validation
Rob Mueller
robm at fastmail.fm
Mon Jul 16 19:36:18 EDT 2007
Hi
> If I understand this patch correctly, it doesn't solve the larger problem
> that I'm interested in: is the data on my replica the same as the data on
> my primary, or more to the point, are the two data sets converging? ...
> But I'm really interested in something that can run out of band from
> csync, imap, etc, that examines files on the primary and replica to know
> what the variance
As mentioned, there are two parts to the patch. The UUID part helps with
the replication, but there's also this bit:
>> 2. You can fetch a computed MD5 of any message on disk via IMAP
>>
>> Using the second, you can do complete validation via IMAP, just iterate
>> through all folders and all messages, get the computed MD5 and compare
>> on both sides.
We wanted the same thing you did, some way to guarantee that the message
data on both sides was exactly the same. One way of doing that was to use
something that runs under the covers to check the messages on disk, which is
fine. The other was to basically add something to the IMAP protocol which
lets us do the same thing via IMAP.
We went with the second, because we already had code that, given a username,
would check their master server and replica server to see that:
1. The folder list matched
2. For each folder, message count + unread count + uidvalidity + uidnext
matched (i.e. the STATUS results)
3. For each folder, the UID listing matched
4. For each folder, the flags on each UID message matched
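As a rough illustration of check 2 above, here is a minimal Python sketch using the standard imaplib module. It is not our actual script; the function names (parse_status, folder_status, diff_maps) are made up for this example, and it assumes you already have two logged-in connections, one to the master and one to the replica.

```python
import imaplib

# The STATUS items behind check 2 (message count, unread count,
# uidvalidity, uidnext).
STATUS_ITEMS = "(MESSAGES UNSEEN UIDVALIDITY UIDNEXT)"


def parse_status(line: bytes) -> dict:
    """Turn b'INBOX (MESSAGES 3 UNSEEN 1 ...)' into {'MESSAGES': 3, ...}."""
    inner = line[line.rindex(b"(") + 1 : line.rindex(b")")]
    tokens = inner.split()
    return {
        tokens[i].decode(): int(tokens[i + 1])
        for i in range(0, len(tokens), 2)
    }


def folder_status(conn: imaplib.IMAP4, folder: str) -> dict:
    """Ask one server for the STATUS of one folder.

    Folder names containing spaces must already be quoted by the caller.
    """
    typ, data = conn.status(folder, STATUS_ITEMS)
    if typ != "OK":
        raise RuntimeError(f"STATUS failed for {folder!r}: {data!r}")
    return parse_status(data[0])


def diff_maps(master: dict, replica: dict) -> dict:
    """Return {key: (master_value, replica_value)} for every key that differs.

    A key missing on one side shows up with None for that side.
    """
    return {
        key: (master.get(key), replica.get(key))
        for key in sorted(set(master) | set(replica))
        if master.get(key) != replica.get(key)
    }
```

Calling diff_maps(folder_status(master_conn, f), folder_status(replica_conn, f)) for each folder f gives an empty dict when the two sides agree. The same diff_maps helper works for the other checks too, for example on dicts mapping UID to flags.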
These were all easy to get via IMAP on both sides and compare. However, they
were all metadata related, and didn't help check that the actual email
spool data on disk was correct, which is why the above patch adds two FETCH
items to the IMAP protocol:
FILE.MD5 and FILE.SIZE
With these, we can now compare each file on each side of the master/replica
set to see that they match. This means we can now check pretty much all
metadata + spool data on both sides for consistency, all via IMAP connections,
without having to do any more peeking under the hood. Of course, actually
having the patch in there is pretty heavily "peeking under the hood", but it
was easier for us to do that because we already had a script which did steps
1-4, so adding a hack to the IMAP protocol was easier for us than creating a
whole new system. Whether this is easier or harder at your site is up to you.
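For anyone wanting to try the spool-data comparison, here is a minimal Python sketch with imaplib. FILE.MD5 and FILE.SIZE only exist on a server carrying the patch. The exact layout of the FETCH response is an assumption on my part here (something like "1 (UID 7 FILE.MD5 <32 hex digits> FILE.SIZE 1234)"), so adjust the regular expressions to whatever your patched server actually returns. The function names are made up for this example.

```python
import imaplib
import re

# ASSUMED response layout; each item is matched independently so the
# order of items in the response does not matter.
UID_RE = re.compile(rb"UID (\d+)")
MD5_RE = re.compile(rb'FILE\.MD5 "?([0-9a-fA-F]{32})"?')
SIZE_RE = re.compile(rb"FILE\.SIZE (\d+)")


def parse_file_items(line: bytes):
    """Return (uid, (md5_hex, size)) or None if the line lacks any item."""
    uid = UID_RE.search(line)
    md5 = MD5_RE.search(line)
    size = SIZE_RE.search(line)
    if not (uid and md5 and size):
        return None
    return int(uid.group(1)), (md5.group(1).decode().lower(), int(size.group(1)))


def fetch_file_items(conn: imaplib.IMAP4, folder: str) -> dict:
    """Map UID -> (md5_hex, size) for every message in one folder."""
    typ, data = conn.select(folder, readonly=True)
    if typ != "OK":
        raise RuntimeError(f"SELECT failed for {folder!r}: {data!r}")
    # Non-standard FETCH items added by the patch.
    typ, data = conn.uid("FETCH", "1:*", "(FILE.MD5 FILE.SIZE)")
    if typ != "OK":
        raise RuntimeError(f"FETCH failed for {folder!r}: {data!r}")
    result = {}
    for item in data:
        # imaplib returns [None] for an empty folder; skip non-bytes items.
        if isinstance(item, bytes):
            parsed = parse_file_items(item)
            if parsed is not None:
                uid, value = parsed
                result[uid] = value
    return result


def mismatched_uids(master: dict, replica: dict) -> list:
    """UIDs whose (md5, size) differ, or that exist on only one side."""
    return sorted(
        uid
        for uid in master.keys() | replica.keys()
        if master.get(uid) != replica.get(uid)
    )
```

Running fetch_file_items against the master and the replica for the same folder and passing both results to mismatched_uids gives the list of messages to look at more closely. An empty list means the spool files for that folder agree on both sides.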
Rob
More information about the Info-cyrus
mailing list