cyrus replication validation

Rob Mueller robm at fastmail.fm
Mon Jul 16 19:36:18 EDT 2007


Hi

> If I understand this patch correctly, it doesn't solve the larger problem
> that I'm interested in: is the data on my replica the same as the data on
> my primary, or more to the point, are the two data sets converging? ...
> But I'm really interested in something that can run out of band from
> csync, imap, etc, that examines files on the primary and replica to know
> what the variance

As mentioned, there are two parts to the patch. The UUID part helps with
the replication, but there's also this bit:

>> 2. You can fetch a computed MD5 of any message on disk via IMAP
>>
>> Using the second, you can do complete validation via IMAP, just iterate
>> through all folders and all messages, get the computed MD5 and compare
>> on both sides.

We wanted the same thing you did, some way to guarantee that the message 
data on both sides was exactly the same. One way of doing that was to use 
something that runs under the covers to check the messages on disk, which is 
fine. The other was to basically add something to the IMAP protocol which 
lets us do the same thing via IMAP.
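
For comparison, the "under the covers" option could look roughly like the
sketch below, run separately on the master and the replica. It is only an
illustration: the function names are made up, and it assumes the usual Cyrus
spool layout where each message is a file named "<uid>." inside the folder's
directory.

```python
import hashlib
import os


def file_md5_and_size(path, chunk_size=65536):
    """Return (hex MD5 digest, size in bytes) for one file on disk."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large messages are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def spool_checksums(folder_dir):
    """Map each message file in one spool directory to its (md5, size)."""
    result = {}
    for name in sorted(os.listdir(folder_dir)):
        path = os.path.join(folder_dir, name)
        # Assumption: message files end in "." (e.g. "123."); anything else
        # (index, cache, header files) is skipped.
        if os.path.isfile(path) and name.endswith("."):
            result[name] = file_md5_and_size(path)
    return result
```

The two resulting maps would then have to be shipped to one place and
compared, which is the extra plumbing the IMAP approach avoids.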

We went with the second, because we already had code that, given a username,
would check their master server and replica server to see that:
1. The folder list matched
2. For each folder, message count + unread count + uidvalidity + uidnext
matched (i.e. the STATUS results)
3. For each folder, the UID listing matched
4. For each folder, the flags on each message (by UID) matched
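
A minimal sketch of checks 2-4 using Python's standard imaplib follows. It is
not the script described here; the function names are hypothetical, and it
assumes two already-authenticated connections and a folder name that needs no
quoting. Check 1 would be a plain set comparison of the LIST results. Check 3
falls out of check 4, since a UID present on only one side shows up with None
on the other.

```python
import imaplib
import re

STATUS_ITEMS = "(MESSAGES UNSEEN UIDVALIDITY UIDNEXT)"


def parse_status(line):
    """Parse b'INBOX (MESSAGES 3 UNSEEN 1 ...)' into {'MESSAGES': 3, ...}."""
    words = line[line.rindex(b"(") + 1:line.rindex(b")")].split()
    return {words[i].decode(): int(words[i + 1])
            for i in range(0, len(words), 2)}


def parse_uid_flags(items):
    """Parse FETCH (UID FLAGS) response items into {uid: frozenset(flags)}."""
    result = {}
    for item in items:
        if not isinstance(item, bytes):
            continue  # imaplib returns [None] for an empty folder
        match = re.search(rb"UID (\d+)", item)
        if match:
            flags = frozenset(imaplib.ParseFlags(item))
            result[int(match.group(1))] = flags
    return result


def diff_maps(master, replica):
    """Return {key: (master_value, replica_value)} for keys that differ."""
    return {
        key: (master.get(key), replica.get(key))
        for key in master.keys() | replica.keys()
        if master.get(key) != replica.get(key)
    }


def check_folder(master_conn, replica_conn, folder):
    """Run checks 2-4 on one folder; empty dicts mean both sides agree."""
    sides = []
    for conn in (master_conn, replica_conn):
        _, status = conn.status(folder, STATUS_ITEMS)
        conn.select(folder, readonly=True)  # read-only: do not alter flags
        _, flags = conn.uid("FETCH", "1:*", "(FLAGS)")
        sides.append((parse_status(status[0]), parse_uid_flags(flags)))
    (m_status, m_flags), (r_status, r_flags) = sides
    return {"status": diff_maps(m_status, r_status),
            "flags": diff_maps(m_flags, r_flags)}
```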

These were all easy to get via IMAP on both sides and compare. However, they
were all metadata, and didn't help check that the actual email spool data on
disk was correct, which is why the above patch adds two FETCH items to the
IMAP protocol:

FILE.MD5 and FILE.SIZE

With these, we can now compare each file on each side of the master/replica
set to see that they match. This means we can now check pretty much all meta
data + spool data on both sides for consistency, all via IMAP connections,
without having to do any more peeking under the hood. Of course, actually
having the patch in there is itself pretty heavy "peeking under the hood",
but we already had a script which did steps 1-4, so adding a hack to the
IMAP protocol was easier for us than creating a whole new system. Whether
this is easier or harder at your site is up to you.
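
As a rough illustration of the final comparison step: once the FILE.MD5 and
FILE.SIZE values for a folder have been fetched from each side and collected
into a {uid: (md5, size)} map, the check could look like the sketch below.
The function name and the map layout are assumptions made for the example,
and parsing the FETCH responses is left out because the response format of
the patch isn't shown here.

```python
def compare_spool(master, replica):
    """Classify per-UID differences between two {uid: (md5, size)} maps."""
    problems = {}
    for uid in sorted(master.keys() | replica.keys()):
        if uid not in replica:
            problems[uid] = "missing on replica"
        elif uid not in master:
            problems[uid] = "missing on master"
        elif master[uid][1] != replica[uid][1]:
            # Sizes differ, so the files cannot be identical.
            problems[uid] = "size mismatch"
        elif master[uid][0] != replica[uid][0]:
            # Same size but different content.
            problems[uid] = "md5 mismatch"
    return problems
```

An empty result means every message file in the folder matches on both sides.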

Rob


