Replication specifics

Patrick Radtke phr2101 at columbia.edu
Tue May 23 17:16:14 EDT 2006


On May 23, 2006, at 4:48 PM, David Korpiewski wrote:

> So I got into a big argument with the people in my department about  
> how replication works and I'm seeking some guidance from the  
> community:
>
> (1) The worst fear of any prof here at UMASS is the potential of  
> losing a single email.   So my question is this:  If we set up  
> replication, and we have to failover to the replica, is there any  
> way to get back email that may not have been replicated -- ones  
> that currently only exist on the defunct master?   If the replica  
> updates every 10 seconds, then we have the potential to lose 10  
> seconds of email.   Or worst case, the sync_client dies and we lose  
> 30 minutes or more of emails before we failover!
>

Once we have the primary/master backend machine working again after a  
failover (assuming its RAID is still intact), we do a find for any  
messages that have timestamps just prior to the machine failing.
We then compare this list to the messages on the replica.  Since we  
have delayed expunge on, we can still determine if a specific message  
was replicated even if the user deleted it.
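
A rough sketch of that find-and-compare step in Python, in case it helps.
This is not our actual script. It assumes both spools are reachable as
local directories, that message files are named "<uid>." inside each
mailbox directory, and that a replicated message sits at the same relative
path on the replica. The function names are made up for illustration.

```python
import os

def recent_messages(spool_root, start, end):
    """Return relative paths of message files with start <= mtime <= end."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(spool_root):
        for name in filenames:
            # Assumption: message files look like "123."; this skips
            # metadata files such as cyrus.index.
            if not (name.endswith(".") and name[:-1].isdigit()):
                continue
            full = os.path.join(dirpath, name)
            if start <= os.stat(full).st_mtime <= end:
                found.add(os.path.relpath(full, spool_root))
    return found

def missing_on_replica(master_root, replica_root, start, end):
    """List recent master messages that have no file at the same path on the replica."""
    candidates = recent_messages(master_root, start, end)
    return sorted(
        path for path in candidates
        if not os.path.exists(os.path.join(replica_root, path))
    )
```

The start and end values are Unix timestamps bracketing the failure.
Anything the second function returns is a candidate for manual recovery
and should still be checked by hand.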

We also monitor the sync_client process and someone gets alerted if  
it goes away.
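
For the monitoring part, a minimal sketch of a check that a cron job or
monitoring system could call. It assumes a ps that accepts "-e -o comm="
and that the process shows up under the name sync_client. How the alert
is delivered is left out, and the function names are hypothetical.

```python
import os
import subprocess

def process_names(ps_output):
    """Turn `ps -e -o comm=` output into a set of bare command names."""
    return {
        os.path.basename(line.strip())
        for line in ps_output.splitlines()
        if line.strip()
    }

def sync_client_running(ps_output, name="sync_client"):
    """True if a process with the given name appears in the ps output."""
    return name in process_names(ps_output)

def check_once():
    """Run ps once and report whether sync_client is alive."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return sync_client_running(result.stdout)
```

If check_once() returns False, that is the point where someone would get
paged.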

Of course, some messages can still be lost, but the same is true for any  
of your SMTP machines: if one suffers a catastrophic failure, any  
messages queued on that machine are lost.

> Do other folks out there plan for this "potential for lost emails"  
> or do you just failover and if a few messages get lost, you don't  
> worry about it?
>
> (2) Also, is there a master sync transaction log file somewhere that  
> specifies what is being done?  In other words, if we failed over,  
> could we find a transaction log that would tell us what was not  
> committed and then manually run through it to make the updates?  I  
> found the log files in "/var/lib/imap/sync", but these are very  
> uninformative:
> for example:
> SEEN davidk user.davidk
> SEEN davidk user.davidk
> SEEN davidk user.davidk
>
> it would be nice to see "SEEN update message READ 12020 for  
> user.davidk.INBOX", but I don't know if this detailed information  
> is somewhere on the system or just resides in memory.

We look there as well (and back it up beforehand). Then we just look in  
the users' folders for the timestamps on messages.
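
Since the log repeats the same line many times, collapsing it first makes
it easier to see which users and mailboxes to go look at. A small sketch,
assuming each line is an action word followed by whitespace-separated
arguments, as in the SEEN lines quoted above (the function name is made
up):

```python
from collections import Counter

def summarize_sync_log(lines):
    """Count how often each (action, arguments) pair appears in the log."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        action, args = fields[0], tuple(fields[1:])
        counts[(action, args)] += 1
    return counts
```

Feeding it the three quoted lines gives a single entry for
("SEEN", ("davidk", "user.davidk")) with a count of 3. It only tells you
where to look, not which messages changed.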

>
> (3) My final question is this:  If we do a manual sync_client  
> update, is the update a full copy or is it a "differential" copy?    
> So I want to know if we run a manual sync_client if it is going to  
> overwrite the entire replica's mailstore or just search and find  
> what is different and just update those portions.

I believe it does a diff, but I haven't looked at the code.

-Patrick

