Replication and failover

Wesley Craig wes at umich.edu
Thu Jan 18 13:24:12 EST 2007


On 18 Jan 2007, at 05:41, Janne Peltonen wrote:
> Is there documentation about replication failover scenarios anywhere? I
> can, of course, conjure up a thing or two, but I'd like to see how other
> people have resolved 'corrupted mailspool -> services to the replica ->
> maintenance -> resync master -> services back to the master' situations.
> I did a short Google, but didn't find much of notice.
> I did a short Google, but didn't find much of notice.

Attached are our operations group's notes on the subject. They refer
to the tool we use to manage the OS on the machines (radmind), but
what they describe should be clear without any radmind knowledge.

:wes
-------------- next part --------------

1. Establish primary failure
    We expect the failover procedure to take approximately 30 minutes, so invoke it whenever the estimated downtime on the primary would exceed 30 minutes.
    An exception may be made if there is reason to believe that a substantial amount of data on the failed primary was not synced to the replica; we will discuss the feasibility of sanity checks that can be run before failover.

2. Stop cyrus and sync_client on the primary if necessary; remove the primary from the network if necessary
    /etc/init.d/cyrus stop
    /etc/init.d/sync_client stop
    /etc/init.d/network stop (or unplug the network cable)
    
3. Stop cyrus on the replica
    /etc/init.d/cyrus stop
    
4. Change DNS so that <badhost> resolves to the replica's address (the one currently named <badhost>-repl)
    -> change both the forward (A) and reverse (PTR) records
    -> leave the original entries commented out

5. Verify that the DNS changes are working by checking on truelies
    dig <badhost>.mail
    run dnsrev on the IP to check the reverse lookup
    
6. Put the special files of <badhost>-repl in place for <badhost>, so that <badhost> carries the replica's IP information
    cd to special dir (generally /var/radmind/special/imap)
    cp -R <badhost> <badhost>.save
    cp <badhost>-repl/etc/sysconfig/network <badhost>/etc/sysconfig/network
    cp <badhost>-repl/etc/sysconfig/network-devices/ifconfig.eth0 <badhost>/etc/sysconfig/network-devices
    edit the network file to set the hostname to <badhost>
    vi <badhost>/etc/sysconfig/network
      
7. radmind the replica
    ra.sh update

    Example output:

    Update command file and/or transcripts? [Yn] y
    /var/radmind/client/command.K: updated
    /var/radmind/client/special.T: updated
    c ./dev/ttyS0                           0600     0     0     4    64
    special.T:
    + f ./etc/adsm/TSM.PWD                          0444     0     0 1093046900     164 TIgISWWzEESwLKsM5TQx4CRH1hc=
    imap/imap-23backend.T:
    + f ./etc/cyrus.conf                            0644     0     0 1156541554    1380 HqMdPv649xvUptagZY1X489CCpo=
    imap/imap.T:
    + f ./etc/imapd.common.conf                     0644     0     0 1119845235     871 kTjkwR4x0SwRuK3qvpKi2ZGwANU=
    imap/imap-23backend.T:
    + f ./etc/imapd.conf                            0644     0     0 1155789187     343 RIr24APHrHa8fp6YTCezsGUCK4U=
    special.T:
    + f ./etc/imapd.host.conf                       0444     0     0 1156186085     104 RIgobQuTFI/HRQNmF4H4WEEoU1I=
    + f ./etc/krb5.keytab                           0640     0    25 1093051728     952 hk7wwXNZgVqyiPgB8BQ55fGtULg=
    + f ./etc/sysconfig/network                     0644     0     0 1166473054      81 pfuFsI4FuD763RKzCIXMHojQadc=
    + f ./etc/sysconfig/network-devices/ifconfig.eth0       0644     0     0 1166473074      78 yXkW7BokmxryTqqJKLmFl9zc3Qs=
    + f ./etc/sysconfig/network-devices/ifconfig.eth1       0444     0     0 1166473075      71 yvCcuy3ATic/4AXPPVa1zeoPnbo=
    - f ./opt/tivoli/tsm/client/ba/bin/dsm.sys      0644     0     0 1164130511     418 -
    + f ./var/imap/hostname.pem                     0444     0     0 1155787168    2920 Hyfrb/Sg4WkWHp/dUYHe8q9/cv4=


8. Restart the network and set the new hostname
    /etc/init.d/network restart
    hostname <badhost> (use the fully qualified domain name)
    pkill syslogd ksyslogd

    or reboot instead (your choice)

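9. Start cyrus
    su cyrus
    (get Kerberos tickets)
    /usr/local/heimdal-k5/bin/kinit -k -l 25h imap/mail.umich.edu@UMICH.EDU
    ctl_mboxlist -m -w
    (no output is good: -w only reports what -m would change, so silence means the local mailbox list agrees with the MUPDATE server)
    <if ok>
    (exit so you are root)
    init 3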

10. Comment out replnag until the new replica is brought up

11. Restart nefu so that it picks up the IP change
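
Steps 5, 6 and 9 above are the ones most easily scripted. Below is a minimal POSIX sh sketch of them; it is not the operations group's own tooling, and the function names verify_dns, install_replica_specials and check_mboxlist are hypothetical. It assumes dig is installed (dig -x stands in for the site-local dnsrev), that the network special sets the hostname on a HOSTNAME= line (sed stands in for the manual vi edit), and that ctl_mboxlist is on the cyrus user's PATH.

```shell
#!/bin/sh
# Hypothetical helpers for failover steps 5, 6 and 9 (a sketch, not the
# operations group's own scripts).

# Step 5: check that forward (A) and reverse (PTR) lookups agree.
# dig -x stands in for the site-local dnsrev tool.
# $1: fully qualified name, $2: IP address it should now resolve to
verify_dns() {
    name=$1
    want_ip=$2
    got_ip=$(dig +short "$name" A | tail -n 1)
    got_ptr=$(dig +short -x "$want_ip" | tail -n 1)
    if [ "$got_ip" != "$want_ip" ]; then
        echo "forward mismatch: $name -> ${got_ip:-none}, want $want_ip" >&2
        return 1
    fi
    # PTR answers carry a trailing dot; strip it before comparing.
    if [ "${got_ptr%.}" != "${name%.}" ]; then
        echo "reverse mismatch: $want_ip -> ${got_ptr:-none}, want $name" >&2
        return 1
    fi
    echo "ok: $name <-> $want_ip"
}

# Step 6: give <badhost> the replica's network specials, keeping a
# <badhost>.save copy for the switch back. ASSUMPTION: the network file
# sets the hostname on a HOSTNAME= line; sed replaces the manual vi edit.
# The body is a subshell so the cd does not leak into the caller.
# $1: special dir, $2: <badhost>, $3: FQDN of <badhost>
install_replica_specials() (
    cd "$1" || exit 1
    bad=$2
    fqdn=$3
    if [ -e "$bad.save" ]; then
        echo "$bad.save already exists; refusing to overwrite it" >&2
        exit 1
    fi
    cp -R "$bad" "$bad.save" || exit 1
    cp "$bad-repl/etc/sysconfig/network" "$bad/etc/sysconfig/network" || exit 1
    cp "$bad-repl/etc/sysconfig/network-devices/ifconfig.eth0" \
        "$bad/etc/sysconfig/network-devices/" || exit 1
    net="$bad/etc/sysconfig/network"
    sed "s/^HOSTNAME=.*/HOSTNAME=$fqdn/" "$net" > "$net.tmp" &&
        mv "$net.tmp" "$net"
)

# Step 9: run the ctl_mboxlist consistency check as the cyrus user.
# -w only reports what -m would change, so empty output means all is well.
check_mboxlist() {
    out=$(ctl_mboxlist -m -w 2>&1) || {
        printf '%s\n' "$out" >&2
        return 1
    }
    if [ -n "$out" ]; then
        echo "ctl_mboxlist -m -w reported differences; do not run init 3:" >&2
        printf '%s\n' "$out" >&2
        return 1
    fi
    echo "mailbox list consistent"
}
```

Use verify_dns on truelies in step 5, install_replica_specials /var/radmind/special/imap <badhost> <badhost FQDN> in step 6, and check_mboxlist as the cyrus user after kinit in step 9, running init 3 as root only if it succeeds.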

*** Bringing up a new replica, hopefully on the same hardware ***

1. Update DNS for the new replica

2. Set up the special files of <badhost>-repl
    cd to special dir (generally /var/radmind/special/imap)
    cp -R <badhost>-repl <badhost>-repl.save
    cp <badhost>.save/etc/sysconfig/network <badhost>-repl/etc/sysconfig/network
    cp <badhost>.save/etc/sysconfig/network-devices/ifconfig.eth0 <badhost>-repl/etc/sysconfig/network-devices
    edit the network file to set the hostname to <badhost>-repl
    vi <badhost>-repl/etc/sysconfig/network

3. Reload the new replica with the existing command file

4. Boot the new replica and start cyrus

5. Generate a list of mailboxes and sync them to the new replica
    to get the mailbox list:
    ctl_mboxlist -d > /tmp/users
    to sync each mailbox by name:
    awk '{ print $1 }' /tmp/users | xargs sync_client -v -l -m

6. Start sync_client (/etc/init.d/sync_client start)
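
The awk/xargs pipeline in step 5 splits on blanks, so a mailbox whose name contains a space (Cyrus permits them, e.g. a "Sent Items" folder) would be truncated by awk and split again by xargs. A sketch of a more careful version follows, assuming that ctl_mboxlist -d separates the name, partition and ACL with tabs and that GNU xargs (for -d) is installed; list_mailboxes and sync_mailboxes are hypothetical names.

```shell
#!/bin/sh
# Sketch of new-replica step 5. ASSUMPTIONS: the ctl_mboxlist -d dump is
# tab-separated (name first), and xargs is GNU xargs.

# Print the mailbox name (first tab-separated field) of each dump line,
# keeping names that contain spaces intact.
list_mailboxes() {
    cut -f1 "$1"
}

# Pass the names to sync_client. GNU xargs -d '\n' splits only on
# newlines and does no quote processing, so each line stays one argument.
sync_mailboxes() {
    list_mailboxes "$1" | xargs -d '\n' sync_client -v -l -m
}
```

Usage on the machine currently acting as primary:
    ctl_mboxlist -d > /tmp/users
    sync_mailboxes /tmp/users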


*** Switch back during the next maintenance window ***

1. Stop cyrus on the current primary (the former replica)
    init 2

2. Verify that /var/imap/sync is empty (no pending syncs). If it is not, run
    sync_client -v -l -r -f <file>
    on each remaining log file, and delete each file after it has been synced

3. Swap DNS back so that <badhost> and <badhost>-repl resolve to their original addresses

4. Move the specials back into place
    cd to the special dir
    mv <badhost> <badhost>.old
    mv <badhost>-repl <badhost>-repl.old
    mv <badhost>.save <badhost>
    mv <badhost>-repl.save <badhost>-repl

5. Reset both machines
    /etc/init.d/network restart
    hostname <badhost> on the primary, <badhost>-repl on the replica (use the fully qualified domain name)
    pkill syslogd ksyslogd

    or reboot instead (your choice)

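6. Start cyrus
    su cyrus
    (get Kerberos tickets)
    /usr/local/heimdal-k5/bin/kinit -k -l 25h imap/mail.umich.edu@UMICH.EDU
    ctl_mboxlist -m -w
    (no output is good: -w only reports what -m would change, so silence means the local mailbox list agrees with the MUPDATE server)
    <if ok>
    (exit so you are root)
    init 3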

7. Clean up the specials (remove the .old directories)
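
Steps 2 and 4 of the switch back are easy to get wrong by hand. The following POSIX sh sketch (hypothetical function names drain_sync_logs and restore_specials) replays leftover log files and deletes each one only after sync_client succeeds, and refuses to rename the special directories unless all four expected directories exist and no .old directory is left over from an earlier run. It assumes sync_client is on the PATH; subdirectories of the sync directory are skipped.

```shell
#!/bin/sh
# Hypothetical helpers for switch-back steps 2 and 4 (a sketch).

# Step 2: replay each file left in the sync directory, deleting it only
# after sync_client reports success. Returns non-zero if any file failed.
# $1: sync directory (defaults to /var/imap/sync)
drain_sync_logs() {
    sync_dir=${1:-/var/imap/sync}
    drain_status=0
    for f in "$sync_dir"/*; do
        [ -f "$f" ] || continue   # skips the literal glob when the dir is empty
        if sync_client -v -l -r -f "$f"; then
            rm -f "$f"
        else
            echo "sync_client failed on $f; leaving it in place" >&2
            drain_status=1
        fi
    done
    return "$drain_status"
}

# Step 4: swap the saved specials back, but only if every directory the
# renames depend on exists and no .old directory would be overwritten.
# The body is a subshell so the cd does not leak into the caller.
# $1: special dir, $2: <badhost>
restore_specials() (
    cd "$1" || exit 1
    bad=$2
    for d in "$bad" "$bad-repl" "$bad.save" "$bad-repl.save"; do
        if [ ! -d "$d" ]; then
            echo "missing $d" >&2
            exit 1
        fi
    done
    for d in "$bad.old" "$bad-repl.old"; do
        if [ -e "$d" ]; then
            echo "$d already exists" >&2
            exit 1
        fi
    done
    mv "$bad" "$bad.old" &&
        mv "$bad-repl" "$bad-repl.old" &&
        mv "$bad.save" "$bad" &&
        mv "$bad-repl.save" "$bad-repl"
)
```

Checking for leftover .old directories up front means step 7's cleanup must have been done after any previous failover before this one can be reversed.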


More information about the Info-cyrus mailing list