Replication and failover
Wesley Craig
wes at umich.edu
Thu Jan 18 13:24:12 EST 2007
On 18 Jan 2007, at 05:41, Janne Peltonen wrote:
> Is there documentation about replication failover scenarios anywhere? I
> can, of course, conjure up a thing or two, but I'd like to see how other
> people have resolved 'corrupted mailspool -> services to the replica ->
> maintenance -> resync master -> services back to the master' situations.
> I did a short Google, but didn't find much of note.
Attached are our operations group's notes on the subject. They make
reference to the tool we use to manage the OS of the machines
(radmind), but it should be pretty clear what they are talking about
without any radmind knowledge.
:wes
-------------- next part --------------
1. Establish primary failure
We believe the failover procedure takes approximately 30 minutes, so invoke it whenever the estimated downtime on the primary would exceed that.
An exception may be made if there is reason to believe that a substantial amount of data on the failed primary was not synched to the replica. We will discuss the feasibility of sanity checks that can be run prior to failover.
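The 30-minute rule above can be written down as a small check. This is only a sketch: the function name `should_fail_over` and the `FAILOVER_MINUTES` variable are hypothetical, and it does not cover the exception for unsynched data, which still needs a human decision.

```shell
# Hypothetical helper: succeed (exit 0) when the estimated downtime of the
# primary, in whole minutes, exceeds the time a failover is expected to take.
FAILOVER_MINUTES=${FAILOVER_MINUTES:-30}

should_fail_over() {
    sfo_estimated_minutes=$1
    [ "$sfo_estimated_minutes" -gt "$FAILOVER_MINUTES" ]
}

# Example:
#   should_fail_over 45 && echo "invoke the failover procedure"
```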
2. Stop cyrus/sync_client on primary if necessary / remove primary from network if necessary
/etc/init.d/cyrus stop
/etc/init.d/sync_client stop
/etc/init.d/network stop (or unplug network cable)
3. stop cyrus on the replica
/etc/init.d/cyrus stop
4. Change dns so that the name of <badhost>-repl becomes <badhost>
-> ensure you change forward and reverse
-> leave original entries commented out
5. Verify dns changes are working by checking on truelies
dig <badhost>.mail
dnsrev the ip
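The forward and reverse check in step 5 could also be scripted with `dig` alone. The sketch below is an assumption-laden illustration, not the group's tool: `check_dns` is a hypothetical name, it expects a fully qualified host name and a plain A record (no CNAME), and it uses `dig -x` in place of the local `dnsrev` command.

```shell
# check_dns HOST EXPECTED_IP
# Succeeds only if HOST resolves to EXPECTED_IP and EXPECTED_IP maps back to HOST.
check_dns() {
    cd_host=${1%.}            # tolerate a trailing dot on the name
    cd_ip=$2
    cd_fwd=$(dig +short "$cd_host" A | head -n 1)
    cd_rev=$(dig +short -x "$cd_ip" | head -n 1)
    cd_rev=${cd_rev%.}        # dig prints PTR targets with a trailing dot
    if [ "$cd_fwd" != "$cd_ip" ]; then
        echo "forward mismatch: $cd_host -> $cd_fwd (expected $cd_ip)" >&2
        return 1
    fi
    if [ "$cd_rev" != "$cd_host" ]; then
        echo "reverse mismatch: $cd_ip -> $cd_rev (expected $cd_host)" >&2
        return 1
    fi
    return 0
}

# Example (placeholder name and address):
#   check_dns badhost.mail.example.edu 192.0.2.10 && echo "dns ok"
```

Run it from a machine that uses the same resolvers as the clients, since a cached old answer elsewhere would not show up here.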
6. Put special files of <badhost>-repl in place for <badhost> to reflect ip information of replica
cd to special dir (generally /var/radmind/special/imap)
cp -R <badhost> <badhost>.save
cp <badhost>-repl/etc/sysconfig/network <badhost>/etc/sysconfig/network
cp <badhost>-repl/etc/sysconfig/network-devices/ifconfig.eth0 <badhost>/etc/sysconfig/network-devices
edit network to fix hostname
vi <badhost>/etc/sysconfig/network
7. radmind the replica
ra.sh update
Update command file and/or transcripts? [Yn] y
/var/radmind/client/command.K: updated
/var/radmind/client/special.T: updated
c ./dev/ttyS0 0600 0 0 4 64
special.T:
+ f ./etc/adsm/TSM.PWD 0444 0 0 1093046900 164 TIgISWWzEESwLKsM5TQx4CRH1hc=
imap/imap-23backend.T:
+ f ./etc/cyrus.conf 0644 0 0 1156541554 1380 HqMdPv649xvUptagZY1X489CCpo=
imap/imap.T:
+ f ./etc/imapd.common.conf 0644 0 0 1119845235 871 kTjkwR4x0SwRuK3qvpKi2ZGwANU=
imap/imap-23backend.T:
+ f ./etc/imapd.conf 0644 0 0 1155789187 343 RIr24APHrHa8fp6YTCezsGUCK4U=
special.T:
+ f ./etc/imapd.host.conf 0444 0 0 1156186085 104 RIgobQuTFI/HRQNmF4H4WEEoU1I=
+ f ./etc/krb5.keytab 0640 0 25 1093051728 952 hk7wwXNZgVqyiPgB8BQ55fGtULg=
+ f ./etc/sysconfig/network 0644 0 0 1166473054 81 pfuFsI4FuD763RKzCIXMHojQadc=
+ f ./etc/sysconfig/network-devices/ifconfig.eth0 0644 0 0 1166473074 78 yXkW7BokmxryTqqJKLmFl9zc3Qs=
+ f ./etc/sysconfig/network-devices/ifconfig.eth1 0444 0 0 1166473075 71 yvCcuy3ATic/4AXPPVa1zeoPnbo=
- f ./opt/tivoli/tsm/client/ba/bin/dsm.sys 0644 0 0 1164130511 418 -
+ f ./var/imap/hostname.pem 0444 0 0 1155787168 2920 Hyfrb/Sg4WkWHp/dUYHe8q9/cv4=
8. /etc/init.d/network restart
hostname <badhost> (remember to use fqdn)
pkill 'syslogd|ksyslogd'
or reboot (your choice)
9. start cyrus
su cyrus
(get tickets)
/usr/local/heimdal-k5/bin/kinit -k -l 25h imap/mail.umich.edu@UMICH.EDU
ctl_mboxlist -m -w
(no output is good!!!)
<if ok>
(exit so you are root)
init 3
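The "no output is good" test in step 9 can be made explicit. A minimal sketch, assuming `ctl_mboxlist -m -w` only reports what it would change and prints nothing when the local mailbox list already agrees with the MUPDATE server; the wrapper name `mboxlist_in_sync` is hypothetical.

```shell
# mboxlist_in_sync: run the dry-run comparison and succeed only if the
# command exited cleanly and printed nothing at all.
mboxlist_in_sync() {
    if mis_out=$(ctl_mboxlist -m -w 2>&1) && [ -z "$mis_out" ]; then
        return 0
    fi
    # Show whatever was reported so the operator can decide what to do.
    printf '%s\n' "$mis_out" >&2
    return 1
}

# Example, run as the cyrus user after getting tickets:
#   mboxlist_in_sync && echo "ok to exit to root and run init 3"
```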
10. comment out replnag until new replica is brought up
11. restart nefu to catch ip change
*** bringing up a new replica, hopefully on same hardware ***
1. update DNS for new replica
2. set up special files of <badhost>-repl
cd to special dir (generally /var/radmind/special/imap)
cp -R <badhost>-repl <badhost>-repl.save
cp <badhost>.save/etc/sysconfig/network <badhost>-repl/etc/sysconfig/network
cp <badhost>.save/etc/sysconfig/network-devices/ifconfig.eth0 <badhost>-repl/etc/sysconfig/network-devices
edit network to fix hostname
vi <badhost>-repl/etc/sysconfig/network
3. reload new replica with existing command file
4. boot new replica & start cyrus
5. generate list of mailboxes & sync
to get mailboxes
ctl_mboxlist -d > /tmp/users
awk '{ print $1 }' /tmp/users | xargs sync_client -v -l -m
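One caveat with the pipeline in step 5: both `awk` and `xargs` split on any whitespace by default, so a mailbox name containing a space would be cut into pieces. The sketch below is a variant under the assumption that each line of the `ctl_mboxlist -d` dump starts with the mailbox name followed by a tab; the function name `mailbox_names` is hypothetical, and `xargs -d` is a GNU extension.

```shell
# mailbox_names: read a ctl_mboxlist -d style dump on stdin and print only
# the first tab-separated field (the mailbox name), one per line.
mailbox_names() {
    awk -F '\t' '{ print $1 }'
}

# Example (GNU xargs; one argument per input line):
#   ctl_mboxlist -d > /tmp/users
#   mailbox_names < /tmp/users | xargs -d '\n' sync_client -v -l -m
```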
6. start sync client
*** switch back during next maintenance window ***
1. stop cyrus on primary
init 2
2. verify that /var/imap/sync is empty (no pending syncs); if not, run
sync_client -v -l -r -f <file>
on any remaining log files, deleting each file after syncing
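The drain in step 2 can be sketched as a loop. This is an illustration under assumptions, not the group's script: `drain_sync_logs` is a hypothetical name, and it treats every regular file in the directory as a leftover log, so check the directory contents by hand first if anything else might live there.

```shell
# drain_sync_logs [DIR]
# Replay each leftover file in DIR (default /var/imap/sync) with sync_client
# and delete it only after sync_client exits successfully.
drain_sync_logs() {
    dsl_dir=${1:-/var/imap/sync}
    for dsl_file in "$dsl_dir"/*; do
        # Skips subdirectories, and the unexpanded glob when DIR is empty.
        [ -f "$dsl_file" ] || continue
        if sync_client -v -l -r -f "$dsl_file"; then
            rm -f -- "$dsl_file"
        else
            echo "sync failed for $dsl_file; leaving it in place" >&2
            return 1
        fi
    done
    return 0
}

# Example:
#   drain_sync_logs /var/imap/sync && echo "no pending syncs"
```

Stopping at the first failure keeps the unsynced file on disk, so the switch back can be postponed instead of losing changes.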
3. swap DNS
4. move specials back into place
cd to special dir
mv <badhost> <badhost>.old
mv <badhost>-repl <badhost>-repl.old
mv <badhost>.save <badhost>
mv <badhost>-repl.save <badhost>-repl
5. reset both machines
/etc/init.d/network restart
hostname <badhost> (remember to use fqdn)
pkill 'syslogd|ksyslogd'
or reboot (your choice)
6. start cyrus
su cyrus
(get tickets)
/usr/local/heimdal-k5/bin/kinit -k -l 25h imap/mail.umich.edu@UMICH.EDU
ctl_mboxlist -m -w
(no output is good!!!)
<if ok>
(exit so you are root)
init 3
7. cleanup specials (remove the .old directories)