Unified Murder, lost Mupdate Master, over-eager Mupdate Slaves

Janne Peltonen janne.peltonen at helsinki.fi
Thu Apr 12 05:08:28 EDT 2007


Hi!

I've been playing around with a unified Murder config in a test
environment. Mostly it appears to behave very well. Then it occurred to
me to test what would happen if I were to lose the mailboxes database on
the Mupdate Master (for example, as a result of an fs failure). So:
shut down cyrus-master on the Mupdate Master, rm
/var/lib/imap/mailboxes.db, bring the Mupdate Master back up.

What happens?

Well, my unified Murder crows started deleting their mailboxes. Like
this:

--klip--
Apr 12 10:48:36 m2cn1t mupdate[6190]: couldn't connect to mupdate server
Apr 12 10:48:36 m2cn1t mupdate[6190]: retrying connection to mupdate server in 22 seconds
[...]
Apr 12 10:48:58 m2cn1t mupdate[6190]: successful mupdate connection to lcluster.it.helsinki.fi
Apr 12 10:48:58 m2cn1t mupdate[6190]: unready for connections
Apr 12 10:48:58 m2cn1t mupdate[6190]: synchronizing mailbox list with master mupdate server
Apr 12 10:48:58 m2cn1t mupdate[6190]: mupdate NO response: mailbox doesn't exist
Apr 12 10:48:58 m2cn1t mupdate[6190]: MUPDATE: can't delete mailbox entry 'user.atest001'
Apr 12 10:48:58 m2cn1t mupdate[6190]: mailbox list synchronization complete
--klip--

Strange thing is, it says "can't delete" but still deletes. The mailbox
list on each of my nodes (crows) ends up empty.

The problem appears to be that if I leave the mupdate slave process
(that runs on a frontend in a traditional config) running on the
backend, it synchronizes itself with the Mupdate Master as soon as the
Mupdate Master is back up (and the retry timeout occurs) - and the node
(backend) (crow) actually ends up deleting all its mailboxes. So the
solution I came up with is:

-mupdate master failure
->
 -comment out the mupdate slave services on each backend
 -kill -HUP the cyrus-master process on each backend (so that there'll
 be no break in the IMAP service)
 -repair mupdate master, get it back online
 -ctl_mboxlist -m on any backend
 -uncomment the mupdate slave services on each backend
 -kill -HUP the cyrus-master process on each backend
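
To make the comment-out / uncomment steps above harder to get wrong, they
could be scripted. What follows is only a minimal sketch in Python, not
anything shipped with Cyrus. It assumes the slave service is a single line
in cyrus.conf that begins with the service name "mupdate" followed by
cmd=...; the function name set_mupdate_slave and the regex are made up for
illustration. The kill -HUP and ctl_mboxlist -m steps are left to be run by
hand.

```python
import re

# Assumption: the slave service is one line in cyrus.conf whose first token
# is the service name "mupdate", for example:
#   mupdate cmd="mupdate" listen=3905 prefork=1
# An optional leading "#" means the line is currently commented out.
SERVICE_RE = re.compile(r'^(\s*)(#\s*)?(mupdate\s+cmd=.*)$')


def set_mupdate_slave(conf_text, enabled):
    """Return conf_text with the mupdate service line enabled or commented out.

    Hypothetical helper; it only rewrites text and never touches a file or
    signals cyrus-master.
    """
    out = []
    for line in conf_text.splitlines():
        m = SERVICE_RE.match(line)
        if m:
            indent, _, body = m.groups()
            # Rebuild the line from its indent and body so that repeated
            # calls do not stack up "#" characters.
            line = indent + body if enabled else indent + "# " + body
        out.append(line)
    return "\n".join(out) + "\n"
```

The idea would be to call set_mupdate_slave(text, False) on each backend
before the Mupdate Master comes back, and set_mupdate_slave(text, True)
after ctl_mboxlist -m has been run. Note that the pattern would also match a
master-style line (cmd="mupdate -m"), so this sketch is meant for backends
only.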

But this is quite a procedure, and it seems to me all too easy to forget
that the mupdate slaves MUST NOT BE RUNNING on any of the backends while
the mupdate master with an empty mailboxes db comes up. Moreover, I
don't understand why a backend has to delete local mailboxes that the
Mupdate Master doesn't know about. Shouldn't the backend be the
authoritative source for its local mailboxes in every situation? I think
the reason that the mupdate slave is allowed to delete any mailbox
entry from the local mailbox db is that the slave process runs on a
frontend in a traditional config, and a traditional frontend doesn't
have any local mailboxes at all. But shouldn't the slave process include
code to prevent it from deleting local mailboxes? It shouldn't be a
problem in a traditional murder (there being no local mailboxes), and it
would prevent a disaster in a unified murder config.
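
To illustrate the kind of guard I mean, here is a rough sketch in Python. It
is not how the mupdate slave is actually implemented; it is a toy model in
which each mailbox list is a dict from mailbox name to the host that stores
it, and plan_slave_sync and local_host are names invented for this example.

```python
def plan_slave_sync(local_db, master_db, local_host):
    """Decide what a slave sync would change, sparing local mailboxes.

    Toy model (assumption): local_db and master_db map a mailbox name to
    the host that stores it. Returns (to_delete, to_upsert, kept_local).
    """
    to_delete = []
    kept_local = []
    for name, server in local_db.items():
        if name in master_db:
            continue  # the master knows this mailbox, nothing to remove
        if server == local_host:
            # The proposed guard: the master has no record of a mailbox
            # that lives on this host, so keep it rather than delete it.
            kept_local.append(name)
        else:
            to_delete.append(name)
    # Entries the master has that are missing or different locally.
    to_upsert = {n: s for n, s in master_db.items() if local_db.get(n) != s}
    return sorted(to_delete), to_upsert, sorted(kept_local)
```

With an empty master_db, a traditional frontend (no entries pointing at
itself) would still drop everything, as it does today, while a unified node
would keep the mailboxes it stores and could then push them back to the
master.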

Just my 2 cents...


--Janne
-- 
Janne Peltonen <janne.peltonen at helsinki.fi>

