murder setup - mailboxes.db corruption - trouble recovering with ctl_mboxlist

Eric G. Wolfe eric.wolfe at marshall.edu
Thu Nov 20 07:39:30 EST 2008


Just to follow up on this issue:

Found this:
http://cyrusimap.web.cmu.edu/twiki/bin/view/Cyrus/CyrusMurderFailureModes

First, I followed the "Easy" instructions, which was a bust.

Next, I tried the "Hard" instructions.  Four hours later, the mupdate 
master finished syncing with the backends.  I started up the 
front-ends per the instructions, but they failed to sync with the 
mupdate master.

In an effort to try something else, I figured that if the mailboxes.db 
on the front-ends and the master are the same format, I could just shut 
down the mupdate master, copy its mailboxes.db file over to the 
front-ends, and start everything up.  This was also a bust.
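
One way to check whether a front-end's mailboxes.db really matches the 
master's is to dump both with 'ctl_mboxlist -d' and compare the text 
output.  Below is a minimal sketch, assuming the two dumps have already 
been saved to text files.  diff_dumps is a hypothetical helper name, and 
it treats every dump line as an opaque string rather than assuming any 
particular field layout.

```python
def diff_dumps(master_text, frontend_text):
    """Compare two text dumps line by line, ignoring order and blank lines.

    Returns (only_on_master, only_on_frontend) as sorted lists of lines.
    Lines are treated as opaque strings; no dump field layout is assumed.
    """
    master = {ln.rstrip() for ln in master_text.splitlines() if ln.strip()}
    frontend = {ln.rstrip() for ln in frontend_text.splitlines() if ln.strip()}
    return sorted(master - frontend), sorted(frontend - master)
```

Reading the two saved dump files and passing their contents to 
diff_dumps gives the entries that exist on only one side.  Two empty 
lists would suggest the databases hold the same entries.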

I am also seeing the log messages below on the mupdate master.  However, 
the number in fs.file-nr is nowhere near fs.file-max, and there are no 
ulimits on the 'cyrus' user.  There was a maxfds=1024 parameter on the 
mupdate service line in /etc/cyrus.conf.  I tried restarting without 
this parameter, and it seemed I couldn't keep the master process running 
without it.  With it, the service runs fine for a while after a restart, 
but it eventually starts complaining again.  So I tried quadrupling the 
maxfds value, and we'll see if that helps.

cyrus.conf (excerpt)
mupdate       cmd="/usr/lib64/cyrus-imapd/mupdate -m" listen=3905 prefork=1 maxfds=1024

maillog (excerpt)
Nov 20 07:18:54 mumailmaster mupdate[27227]: refused connection from mumailstore01
Nov 20 07:18:54 mumailmaster mupdate[27227]: warning: cannot open /etc/hosts.allow: Too many open files
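
For what it's worth, "Too many open files" is the per-process error 
(EMFILE), while running into fs.file-max gives "Too many open files in 
system" (ENFILE), so comparing fs.file-nr against fs.file-max does not 
say much here.  A rough sketch for watching one process's descriptor 
count on Linux follows.  The function names and the 90% margin are made 
up for illustration, and treating the maxfds value as the effective 
limit is an assumption.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count entries in <proc_root>/<pid>/fd, one per open descriptor (Linux)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def near_limit(open_fds, limit, margin=0.9):
    """True when the process uses at least `margin` of its descriptor limit.

    `limit` is supplied by the caller (for example the maxfds value);
    the 0.9 default is an arbitrary illustrative threshold.
    """
    return open_fds >= limit * margin
```

For example, near_limit(count_open_fds(27227), 1024) would check the 
mupdate process from the log above against the original maxfds value.  
Reading another user's /proc/<pid>/fd generally needs root.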

Additionally, I have double-checked all Cyrus-related service accounts 
and their associated passwords.  Our mupdate service account is 
successfully authenticating on the mupdate master.  However, I am 
getting an "imap: kick_mupdate: can't connect to target: Connection 
refused" error on the front-ends, even though I can connect to port 3905 
on the mupdate master.
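
The port check can be repeated from each front-end with something like 
the sketch below.  can_connect is a hypothetical helper, not part of 
Cyrus.  It only tests plain TCP reachability, so it says nothing about 
the mupdate protocol or authentication.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, can_connect("mumailmaster", 3905) run on mumail01 and 
mumail02.  A True result here does not rule out other causes of the 
kick_mupdate error.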

I have not noticed anything strange on the backends, in the logs or 
otherwise.  I will follow up if I find anything else.

Again, if anyone can point us in the right direction, it would be very 
much appreciated.

Eric G. Wolfe wrote:
> We are running RHEL4u7 on all 5 servers:
> 1 mupdate master: mumailmaster
> 2 backends: mumailstore01, mumailstore02
> 2 Postfix MTA/Cyrus proxy frontends: mumail01, mumail02
>
> I started getting the errors below on my backends around 14:15 EST, at 
> which time mail delivery to the backends started getting deferred.  I 
> have 20,000+ messages deferred on each frontend as of this e-mail.
>
> Nov 19 16:30:26 mumailstore01 ctl_cyrusdb[4672]: DBERROR db4: PANIC: Cannot allocate memory
> Nov 19 16:30:26 mumailstore01 ctl_cyrusdb[4672]: DBERROR: critical database situation
> Nov 19 16:30:26 mumailstore01 ctl_mboxlist[4673]: DBERROR db4: PANIC: fatal region error detected; run recovery
> Nov 19 16:30:26 mumailstore01 ctl_mboxlist[4673]: DBERROR: critical database situation
> Nov 19 16:30:26 mumailstore01 ctl_cyrusdb[4674]: DBERROR db4: PANIC: fatal region error detected; run recovery
> Nov 19 16:30:26 mumailstore01 ctl_cyrusdb[4674]: DBERROR: critical database situation
>
> I tried the following directions for recovery:
> http://asg.web.cmu.edu/archive/message.php?mailbox=archive.info-cyrus&searchterm=skiplist&msg=32337
> I made backup copies of all files I deleted, renaming them 
> $filename.corrupt.  I did this on each server, recovered on the 
> backends, and let them push the updates to the mupdate server.  I 
> manually recovered mailboxes.db on the frontends, as they did not seem 
> to be getting updated.  If I am going about this wrong, could someone 
> please point me in the right direction for documentation on murder 
> disaster recovery?  The following, while somewhat helpful, does not go 
> into a great amount of detail:
> http://cyrusimap.web.cmu.edu/imapd/install-murder.html
>
> So at this point my user agent says Unknown/Invalid partition.  The 
> partitions are correctly defined on the backend mail stores.  A 
> 'ctl_mboxlist -d' shows correct partitions, no matter which local 
> mailboxes.db I attempt to dump.  Furthermore, LMTP is still not 
> delivering to the backends during this outage.
>
> Any helpful tips or pointers would be appreciated.
>
> Thanks,
>
>   


-- 
Eric G. Wolfe, IT Associate, Sr.
One John Marshall Drive
Marshall University, Drinko Library 428k
Huntington, WV 25755
Phone: 304.696.3428
Email: eric.wolfe at marshall.edu

"Who is General Failure and why is he reading my hard disk ?"
Microsoft spel chekar vor sail, worgs grate !!
(By leitner at inf.fu-berlin.de, Felix von Leitner)


