murder setup - mailboxes.db corruption - trouble recovering with ctl_mboxlist
Eric G. Wolfe
eric.wolfe at marshall.edu
Thu Nov 20 07:39:30 EST 2008
Just to follow up on this issue.
Found this:
http://cyrusimap.web.cmu.edu/twiki/bin/view/Cyrus/CyrusMurderFailureModes
First, I followed the "Easy" instructions, which was a bust.
Next, I tried the "Hard" instructions. Four hours later the mupdate
master finished syncing with the backends. I started up the
front-ends, per the instructions. The front-ends failed to sync with
the mupdate master.
So, in an effort to try something else, I figured that if the mailboxes.db
on the front-ends and the master are in the same format, I could just shut
down the mupdate master, copy the mailboxes.db file over to the front-ends,
and start everything up. This was also a bust.
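For anyone wanting to check whether the master and a front-end actually hold the same entries, one option is to dump each mailboxes.db to a text file with 'ctl_mboxlist -d' and compare the two dumps. The following is only a minimal sketch. It assumes each dump line is tab-separated with the mailbox name as the first field, and the function names and file arguments are hypothetical, not part of Cyrus.

```python
import sys


def parse_dump(text):
    """Map mailbox name -> rest of the line for one text dump.

    Assumption: one mailbox per line, name first, fields separated by tabs.
    """
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, rest = line.partition("\t")
        entries[name] = rest.strip()
    return entries


def diff_dumps(master_text, frontend_text):
    """Return (missing, extra, changed) mailbox names, each sorted.

    missing: on the master but not on the front-end
    extra:   on the front-end but not on the master
    changed: on both, but the remaining fields differ
    """
    master = parse_dump(master_text)
    frontend = parse_dump(frontend_text)
    missing = sorted(set(master) - set(frontend))
    extra = sorted(set(frontend) - set(master))
    changed = sorted(
        name for name in set(master) & set(frontend)
        if master[name] != frontend[name]
    )
    return missing, extra, changed


# Usage (hypothetical file names): python diff_dumps.py master.txt frontend.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as m, open(sys.argv[2]) as f:
        missing, extra, changed = diff_dumps(m.read(), f.read())
    print("missing on front-end:", len(missing))
    print("extra on front-end:", len(extra))
    print("differing entries:", len(changed))
```

If the three lists come back empty, the two dumps agree at the text level, which would suggest the failure to sync lies elsewhere.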
Oh, and I am getting these logs on the mupdate master. However, the
number in fs.file-nr is nowhere near approaching fs.file-max. There are
no ulimits on the 'cyrus' user. There was a maxfds=1024 parameter in
/etc/imapd.conf. I tried restarting without this parameter, and it
seemed I couldn't keep the master process running without it. If I
restart the service, it will run fine for a while, but it eventually
starts complaining again. So I tried quadrupling the maxfds value, and
we'll see if that helps.
imapd.conf (excerpt)
mupdate cmd="/usr/lib64/cyrus-imapd/mupdate -m" listen=3905
prefork=1 maxfds=1024
maillog (excerpt)
Nov 20 07:18:54 mumailmaster mupdate[27227]: refused connection from
mumailstore01
Nov 20 07:18:54 mumailmaster mupdate[27227]: warning: cannot open
/etc/hosts.allow: Too many open files
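A note on the "Too many open files" message: on Linux that wording normally corresponds to the per-process descriptor limit (EMFILE), whereas hitting the system-wide limit is reported as "Too many open files in system" (ENFILE). That would explain why fs.file-nr can sit well below fs.file-max while a single mupdate process still runs out. A rough Python sketch for comparing one process's open descriptors against its own soft limit follows. It assumes a Linux /proc that provides /proc/<pid>/limits, which older kernels may not, and the helper names are made up for illustration.

```python
import os


def soft_nofile_limit(limits_text):
    """Extract the soft 'Max open files' value from /proc/<pid>/limits text.

    Returns an int, or None if the limit is unlimited or the row is absent.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft = line[len(label):].split()[0]
            return None if soft == "unlimited" else int(soft)
    return None


def open_fd_count(pid):
    """Count the entries in /proc/<pid>/fd (needs permission to read them)."""
    return len(os.listdir("/proc/%d/fd" % pid))


def fd_report(pid):
    """Return (open descriptors, soft limit or None) for one process."""
    try:
        with open("/proc/%d/limits" % pid) as f:
            limit = soft_nofile_limit(f.read())
    except OSError:
        limit = None  # file missing (older kernel) or not readable
    return open_fd_count(pid), limit


# Usage, run as root or the cyrus user, with the mupdate PID from the log:
#   used, limit = fd_report(27227)
#   print(used, limit)
```

A count that climbs steadily toward the limit between restarts would match the pattern of running fine for a while and then complaining again.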
Additionally, I have double-checked all cyrus related service accounts
and their associated passwords. Our mupdate service account is
successfully authenticating on the mupdate master. I am getting an
"imap: kick_mupdate: can't connect to target: Connection refused" on the
front-ends. However, I can connect to port 3905 on the mupdate master.
I have not noticed anything strange on the backends, in the logs or
otherwise. I will follow up if I find out anything else.
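Since the "Connection refused" shows up on the front-ends while a manual connection to port 3905 works, it may be worth running the same check from each front-end and recording the exact error. Below is a small Python sketch of such a probe. The function name is hypothetical, and it only tests that a TCP connection can be opened, not that mupdate authentication succeeds.

```python
import socket


def check_port(host, port, timeout=5.0):
    """Try to open a TCP connection; return (ok, detail).

    detail is "connected" on success, otherwise the OS error text
    (for example a refused connection or a timeout).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        return False, str(exc)


# Usage from a front-end, with the host name and port used in this thread:
#   ok, detail = check_port("mumailmaster", 3905)
#   print(ok, detail)
```

Running it in a loop while the master is under load could show whether the refusals line up with the "Too many open files" warnings on the master.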
Again, if anyone can point us in the right direction, it would be very
much appreciated.
Eric G. Wolfe wrote:
> We have RHEL4u7 on all 5 servers:
> 1 mupdate master: mumailmaster
> 2 backends: mumailstore01, mumailstore02
> 2 Postfix MTA/Cyrus proxy frontends: mumail01, mumail02
>
> So I started getting this on my backends around 14:15 EST, at which time
> mail started getting deferred to the backends. I have 20,000+ per
> frontend deferred for delivery at the time of this e-mail.
>
> Nov 19 16:30:26 mumailstore01 ctl_cyrusdb[4672]: DBERROR db4: PANIC:
> Cannot allocate memory
> Nov 19 16:30:26 mumailstore01 ctl_cyrusdb[4672]: DBERROR: critical
> database situation
> Nov 19 16:30:26 mumailstore01 ctl_mboxlist[4673]: DBERROR db4: PANIC:
> fatal region error detected; run recovery
> Nov 19 16:30:26 mumailstore01 ctl_mboxlist[4673]: DBERROR: critical
> database situation
> Nov 19 16:30:26 mumailstore01 ctl_cyrusdb[4674]: DBERROR db4: PANIC:
> fatal region error detected; run recovery
> Nov 19 16:30:26 mumailstore01 ctl_cyrusdb[4674]: DBERROR: critical
> database situation
>
> I tried the following directions for recovery.
> http://asg.web.cmu.edu/archive/message.php?mailbox=archive.info-cyrus&searchterm=skiplist&msg=32337.
> I made backup copies of all files deleted, renaming them
> $filename.corrupt. I did this on each server, recovered on the
> backends, and let it push the updates to the mupdate server. I manually
> recovered mailboxes.db on the frontends, as they did not seem to be
> getting updated. If I am going about this wrong, please someone point
> me in the right direction for documentation on murder disaster recovery.
> The following, while somewhat helpful, does not go into a great amount
> of detail: http://cyrusimap.web.cmu.edu/imapd/install-murder.html.
>
> So at this point my user agent says Unknown/Invalid partition. The
> partitions are correctly defined on the backend mail stores. A
> 'ctl_mboxlist -d' shows correct partitions, no matter which local
> mailboxes.db I attempt to dump. Furthermore, LMTP is still not
> delivering to the backends during this outage.
>
> Any helpful tips or pointers would be appreciated.
>
> Thanks,
>
>
--
Eric G. Wolfe, IT Associate, Sr.
One John Marshall Drive
Marshall University, Drinko Library 428k
Huntington, WV 25755
Phone: 304.696.3428
Email: eric.wolfe at marshall.edu
"Who is General Failure and why is he reading my hard disk ?"
Microsoft spel chekar vor sail, worgs grate !!
(By leitner at inf.fu-berlin.de, Felix von Leitner)