murder setup - mailboxes.db corruption - trouble recovering with ctl_mboxlist

Wolfe, Eric G eric.wolfe at marshall.edu
Thu Nov 20 12:00:20 EST 2008


_______________________________________
From: Wesley Craig [wes at umich.edu]
Sent: Thursday, November 20, 2008 10:27 AM
To: Eric G.Wolfe
Cc: info-cyrus at lists.andrew.cmu.edu
Subject: Re: murder setup - mailboxes.db corruption - trouble recovering with ctl_mboxlist

On 20 Nov 2008, at 07:39, Eric G. Wolfe wrote:
> Found this:
> http://cyrusimap.web.cmu.edu/twiki/bin/view/Cyrus/CyrusMurderFailureModes
> First, I followed the "Easy" instructions, which was a bust.

> Did you see this:
>
>        https://bugzilla.andrew.cmu.edu/show_bug.cgi?id=2819

No, but that was the problem at 2:00 AM, not the one at 6:00 AM after trying the "Hard" instructions.

> If you're not running something pretty current, Easy doesn't work.

> Next, I tried the "Hard" instructions.  Four hours later the mupdate
> master finished synch'ing with the backends.  I started up the
> front-ends, per the instructions.  The front-ends failed to synch with
> the mupdate master.

> So, after Hard, you have a good copy on the mupdate master.  In what
> way do the frontends fail to sync?

I don't know. Are the frontends' mailboxes.db files supposed to stay at 144 bytes for 30 minutes or more?  On the backends, at least, I could watch the mailboxes.db growing; not quickly, but progress was apparent.
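
One way to tell a stalled sync from a slow one is to sample the file size at fixed intervals rather than checking it by hand. The following is only a minimal sketch: the path /var/lib/imap/mailboxes.db and the helper names watch_size and is_growing are assumptions for illustration, not taken from this setup.

```python
import os
import time


def watch_size(path, samples=5, interval=60.0):
    """Return a list of (elapsed_seconds, size_bytes) readings for path."""
    readings = []
    start = time.monotonic()
    for i in range(samples):
        elapsed = round(time.monotonic() - start)
        readings.append((elapsed, os.path.getsize(path)))
        if i < samples - 1:
            time.sleep(interval)
    return readings


def is_growing(readings):
    """True if the last reading is larger than the first one."""
    return readings[-1][1] > readings[0][1]


if __name__ == "__main__":
    # Assumed location; adjust to the frontend's configdirectory.
    readings = watch_size("/var/lib/imap/mailboxes.db")
    for elapsed, size in readings:
        print(f"{elapsed:>5}s  {size} bytes")
    print("growing" if is_growing(readings) else "no growth observed")
```

A file that sits at the same size across every sample is at least a hint that nothing is being written, although it says nothing about why.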

> So, in an effort to try something else, I figured that if the
> mailboxes.db on the front-ends and the master are the same format, I
> could just shut down the mupdate master, copy the mailboxes.db file
> over to the front-ends, and start everything up.  This was also a bust.

> Presuming you're using skiplist, you can in fact just copy
> mailboxes.db between mupdate master and frontends.  However, when
> frontends start up, they insist on getting a full copy of the
> database from mupdate master.  This can take some time, and in older
> versions might error out in various ways.

We are using skiplist, and I copied the mailboxes.db to the frontends.  If the frontends are updating, it is not apparent.  After the mupdate master finished synching, I could not verify that either of them was synching from the master, which is why I copied the file to the frontends to speed things up.

> Oh, and I am getting these logs on the mupdate master.  However, the
> number in fs.file-nr is nowhere near approaching fs.file-max.  There are
> no ulimits on the 'cyrus' user.  There was a maxfds=1024 parameter in
> /etc/imapd.conf.  I tried restarting without this parameter, and it
> seemed I couldn't keep the master process running without it.  If I
> restart the service, it will run fine for a while, but it eventually
> starts complaining again.  So I tried quadrupling the maxfds value, and
> we'll see if that helps.
>
> imapd.conf (excerpt)
> mupdate       cmd="/usr/lib64/cyrus-imapd/mupdate -m" listen=3905 prefork=1 maxfds=1024
>
> maillog (excerpt)
> Nov 20 07:18:54 mumailmaster mupdate[27227]: refused connection from mumailstore01
> Nov 20 07:18:54 mumailmaster mupdate[27227]: warning: cannot open /etc/hosts.allow: Too many open files

> The high connection rate is caused by mail delivery.  Stock lmtp
> proxy connects to the mupdate master to get backend information,
> instead of referring to the local mailboxes.db.  I have patches for
> 2.2.x cyrus, in 2.3.x cyrus, "unified" murder refers to mailboxes.db
> instead of mupdate master.  The fact that lmtp proxy refers to
> mupdate master in any configuration is probably a bug.
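
For what it is worth, "Too many open files" is the message for the per-process error EMFILE; a full system-wide file table is reported as ENFILE ("Too many open files in system"). That would be consistent with fs.file-nr looking healthy while the mupdate process itself runs out of descriptors under its maxfds cap. A rough sketch for comparing one process's open descriptors with its own limit on Linux follows. The /proc paths are standard Linux, but the helper names are invented for illustration, and the PID is simply the one from the log excerpt.

```python
import errno
import os


def parse_file_nr(text):
    """Split the contents of /proc/sys/fs/file-nr into (allocated, unused, maximum)."""
    allocated, unused, maximum = (int(x) for x in text.split())
    return allocated, unused, maximum


def open_fd_count(pid):
    """Count entries in /proc/<pid>/fd (Linux only; needs permission to read it)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(pid):
    """Read the soft 'Max open files' limit from /proc/<pid>/limits."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Fields: "Max", "open", "files", soft, hard, units
                return int(line.split()[3])
    raise LookupError("no 'Max open files' line found")


if __name__ == "__main__":
    pid = 27227  # mupdate PID from the maillog excerpt; substitute the live one
    with open("/proc/sys/fs/file-nr") as fh:
        print("system-wide (allocated, unused, max):", parse_file_nr(fh.read()))
    print("process fds:", open_fd_count(pid), "of", soft_nofile_limit(pid))
    print("EMFILE:", os.strerror(errno.EMFILE))
    print("ENFILE:", os.strerror(errno.ENFILE))
```

If the per-process count sits at or near the soft limit while the system-wide numbers have plenty of headroom, the limit to look at is the one applied to the service, not fs.file-max.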

Strange that it would just start causing problems now.  With the backlog, though, we are probably seeing a cascading failure.  Do the latest vanilla trees (the packages at http://cyrusimap.web.cmu.edu/downloads.html#imap) include these patches?  I am somewhat reluctant to upgrade things in a fragile state.  If these patches are included in the latest releases, is 2.3.13 a fairly painless upgrade path from 2.2.12, or do we need to go with 2.2.13?

> With a large mail backlog, plus new inbound mail, this bottleneck is
> a big problem.  Couple that with trying to resync the frontends, and
> mupdate master is an even smaller bottleneck.

Is there anything we can turn off in cyrus.conf or imapd.conf to work around this "bottleneck"?  In other words, can we keep the MTA from knocking on the door for long enough to get everything running smoothly again?
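
Before deciding what to hold back, it may help to see which hosts are actually being turned away. Below is a small sketch that tallies "refused connection from" lines of the kind quoted earlier. The sample lines reuse the maillog excerpt from this thread; the helper name refused_by_host is illustrative, and where the real log lives (for example /var/log/maillog) is an assumption.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... mupdate[27227]: refused connection from mumailstore01
REFUSED = re.compile(r"mupdate\[\d+\]: refused connection from (\S+)")


def refused_by_host(lines):
    """Count refused mupdate connections per client host."""
    counts = Counter()
    for line in lines:
        match = REFUSED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Nov 20 07:18:54 mumailmaster mupdate[27227]: "
        "refused connection from mumailstore01",
        "Nov 20 07:18:54 mumailmaster mupdate[27227]: "
        "warning: cannot open /etc/hosts.allow: Too many open files",
    ]
    for host, count in refused_by_host(sample).most_common():
        print(host, count)
```

To run it against a real log, pass an open file object instead of the sample list, since the function only needs an iterable of lines.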

> Additionally, I have double-checked all cyrus related service accounts
> and their associated passwords.  Our mupdate service account is
> successfully authenticating on the mupdate master.  I am getting a
> "imap: kick_mupdate: can't connect to target: Connection refused" on the
> front-ends.  However, I can connect to port 3905 on the mupdate master.

> The kick_mupdate error is just a signal that the mupdate on the
> frontend is in the process of resyncing.  It can be ignored.

OK, because to the uninitiated it sounds like a problem connecting to the mupdate master on port 3905.
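
To rule out a plain network problem from a frontend, a TCP connect test against the master's mupdate port is enough. This is a minimal sketch: the hostname mumailmaster is taken from the log excerpt, 3905 from the listen= setting, and port_open is an illustrative helper name. It only shows that something accepts connections on that port, not that mupdate authentication succeeds.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    print(port_open("mumailmaster", 3905))
```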

:wes

