DBERROR: skiplist recovery mailboxes.db 0090 - suddenly all is failing!
Gregor Wenkelewsky
w-sky at gmx.net
Mon Mar 5 16:04:44 EST 2007
On Mon, 19 Feb 2007 22:26:10 +0100, Andrew Morgan wrote:
> On Mon, 19 Feb 2007, Gregor Wenkelewsky wrote:
>
>> On Thu, 15 Feb 2007 18:15:53 +0100, Andrew Morgan wrote:
>>
>>>> Cyrus has been installed here just a few weeks ago, and after some hard
>>>> days it was working smoothly and very well. Until suddenly, sadly today
>>>> it started to fail completely with this error message in mail.warn,
>>>> mail.error and syslog:
>>>>
>>>> cyrus/imap[..]: DBERROR: skiplist recovery /var/lib/cyrus/mailboxes.db: 0090 should be ADD or DELETE
>>>> cyrus/imap[..]: DBERROR: opening /var/lib/cyrus/mailboxes.db: cyrusdb error
>>>
>>> You'll need to fix the corruption of the mailboxes.db file. It is a
>>> skiplist format file in your case, so do a google search for
>>> "skiplist.py". You'll find a python utility that can do some better
>>> recovery than the cyrus tools. The example is for cyrus seen-state files,
>>> but the same should work on the mailboxes.db as well.
>>
>> Fine, I succeeded with that! Did check the auto backup files before too,
>> but they were all identical. It was too late probably. Then I used the
>> skiplist.py from here: http://oss.netfarm.it/python-cyrus.php
>>
>> python ~/skiplist.py mailboxes.db >mailboxes.txt
>> mv mailboxes.db mailboxes.err
>> cvt_cyrusdb /var/lib/cyrus/mailboxes.txt flat /var/lib/cyrus/mailboxes.db skiplist
>> chown cyrus mailboxes.db
>>
>> Fixed!
>>
>> Okay then, now it works, but how often will an error like this occur, can
>> I do something to prevent it? First I thought of "0090" as some sort of
>> error code, and I found only two error incidents with /line/ 0090 in
>> Google... ;) ...but it is much more numerous.
>
> The 0090 is a skiplist offset/index within the file, so the error message
> could contain any number depending where the corruption happened.
>
>> Can this be related to shutting down and rebooting the system? Could be a
>> coincidence of course, but just after rebooting the error was there.
>
> Possibly, if Cyrus was stopped (kill -9?) in the middle of a skiplist
> operation.
I don't really know about that. Here is the log from another
"controlled shutdown and reboot"; of course I had to make sure that my
mailboxes.db error would not occur on every reboot. (It did not occur
again.) These are the last lines, with no sign of a kill -9:
Feb 28 15:20:05 Server cyrus/master[3869]: exiting on SIGTERM/SIGINT
Feb 28 15:20:13 Server postfix/master[4103]: terminating on signal 15
Feb 28 15:20:15 Server exiting on signal 15
When the error happened, a squatter run had completed about half an
hour earlier, and ctl_cyrusdb had logged "checkpointing cyrus databases"
exactly 4 minutes 27 seconds earlier. The last lines then were:
Feb 15 08:10:27 Server cyrus/master[3795]: exiting on SIGTERM/SIGINT
Feb 15 08:10:35 Server postfix/master[4104]: terminating on signal 15
"Server exiting" is missing!?!??!
>> Can it be related to Squatter? By default, Squatter was not set, but some
>> days before the error I set Squatter to an hourly "nice" run. Now I turned
>> it off again.
>
> I don't think squatter would have any relation, but I'm not running
> squatter here myself.
As far as I understand, squatter is only necessary if the IMAP function
for searching in messages is being used, and then it speeds up the
search a lot. I guess we don't need squatter here either.
>>> You should also setup a cronjob to dump the mailboxes.db file to plaintext
>>> periodically (so it can be backed up). Something like this works here:
>>>
>>> 58 * * * * cyrus /usr/local/cyrus/bin/ctl_mboxlist -d > /var/spool/cyrus/config/mailboxes.db.dump
>>
>> Yes, I'll do that, though it's more like holding ready the plaster instead
>> of preventing the injury.
>>
>> This has to be written to /etc/crontab like Squatter, correct? How often
>> should it be running? Maybe it's only necessary when new IMAP users and/or
>> folders have been created??
>
> Yes, that command above is exactly what I have in my crontab file. I
> can't remember why I have it run at 58 minutes after the hour. :)
Actually I erred: squatter runs are defined in /etc/cyrus.conf, not in
/etc/crontab. But anyway, that is less important.
I have put the Cyrus dump into crontab.
>> I feel queasy with an error that has no apparent reason. I wanted to
>> build a system that can run without administration for months and, maybe,
>> would sustain even a rare power failure. But there was no power failure
>> and no sign of a disc error either... :-(
>
> We've been running Cyrus for a couple years now with skiplist for
> mailboxes.db. So far we've never had a single corruption of mailboxes.db.
> Very rarely we'll get a corrupted username.seen file, which can be fixed
> using skiplist.py.
How do you recognize a corruption? I think it would be useful to have
an automated e-mail sent as soon as an error occurs, so that I can get
to the system and fix it.
Last time Cyrus just kept trying and failing to open the db endlessly,
writing tons of messages to the log files until it was stopped. So the
malfunction would go unnoticed if no one uses e-mail for a few days
(that is likely here) and no one checks the server (likely too).
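A minimal sketch of such a check, assuming the DBERROR lines look like
the two quoted at the top of this thread. The log path, the addresses
and the local SMTP server are assumptions about this particular setup,
not something Cyrus provides, and find_dberrors, build_alert and
check_log are hypothetical helper names.

```python
import re
import smtplib
from email.message import EmailMessage

# Any syslog line that Cyrus tagged as a database error.
DBERROR_RE = re.compile(r"DBERROR: (?P<detail>.*)$")
# The specific skiplist recovery message: file name plus offset.
RECOVERY_RE = re.compile(
    r"skiplist recovery (?P<path>\S+): (?P<offset>[0-9A-Fa-f]+) should be"
)


def find_dberrors(lines):
    """Return (detail, path, offset) for every DBERROR line.

    path and offset are None unless the line is a skiplist recovery error.
    """
    hits = []
    for line in lines:
        m = DBERROR_RE.search(line.rstrip("\n"))
        if not m:
            continue
        detail = m.group("detail")
        r = RECOVERY_RE.search(detail)
        hits.append((detail,
                     r.group("path") if r else None,
                     r.group("offset") if r else None))
    return hits


def build_alert(hits, sender, recipient):
    """Build a plain-text alert mail listing the DBERROR details."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Cyrus DBERROR: %d line(s)" % len(hits)
    msg.set_content("\n".join(detail for detail, _, _ in hits))
    return msg


def check_log(log_path, sender, recipient, smtp_host="localhost"):
    """Scan one log file and mail an alert if it contains DBERROR lines."""
    with open(log_path, errors="replace") as fh:
        hits = find_dberrors(fh)
    if hits:
        with smtplib.SMTP(smtp_host) as smtp:
            smtp.send_message(build_alert(hits, sender, recipient))
    return len(hits)
```

Called from cron, for example as check_log("/var/log/mail.err",
"cyrus@localhost", "root@localhost") (all three values are
placeholders), this would mail again on every run until the log is
rotated, so it is only a starting point.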
Because a failure could go unnoticed for days, I guess I should set up
not just hourly but also daily and weekly dumps of mailboxes.db, since
the last good hourly backup would be overwritten with a faulty one
after just one hour.
Am I right?
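One way the hourly, daily and weekly dumps could be arranged, as a
sketch only. The ctl_mboxlist -d call and its path are taken from the
crontab line quoted above. The backup directory, the file names and
the function names are made up for illustration. The sketch also
assumes that ctl_mboxlist exits with an error or prints nothing when
mailboxes.db cannot be opened, which I have not verified.

```python
import datetime
import subprocess


def dump_targets(now, directory="/var/backups/cyrus"):
    """Return the files a dump taken at `now` should be written to.

    Every run refreshes one of 24 hourly slots; the run at midnight also
    refreshes one of 7 daily slots, and on Mondays one of 4 weekly slots.
    """
    targets = ["%s/mailboxes.hourly-%02d.dump" % (directory, now.hour)]
    if now.hour == 0:
        targets.append("%s/mailboxes.daily-%d.dump" % (directory, now.weekday()))
        if now.weekday() == 0:  # Monday
            week = now.isocalendar()[1]
            targets.append("%s/mailboxes.weekly-%d.dump" % (directory, week % 4))
    return targets


def write_dumps(now=None, ctl_mboxlist="/usr/local/cyrus/bin/ctl_mboxlist"):
    """Dump mailboxes.db once and copy the text into all due slots."""
    now = now or datetime.datetime.now()
    # check=True raises if ctl_mboxlist fails, so old dumps stay untouched.
    result = subprocess.run([ctl_mboxlist, "-d"], check=True, capture_output=True)
    if not result.stdout:
        raise RuntimeError("ctl_mboxlist produced no output; keeping old dumps")
    for path in dump_targets(now):
        with open(path, "wb") as fh:
            fh.write(result.stdout)
```

Run once an hour as the cyrus user, this keeps roughly a day of hourly
dumps, a week of daily dumps and about four weeks of weekly dumps, so a
corruption noticed after a few days still leaves an older good copy.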
> You should be using some sort of journaling filesystem (we use ext3 here)
> if you are not already, although that cannot save you from some sources of
> skiplist corruption.
I am just using Xubuntu's default, which is ext3.
Yours, Gregor