Corruption to quotas.db (skiplist)
David Mayo
D.J.Mayo at bath.ac.uk
Mon Feb 22 10:44:54 EST 2010
Hi guys,
I touched on this in a recent thread about XFER, but we have since had a
few more problems with the quotas database, which is quite worrying.
We are transferring our mailboxes from our old Cyrus 2.2 IMAP server to
a Cyrus 2.3 IMAP server with replication. We have 25,000 mailboxes
totalling around 1.2 TB. Moving mailboxes in parallel resulted in
serious corruption of quotas.db. We saw lots of these lines:
Jan 23 04:06:47 sauber.bath.ac.uk imap[4434]: [ID 602473 mail.error] IOERROR: lock_shared /opt/etc/imapd/quotas.db: Bad file number
Eventually this resulted in *lots* of these lines:
Jan 23 08:10:32 sauber.bath.ac.uk imap[4434]: [ID 362402 mail.error] skiplist: version mismatch: /opt/etc/imapd/quotas.db has version 2.1264205870
Jan 23 08:10:32 sauber.bath.ac.uk imap[4434]: [ID 558109 mail.error] skiplist: closed while still locked
Jan 23 08:10:32 sauber.bath.ac.uk imap[4434]: [ID 729713 mail.error] DBERROR: opening /opt/etc/imapd/quotas.db: cyrusdb error
Jan 23 08:10:32 sauber.bath.ac.uk imap[4434]: [ID 637875 mail.error] Fatal error: can't read quotas file
Users couldn't log in, mail wasn't being delivered and I couldn't even
run "ctl_mboxlist -d". We had to regenerate the quotas DB from scratch
and reconstruct some mailboxes that were suddenly using 3,000% of their
quota!
Since then we have been moving mailboxes one at a time. We transferred
student mailboxes for two nights and this went fine with no errors, but
when we transferred some staff mailboxes we started seeing the "Bad file
number" errors again. We ran quota -f and fixed any corrupted quotas,
and the "Bad file number" errors stopped appearing.
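
For anyone wanting to see when these errors start and stop in their own
logs, here is a minimal sketch that tallies the messages quoted above per
day. The match strings are copied from the log lines in this post; the
function name and the short labels are made up for illustration, and the
"DBERROR: opening" match is not restricted to quotas.db. A message that
has been wrapped onto its own line is counted against the most recent
timestamp seen.

```python
import re
from collections import Counter

# Substrings copied from the log lines quoted in this post.
# The short labels on the left are illustrative names, not Cyrus terms.
SIGNATURES = {
    "lock_shared": "IOERROR: lock_shared",
    "version_mismatch": "skiplist: version mismatch",
    "closed_locked": "skiplist: closed while still locked",
    "open_failed": "DBERROR: opening",
    "fatal": "Fatal error: can't read quotas file",
}

# Leading syslog timestamp such as "Jan 23 04:06:47"; only month and day are kept.
DAY_RE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s")


def tally_errors(lines):
    """Count each known error signature per day.

    Returns a Counter keyed by (day, label). Lines without a timestamp
    (for example a wrapped message body) inherit the last day seen.
    """
    counts = Counter()
    current_day = None
    for line in lines:
        match = DAY_RE.match(line)
        if match:
            current_day = f"{match.group(1)} {int(match.group(2))}"
        for label, needle in SIGNATURES.items():
            if needle in line:
                counts[(current_day, label)] += 1
    return counts
```

Called as tally_errors(open(path)) on a syslog file, it returns counts
that can be printed sorted by day to show whether the lock_shared errors
line up with a particular batch of transfers.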
Is there anything more we can do to protect ourselves from these errors?
Is anyone else using skiplist as their quotas.db format? I note that
"quotalegacy" is the default database format which is what our old Cyrus
2.2 IMAP server is using.
I wondered whether the problems were caused by staff leaving their PCs
switched on overnight; however, the logs do not show a correlation
between quotas that were corrupted and mailboxes that were being checked
overnight. Is the skiplist format suitably reliable for this database?
It certainly seems to work OK for all the other databases.
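
The correlation check mentioned above can be sketched as a simple set
comparison. This assumes the two lists of mailbox names (those whose
quotas needed fixing, and those that were accessed overnight) have
already been pulled out of the logs by some other means; the function
name and the result keys are hypothetical.

```python
def overlap_report(corrupted, accessed):
    """Compare mailboxes with corrupted quotas against mailboxes accessed overnight.

    Both arguments are iterables of mailbox names. Returns the names found
    in both, the corrupted names that were not accessed, and the fraction
    of corrupted mailboxes that were accessed.
    """
    corrupted = set(corrupted)
    accessed = set(accessed)
    both = corrupted & accessed
    # Avoid dividing by zero when nothing was corrupted.
    fraction = len(both) / len(corrupted) if corrupted else 0.0
    return {
        "both": sorted(both),
        "corrupted_only": sorted(corrupted - accessed),
        "fraction_accessed": fraction,
    }
```

A fraction close to the overall share of mailboxes accessed overnight
would suggest no particular link; a much higher fraction would point
towards client activity during the transfer.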
Regards,
Dave.
David Mayo
Networks/Systems Administrator
University of Bath Computing Services, UK