Corruption to quotas.db (skiplist)

David Mayo D.J.Mayo at bath.ac.uk
Mon Feb 22 10:44:54 EST 2010


Hi guys,

I touched on this in a recent topic about XFER; however, we have had a 
few more problems with the quotas database, and it is quite worrying.

We are transferring our mailboxes from our old Cyrus 2.2 IMAP server to 
a Cyrus 2.3 IMAP server with replication. We have 25,000 mailboxes 
totalling around 1.2 TB. Moving mailboxes in parallel resulted in 
serious corruption of quotas.db. We saw lots of these lines:

Jan 23 04:06:47 sauber.bath.ac.uk imap[4434]: [ID 602473 mail.error] IOERROR: lock_shared /opt/etc/imapd/quotas.db: Bad file number

Eventually resulting in *lots* of these lines:

Jan 23 08:10:32 sauber.bath.ac.uk imap[4434]: [ID 362402 mail.error] skiplist: version mismatch: /opt/etc/imapd/quotas.db has version 2.1264205870
Jan 23 08:10:32 sauber.bath.ac.uk imap[4434]: [ID 558109 mail.error] skiplist: closed while still locked
Jan 23 08:10:32 sauber.bath.ac.uk imap[4434]: [ID 729713 mail.error] DBERROR: opening /opt/etc/imapd/quotas.db: cyrusdb error
Jan 23 08:10:32 sauber.bath.ac.uk imap[4434]: [ID 637875 mail.error] Fatal error: can't read quotas file

Users couldn't log in, mail wasn't being delivered and I couldn't even 
run "ctl_mboxlist -d". We had to regenerate the quotas DB from scratch 
and reconstruct some mailboxes that were suddenly using 3,000% of their 
quota!

Since then we have been moving mailboxes one at a time. We transferred 
student mailboxes for two nights and this went fine with no errors, but 
when we transferred some staff mailboxes we started seeing the "Bad file 
number" errors again. We ran quota -f and fixed any corrupted quotas, 
and the "Bad file number" errors stopped appearing.
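
For anyone who wants to check their own logs for the same symptoms, here is a minimal sketch that tallies the error messages quoted above. It is only an illustration: the category names and the count_quota_errors function are made up for this example, the patterns are simply copied from the log lines in this message, and it assumes each syslog entry sits on a single line.

```python
import re
from collections import Counter

# Patterns copied from the log excerpts in this message. The category
# names on the left are hypothetical labels chosen for this sketch.
PATTERNS = {
    "lock_failure": re.compile(r"IOERROR: lock_shared \S*quotas\.db: Bad file number"),
    "version_mismatch": re.compile(r"skiplist: version mismatch: \S*quotas\.db"),
    "closed_while_locked": re.compile(r"skiplist: closed while still locked"),
    "open_failure": re.compile(r"DBERROR: opening \S*quotas\.db"),
    "fatal": re.compile(r"Fatal error: can't read quotas file"),
}


def count_quota_errors(lines):
    """Count how many lines match each known quotas.db error pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break  # one category per log line is enough
    return counts


# The log lines quoted earlier in this message, one entry per line.
SAMPLE_LINES = [
    "Jan 23 04:06:47 sauber.bath.ac.uk imap[4434]: [ID 602473 mail.error] "
    "IOERROR: lock_shared /opt/etc/imapd/quotas.db: Bad file number",
    "Jan 23 08:10:32 sauber.bath.ac.uk imap[4434]: [ID 362402 mail.error] "
    "skiplist: version mismatch: /opt/etc/imapd/quotas.db has version 2.1264205870",
    "Jan 23 08:10:32 sauber.bath.ac.uk imap[4434]: [ID 558109 mail.error] "
    "skiplist: closed while still locked",
    "Jan 23 08:10:32 sauber.bath.ac.uk imap[4434]: [ID 729713 mail.error] "
    "DBERROR: opening /opt/etc/imapd/quotas.db: cyrusdb error",
    "Jan 23 08:10:32 sauber.bath.ac.uk imap[4434]: [ID 637875 mail.error] "
    "Fatal error: can't read quotas file",
]

if __name__ == "__main__":
    for name, total in count_quota_errors(SAMPLE_LINES).items():
        print(name, total)
```

To run it against a real log, pass an open file object to count_quota_errors instead of SAMPLE_LINES; the location of the mail log depends on the local syslog configuration.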

Is there anything more we can do to protect ourselves from these errors? 
Is anyone else using skiplist as their quotas.db format? I note that 
"quotalegacy" is the default database format, which is what our old 
Cyrus 2.2 IMAP server is using.

I wondered whether the problems were caused by staff leaving their PCs 
switched on overnight; however, the logs do not show a correlation 
between the quotas that were corrupted and the mailboxes that were being 
checked overnight. Is the skiplist format suitably reliable for this 
database? It certainly seems to work OK for all the other databases.

Regards,


Dave.

David Mayo
Networks/Systems Administrator
University of Bath Computing Services, UK


