Cyrus with NFS storage: random DBERROR
robm at fastmail.fm
Sun Jun 10 04:42:01 EDT 2007
> I suspect that the problem is with mailbox renames, which are not atomic
> and can take some time to complete with very large mailboxes.
I think there are some other issues as well. For instance, we still see
skiplist seen state databases get corrupted every now and then. It seems
certain kinds of corruption can cause the skiplist code to call abort(), which
terminates the sync_server and causes the sync_client to bail out. I had a
backtrace from one of them the other day, but the stack frames were all wrong,
so it didn't seem that useful.
> Translates mailbox rename into filesystem rename() where possible.
> Useful because sync_client chdir()s into the working directory.
> Would be less useful in 2.3 with split metadata.
It would still be nice to do this anyway, to make renames faster. If you:
1. Add the new mailboxes to mailboxes.db
2. Do the filesystem rename
3. Remove the old mailboxes
You still end up with a race condition, but the window is far shorter, and the
result far less of a mess, than what can happen at the moment if a restart
occurs during a rename.
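The three-step rename above might look something like the following. This is a minimal sketch, not Cyrus code: mailboxes.db is modelled as a plain Python dict mapping mailbox name to partition, and the on-disk layout (`user.fred.foo` stored at `<root>/user/fred/foo`) is a simplifying assumption; real partitions may add hashing directories, and 2.3 can split metadata out. `rename_mailbox` and `mailbox_path` are hypothetical names.

```python
import os
import tempfile


def mailbox_path(root: str, name: str) -> str:
    # Hypothetical layout: "user.fred.foo" -> <root>/user/fred/foo.
    # Real Cyrus partitions may insert hash directories; this sketch does not.
    return os.path.join(root, *name.split("."))


def rename_mailbox(mboxdb: dict, root: str, old: str, new: str) -> None:
    """Rename `old` and its submailboxes to `new` using one directory rename().

    `mboxdb` stands in for mailboxes.db: a dict of mailbox name -> partition.
    """
    if old not in mboxdb:
        raise KeyError(old)
    if new == old or new.startswith(old + "."):
        raise ValueError("cannot rename a mailbox into itself")
    # The mailbox itself plus every child (children live in subdirectories).
    moved = [m for m in mboxdb if m == old or m.startswith(old + ".")]
    mapping = {m: new + m[len(old):] for m in moved}
    dst_path = mailbox_path(root, new)
    if any(dst in mboxdb for dst in mapping.values()) or os.path.exists(dst_path):
        raise FileExistsError(new)

    # Step 1: add the new names to mailboxes.db; the old names stay valid.
    for src, dst in mapping.items():
        mboxdb[dst] = mboxdb[src]

    # Step 2: one filesystem rename() moves the whole directory tree at once.
    try:
        os.makedirs(os.path.dirname(dst_path), exist_ok=True)
        os.rename(mailbox_path(root, old), dst_path)
    except OSError:
        # Undo step 1 so the database matches the unchanged filesystem.
        for dst in mapping.values():
            del mboxdb[dst]
        raise

    # Step 3: drop the old names. Between steps 1 and 3 both sets of names
    # exist in the database; that is the (short) race window.
    for src in mapping:
        del mboxdb[src]


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    os.makedirs(mailbox_path(root, "user.fred.foo.sub"))
    db = {"user.fred": "default", "user.fred.foo": "default",
          "user.fred.foo.sub": "default"}
    rename_mailbox(db, root, "user.fred.foo", "user.fred.bar")
    print(sorted(db))  # ['user.fred', 'user.fred.bar', 'user.fred.bar.sub']
```

Because child mailboxes sit in subdirectories of their parent, the single rename() in step 2 moves all of them together. If a restart happens between steps 1 and 3, mailboxes.db is left with both the old and new names, one set pointing at a directory that no longer exists (or does not exist yet), which is the race window described above.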
> Together with my version of delayed expunge, this pretty much guarantees
> that things aren't moving around under sync_client's feet. It's been an
> awful long time (about a year?) since I last had a sync_client bail out.
> We are moving to 2.3 over the summer (initially using my own original
> replication code), so this is something that I would like to sort out.
> Any suggestions?
I can try to keep a closer eye on bailouts and see if I can get some
more details. It would be nice if there were more logging about why the
bail-out code path was actually taken!
More information about the Info-cyrus