Cyrus with a NFS storage. random DBERROR

David Carter dpc22 at cam.ac.uk
Sat Jun 9 05:13:21 EDT 2007


On Sat, 9 Jun 2007, Rob Mueller wrote:

>> I run it directly, outside of master.  That way when it crashes, it
>> can be easily restarted.  I have a script that checks that it's
>> running, that the log file isn't too big, and that there are no log-
>> PID files that are too old.  If anything like that happens, it pages
>> someone.
>
> Ditto, we do almost exactly the same thing.

And for that matter, so do I.

> I think there's certain race conditions that still need ironing out, 
> because rerunning sync_client on the same log file that caused a bail 
> out usually succeeds the second time.

I suspect that the problem is with mailbox renames, which are not atomic 
and can take some time to complete with very large mailboxes.

sync_client retries a fixed number of times (SYNC_MAILBOX_RETRIES), sleeping 
a little longer before each attempt, and then bails out:

    if (folder_list->count) {
        int n = 0;
        do {
            sleep(n*2);  /* XXX  should this be longer? */
            ...
        } while (r && (++n < SYNC_MAILBOX_RETRIES));

        if (r) goto bail;
    }
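
For readers who want to experiment with the retry pattern in isolation, 
here is a minimal standalone sketch. It is not the sync_client source: the 
callback types, the helper names and the value given to 
SYNC_MAILBOX_RETRIES are all assumptions made for illustration, and the 
delay is passed in as a function so the loop can be exercised without 
really sleeping.

```c
#include <assert.h>
#include <stddef.h>

#define SYNC_MAILBOX_RETRIES 3  /* assumed value, for illustration only */

/* Hypothetical callback types; nothing here is part of sync_client. */
typedef int (*attempt_fn)(void *ctx);
typedef void (*delay_fn)(unsigned seconds);

/* Retry `attempt` with a linearly growing delay (0s, 2s, 4s, ...).
 * Returns 0 on success, or the last non-zero result once the retry
 * limit is reached. */
static int retry_with_backoff(attempt_fn attempt, void *ctx, delay_fn delay)
{
    int r;
    int n = 0;

    assert(attempt != NULL && delay != NULL);
    do {
        delay((unsigned)n * 2);
        r = attempt(ctx);
    } while (r && (++n < SYNC_MAILBOX_RETRIES));

    return r;
}

/* Demo helpers: an attempt that fails a given number of times, and a
 * "delay" that only records how long it was asked to wait. */
static unsigned total_delay;

static int flaky_attempt(void *ctx)
{
    int *failures_left = ctx;
    if (*failures_left > 0) {
        (*failures_left)--;
        return -1;
    }
    return 0;
}

static void record_delay(unsigned seconds)
{
    total_delay += seconds;
}
```

With the assumed limit of 3, the delays before the three attempts are 0, 2 
and 4 seconds, so an operation that needs longer than that to settle (a 
slow rename, say) still ends in a bail out.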

This was one of the most significant compromises that Ken had to make when 
integrating my code into 2.3.

My original code cheats, courtesy of two other patches:

HERMES_FAST_RENAME:
   Translates mailbox rename into filesystem rename() where possible.
   Useful because sync_client chdir()s into the working directory.
   Would be less useful in 2.3 with split metadata.
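
The general idea can be sketched as follows. This is not the 
HERMES_FAST_RENAME patch itself: fast_rename() and the fallback hook are 
hypothetical names, and the only real calls used are the standard rename() 
and the EXDEV error it reports when the two paths are on different 
filesystems.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical hook for the slow path (for example copy, then delete). */
typedef int (*slow_rename_fn)(const char *oldpath, const char *newpath);

/* Try a single rename() first. If the paths are on different
 * filesystems (EXDEV) and a fallback was supplied, use the fallback.
 * Returns 0 on success, -1 on failure. */
static int fast_rename(const char *oldpath, const char *newpath,
                       slow_rename_fn fallback)
{
    assert(oldpath != NULL && newpath != NULL);

    if (rename(oldpath, newpath) == 0)
        return 0;

    if (errno == EXDEV && fallback != NULL)
        return fallback(oldpath, newpath);

    return -1;
}
```

When the fast path applies, the mailbox directory moves in one system call 
rather than file by file, which shrinks the window in which sync_client 
can see a half-renamed mailbox.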

HERMES_SYNC_SNAPSHOT:
   If mailbox action fails, promote to user action (no shared mailboxes)
   If user action fails then lock user out of the mboxlist and try again.
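
The escalation order described above can be sketched like this. It is an 
illustration only, not the HERMES_SYNC_SNAPSHOT patch: struct sync_ops, 
sync_escalate() and the demo helpers are hypothetical names, and the real 
locking against the mboxlist is reduced to a pair of callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks; each sync/lock hook returns 0 on success. */
struct sync_ops {
    int  (*sync_mailbox)(void *ctx);
    int  (*sync_user)(void *ctx);
    int  (*lock_user)(void *ctx);
    void (*unlock_user)(void *ctx);
};

enum sync_level {
    SYNC_LEVEL_MAILBOX = 1,
    SYNC_LEVEL_USER,
    SYNC_LEVEL_USER_LOCKED
};

/* Try the cheapest action first and escalate only on failure.
 * On success, *level_out records which level finally worked. */
static int sync_escalate(const struct sync_ops *ops, void *ctx,
                         enum sync_level *level_out)
{
    int r;

    assert(ops != NULL && level_out != NULL);

    if (ops->sync_mailbox(ctx) == 0) {
        *level_out = SYNC_LEVEL_MAILBOX;
        return 0;
    }
    if (ops->sync_user(ctx) == 0) {
        *level_out = SYNC_LEVEL_USER;
        return 0;
    }

    /* Last resort: keep the user out while the whole account is synced. */
    r = ops->lock_user(ctx);
    if (r != 0)
        return r;
    r = ops->sync_user(ctx);
    ops->unlock_user(ctx);
    if (r == 0)
        *level_out = SYNC_LEVEL_USER_LOCKED;
    return r;
}

/* Demo state and hooks, used only to exercise the sketch. */
struct demo_state {
    int mailbox_fails;    /* non-zero: the mailbox-level sync fails */
    int user_fails_left;  /* how many user-level attempts fail */
    int locked;           /* 1 while the user is locked out */
};

static int demo_sync_mailbox(void *ctx)
{
    struct demo_state *s = ctx;
    return s->mailbox_fails ? -1 : 0;
}

static int demo_sync_user(void *ctx)
{
    struct demo_state *s = ctx;
    if (s->user_fails_left > 0) {
        s->user_fails_left--;
        return -1;
    }
    return 0;
}

static int demo_lock_user(void *ctx)
{
    struct demo_state *s = ctx;
    s->locked = 1;
    return 0;
}

static void demo_unlock_user(void *ctx)
{
    struct demo_state *s = ctx;
    s->locked = 0;
}
```

The point of the ordering is that the disruptive step, locking the user 
out, is only reached after both cheaper attempts have failed.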

Together with my version of delayed expunge this pretty much guarantees 
that things aren't moving around under sync_client's feet. It's been an 
awfully long time (about a year?) since I last had a sync_client bail out.

We are moving to 2.3 over the summer (initially using my own original 
replication code), so this is something that I would like to sort out.

Any suggestions?

-- 
David Carter                             Email: David.Carter at ucs.cam.ac.uk
University Computing Service,            Phone: (01223) 334502
New Museums Site, Pembroke Street,       Fax:   (01223) 334679
Cambridge UK. CB2 3QH.

