Cyrus with a NFS storage. random DBERROR
robm at fastmail.fm
Sat Jun 9 00:19:05 EDT 2007
> I run it directly, outside of master. That way when it crashes, it
> can be easily restarted. I have a script that checks that it's
> running, that the log file isn't too big, and that there are no
> log-PID files that are too old. If anything like that happens, it pages
Ditto, we do almost exactly the same thing.
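
A rough sketch of that kind of check, for anyone who wants a starting
point. This is not either of our actual scripts. The file names ("log" and
"log-<PID>" in one sync directory) and both thresholds are assumptions to
adjust for your own install, and the "is sync_client running" test and the
paging are left out.

```python
import os
import time


def find_problems(sync_dir, max_log_bytes=50_000_000, max_age_secs=600, now=None):
    """Return a list of human-readable problems found in sync_dir.

    Assumed layout (hypothetical, adjust to your install): the current sync
    log is sync_dir/log and in-progress files are named sync_dir/log-<PID>.
    """
    now = time.time() if now is None else now
    problems = []

    # Check 1: the current log file has grown too big.
    log_path = os.path.join(sync_dir, "log")
    if os.path.exists(log_path) and os.path.getsize(log_path) > max_log_bytes:
        problems.append(f"{log_path} is larger than {max_log_bytes} bytes")

    # Check 2: a log-PID file has been sitting around for too long.
    for name in sorted(os.listdir(sync_dir)):
        if not name.startswith("log-"):
            continue
        path = os.path.join(sync_dir, name)
        age = now - os.path.getmtime(path)
        if age > max_age_secs:
            problems.append(f"{path} has not been touched for {int(age)} seconds")

    return problems
```

A cron job could call find_problems() every minute or so and page someone
whenever the returned list is not empty.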
Also, when we switch master/replica roles, our code looks for any incomplete
log files after stopping the master, and runs those first to ensure that
replication is completely up to date.
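
The role-switch step could look roughly like the sketch below. It is an
illustration only, not our production code. It assumes the leftover files
are named "log" or "log-<PID>" in a single directory, and that
"sync_client -r -f FILE" replays one log file and then exits. Check both
against the sync_client man page for your version before relying on them.

```python
import os
import subprocess


def replay_with_sync_client(path):
    # Assumption: "sync_client -r -f FILE" processes one log file and exits
    # with status 0 on success.
    return subprocess.run(["sync_client", "-r", "-f", path]).returncode == 0


def drain_leftover_logs(sync_dir, replay=replay_with_sync_client):
    """Replay every leftover log file, oldest first.

    Call this only after the old master has been stopped. Returns the paths
    that could not be replayed; those files are left in place.
    """
    names = [n for n in os.listdir(sync_dir) if n == "log" or n.startswith("log-")]
    paths = sorted((os.path.join(sync_dir, n) for n in names), key=os.path.getmtime)

    failed = []
    for path in paths:
        if replay(path):
            # Remove the file ourselves in case the replay step did not.
            if os.path.exists(path):
                os.remove(path)
        else:
            failed.append(path)
    return failed
```

The roles should only be swapped once drain_leftover_logs() returns an
empty list.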
It seems anyone seriously using replication unfortunately has to do these
things manually at the moment. Replication just isn't reliable enough: we
see sync_client bail out quite regularly, and there's not enough logging to
pinpoint exactly why each time. I think there are certain race conditions
that still need ironing out, because rerunning sync_client on the same log
file that caused a bail out usually succeeds the second time. It would be
nice if some code were made part of the core Cyrus distribution to make this
all work properly, including switching master/replica roles.
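
Since a second run on the same log file usually succeeds, a small retry
wrapper covers most bail outs until the races are fixed. The sketch below is
a generic helper, not part of Cyrus. The attempt count and delay are
arbitrary defaults, and the command shown in the usage note is an assumption
about how sync_client is invoked on your system.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay_secs=5, run=subprocess.call, sleep=time.sleep):
    """Run cmd until it exits with status 0 or the attempts are used up.

    Returns True if some attempt succeeded, False otherwise.
    """
    for attempt in range(1, attempts + 1):
        if run(cmd) == 0:
            return True
        # Pause between attempts, but not after the last one.
        if attempt < attempts:
            sleep(delay_secs)
    return False
```

For example, run_with_retry(["sync_client", "-r", "-f", path]) would retry
the same log file up to three times, where path is the log file that caused
the bail out. If it still returns False, that is the point to page a human.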