Unexpected database recovery

Richard Gilbert R.Gilbert at sheffield.ac.uk
Thu Nov 20 08:22:08 EST 2003


> > Yesterday I applied John Wade's lock_flock patch to the version of Cyrus
> > imapd we were already running, i.e. 2.1.14 and rebuilt and reinstalled.
> > cyrus-imapd was restarted at 5 am this morning to minimise inconvenience
> > to users.  I was surprised to find that the system was unavailable until
> > about 08:39 because of database recovery.
> >
> > My question is: was this database recovery caused by the system realising
> > that the software had changed, or was it a complete coincidence?  We
> > restart the system three times a week at 5am and this has not happened
> > before, as far as I know.
>
> The lock_flock patch has serious performance implications (namely, if you
> don't get a lock on the first try, you have to wait an entire second to
> try again),

Thank you very much, Rob, for your swift response. I suppose that explains
why the patch has never been incorporated in the distribution. :-)  A few
times yesterday colleagues said that there was "a problem with IMAP" when
it appeared to be fine in general.  I guess these could have been caused
by temporary performance problems.
[more below]
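
For anyone following along, the retry behaviour Rob describes (try for the
lock without blocking and, if that fails, sleep a full second before trying
again) can be sketched roughly as below.  This is only an illustration in
Python, not the code from the lock_flock patch; the function name
lock_with_retry and its parameters retry_delay and max_tries are made up
for the example.

```python
import errno
import fcntl
import time


def lock_with_retry(fd, retry_delay=1.0, max_tries=5):
    """Take an exclusive flock() on fd, polling rather than blocking.

    Hypothetical helper: each failed attempt costs a full retry_delay
    before the next try, which is the performance cost under discussion.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_tries + 1):
        try:
            # LOCK_NB makes flock() fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return attempt
        except OSError as exc:
            # Anything other than "lock is held elsewhere" is a real error.
            if exc.errno not in (errno.EAGAIN, errno.EACCES):
                raise
            if attempt < max_tries:
                time.sleep(retry_delay)
    raise TimeoutError("could not get lock after %d tries" % max_tries)
```

With a one-second delay, a process that loses the first attempt waits at
least a second even if the lock is released a few milliseconds later.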

> and given that this happened just after you changed the
> locking mechanism, it seems suspicious.
>
> However, I can't think what would be causing the recovery process to lose
> out in getting the locks it needs (nothing else should be running at
> that time)....
>
> FWIW, database recovery is necessary every time you restart cyrus to
> ensure that the databases are in a consistent state before data is served.

Thank you for pointing that out.  I checked and found that the recovery
was already taking ~90 mins before the patch, but no-one seemed to notice!

If I don't use the patch, I expect the problem with LMTP delivery to return,
along with the associated ramp-up in the number of db3 lockers reported.  The
only database which is using Berkeley DB is the duplicate delivery
database, so logically this must be the source of the db3 locking problem.
The database was very large (138 MBytes) and pruning was set to 3 days,
so I will change this to 1 day to reduce its size and the consequent
recovery time.  However, I am beginning to wonder whether I should stop
using the duplicate delivery database as the simplest way of avoiding db3
locking problems.  Would this mean that a single message delivered to 50
users would start to appear as 50 separate copies rather than one file
with 50 links?
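
To make clear what I mean by "one file with 50 links" as opposed to 50
separate copies, here is a small sketch in Python.  It is not Cyrus's
delivery code, and it does not answer whether disabling the duplicate
delivery database changes this behaviour; the function name deliver, the
use_links switch and the file name "1." are assumptions for illustration
only.

```python
import os
import shutil


def deliver(message_path, mailbox_dirs, use_links=True, name="1."):
    """Place one staged message into every mailbox directory.

    Hypothetical helper: with use_links=True each mailbox gets a hard
    link to the same file on disk; with use_links=False each mailbox
    gets its own independent copy of the data.
    """
    delivered = []
    for mailbox in mailbox_dirs:
        target = os.path.join(mailbox, name)
        if use_links:
            # Hard link: a new directory entry for the same inode.
            os.link(message_path, target)
        else:
            # Copy: a new inode holding a duplicate of the contents.
            shutil.copyfile(message_path, target)
        delivered.append(target)
    return delivered


def distinct_files(paths):
    """Count how many different inodes the given paths refer to."""
    return len({os.stat(p).st_ino for p in paths})
```

With links, distinct_files() on the 50 delivered paths gives 1 and the
message is stored once; with copies it gives 50 and the disk usage grows
accordingly.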

(Cyrus is running on a Solaris 8 system with about 28,000 users.)

TYIA

Richard
--
Richard Gilbert
Corporate Information and Computing Services
University of Sheffield, Sheffield, S10 2TN, UK
Phone: +44 114 222 3028   Fax: +44 114 222 3040



