> I have no real idea what could cause this but I have the following
> sequence in my db conversion script which is used by the init script
> in my rpms. The procedure is the best according to lots of my tests
> using different versions of db3 and db4 with cyrus-imapd. As you can
> see I first try a db_checkpoint, then kill it if it seems to hang,
> then do a db_recover and only after this do a rm -vf
> $imap_prefix/db/log.* $imap_prefix/db/__db.*. I just tried to find out
> the safest procedure after simulated crashes, without really
> understanding BDB and why people like to use it so much. I don't, and
> my servers run fine without any BDB.
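
For anyone following along, the sequence described above could be
sketched roughly as below. This is only my reading of the description,
not the actual rpm script: the function name, the default path, the
10-second timeout, and the choice to skip the rm when db_recover fails
are all assumptions on my part.

```shell
# Hypothetical sketch of the quoted checkpoint/recover/cleanup sequence.
# Assumed (not from the original script): the default imap_prefix, the
# timeout value, and the function name recover_bdb_env.
imap_prefix="${imap_prefix:-/var/lib/imap}"
checkpoint_timeout="${checkpoint_timeout:-10}"

recover_bdb_env() {
    dir="$1"

    # 1. Try a single checkpoint in the background so it can be abandoned.
    db_checkpoint -1 -h "$dir" &
    cp_pid=$!

    # Poll once per second until it exits or the timeout is reached.
    waited=0
    while kill -0 "$cp_pid" 2>/dev/null && [ "$waited" -lt "$checkpoint_timeout" ]; do
        sleep 1
        waited=$((waited + 1))
    done

    # 2. Kill the checkpoint if it still seems to hang.
    if kill -0 "$cp_pid" 2>/dev/null; then
        kill -9 "$cp_pid" 2>/dev/null
    fi
    wait "$cp_pid" 2>/dev/null

    # 3. Run normal recovery; stop here if it fails (an assumption).
    db_recover -h "$dir" || return 1

    # 4. Only after recovery, remove the log and environment region files.
    rm -vf "$dir"/log.* "$dir"/__db.*
}
```

It would be called as: recover_bdb_env "$imap_prefix/db"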

I might look into updating our start process, but the bit I don't get is
why the errors are occurring at all. I can only think of two possible
explanations:
1. There's some DB state being left around somewhere so after the
restart, it's accessing corrupted data, though why a second restart
tends to fix it I don't know
2. There's some bug that manifests itself mostly on small databases,
so whether the problem occurs or not is more coincidental. To be honest,
I haven't kept a log of whether it definitely happens after every
restart, but not after every second one.

The main reason we haven't switched to skiplist for deliver.db is that
on active servers like ours, the deliverdb can get pretty large
(500-1000M) even with daily pruning, and the skiplist DB implementation
requires mmap'ing the entire file into memory which gets problematic at
that size.

I notice that the latest 2.3 has a berkeley_hash_nosync option, which
might use a sufficiently different code path to avoid this issue. When
we upgrade I'll try that out...


