annotations.db corruption

Mon May 8 14:33:11 EDT 2006

Ken Murchison wrote:
> Bernhard Reiter wrote:
>> On Mon, Apr 10, 2006 at 05:17:56PM +0200, Bernhard Reiter wrote:
>>> On Sun, Apr 09, 2006 at 10:54:24PM -0400, Ken Murchison wrote:
>>>> Bernhard Reiter wrote:
>>>>> On Fri, Apr 07, 2006 at 05:24:53PM -0400, Ken Murchison wrote:
>>>>>> Martin Konold wrote:
>>
>>>>>>> Can you explain how a dying process can create such a broken 
>>>>>>> skiplist db?
>>
>>>> I've already asked the person that wrote the code to take a look and 
>>>> share his thoughts.
>>> Thanks, we are looking forward to it.
>>
>> Ken,
>> were there any ideas from the person you have asked?
> 
> No.  I assume he's busy at Google.
> 

We have seen this 'dying process corrupting a skiplist db' problem a 
lot. You can reproduce it easily on a Linux box with a really big db 
(>5-7M mailboxes) and little memory (<= 1GB): some mmap operations fail 
with ENOMEM, the process gives up, and the db is left broken.

NOTE: on Linux, mmap can fail with ENOMEM even when there is free memory 
and plenty of free swap.
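
To illustrate the failure mode above, here is a minimal sketch in Python 
(not the Cyrus C code) of a writer that checks for ENOMEM when remapping 
and backs out its append instead of giving up with a half-written file. 
The names map_whole_file, append_record and failing_mapper are 
hypothetical, and rolling back by truncation is only an assumption made 
for the example; it is not a description of what the skiplist backend 
does.

```python
import errno
import mmap
import os
import tempfile


def map_whole_file(fd):
    """Map the whole file read-only; return None if mmap reports ENOMEM."""
    try:
        return mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    except OSError as exc:
        if exc.errno == errno.ENOMEM:
            return None  # let the caller decide how to back out
        raise


def append_record(fd, record, mapper=map_whole_file):
    """Append a record, then remap; undo the append if the remap fails."""
    old_size = os.fstat(fd).st_size
    os.lseek(fd, 0, os.SEEK_END)
    os.write(fd, record)
    view = mapper(fd)
    if view is None:
        # Assumption for this sketch: truncating restores the last good state.
        os.ftruncate(fd, old_size)
        return False
    view.close()
    return True


def failing_mapper(fd):
    """Hypothetical stand-in that simulates mmap failing with ENOMEM."""
    return None


fd, path = tempfile.mkstemp()
os.write(fd, b"header")
ok = append_record(fd, b"record-1")
size_after_ok = os.fstat(fd).st_size
failed = append_record(fd, b"record-2", mapper=failing_mapper)
size_after_fail = os.fstat(fd).st_size
os.close(fd)
os.remove(path)
```

The point is only that the file size is the same before and after the 
failed append, so a reader never sees a partial record.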

We have also seen another type of problem on SMP (2 x Xeon with HT, so 4 
'processors' as far as Linux is concerned) with Cyrus 2.2.10. It 
resulted in corruption too, along with all sorts of synchronization 
problems between frontends, mupdate and backends. We "solved" it by 
running a UP kernel on that same box; mupdate doesn't need all that CPU 
power. I.e., there is a race there.

--
Sergio Bruder