Self-healing mailbox?

Bron Gondwana brong at fastmail.fm
Sun May 31 08:41:52 EDT 2009


On Sun, May 31, 2009 at 10:53:14AM +0200, Sebastian Hagedorn wrote:
> -- "Ciprian Marius Vizitiu (GBIF)" <cvizitiu at gbif.org> is rumored to have 
>> I for one 'd worry more about that "System I/O error ". It can be that
>> there was a storage error which didn't happen during a second attempt.

Nah, it's called an "IOERROR", but it's really just a "file didn't
contain what I expected" error.

> I doubt it, because the same error occurred one day earlier:
>
> May 29 04:53:38 lvr13 cyr_expire[2602]: IOERROR: reading cache record for 
> user.aqg04: initial bogus offset 11090136 of 32488 for 5/7; mailbox needs 
> a reconstruct
> May 29 04:53:38 lvr13 cyr_expire[2602]: failure expiring user.aqg04: 
> System I/O error
>
> I interpret the "System I/O error" to be the IOERROR from the line 
> before, i.e. not actually an I/O error but rather a corrupt file. The 
> error does not show on any of the previous days, nor does it show today.

Sounds to me like it was copying the record to the cyrus.expunge.NEW 
file each time for the earlier days (and failing, so skipping the
entire mailbox), but now the record has actually expired, so it 
doesn't need to copy the cache record to the cyrus.cache.NEW file,
and hence never looks at it.
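
To make that reasoning concrete, here is a toy model in Python. It is not
Cyrus code: the record layout (a 4-byte length prefix followed by the
payload), the IndexRecord type, and the expire_pass function are all
invented for illustration, and the real cyrus.cache format and cyr_expire
logic differ. It only shows the shape of the argument: a bad cache offset
makes the run fail while the record still has to be copied, and stops
mattering once the record has expired.

```python
import struct
from dataclasses import dataclass


class CacheError(Exception):
    """Stands in for IOERROR: the file didn't contain what was expected."""


@dataclass
class IndexRecord:
    uid: int
    cache_offset: int  # where this message's cache record starts
    expired: bool


def pack_cache_record(payload: bytes) -> bytes:
    # Hypothetical layout: 4-byte big-endian length, then the payload.
    return struct.pack(">I", len(payload)) + payload


def read_cache_record(cache: bytes, offset: int) -> bytes:
    # An offset past the end of the file is the "bogus offset" case.
    if offset + 4 > len(cache):
        raise CacheError(f"bogus offset {offset} of {len(cache)}")
    (length,) = struct.unpack_from(">I", cache, offset)
    end = offset + 4 + length
    if end > len(cache):
        raise CacheError(f"record at {offset} runs past end of file")
    return bytes(cache[offset + 4:end])


def expire_pass(index, cache):
    """Build a new index and cache, keeping only unexpired records."""
    new_index = []
    new_cache = bytearray()
    for rec in index:
        if rec.expired:
            # Expired records are dropped, so their cache data is never read.
            continue
        payload = read_cache_record(cache, rec.cache_offset)
        new_index.append(IndexRecord(rec.uid, len(new_cache), False))
        new_cache.extend(pack_cache_record(payload))
    return new_index, bytes(new_cache)
```

With one good record and one record pointing at offset 11090136, expire_pass
raises CacheError as long as the bad record is unexpired. Mark that record
expired and the same call succeeds, without anything having been repaired.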

Meaning: yes, it did heal itself!

In my ideal world it would actually be self-healing - if it failed
for any reason to read a cache record it would map the message,
create a new cache record, append it to the cyrus.cache file,
update the index pointer, and keep right on going.

(and on the next cyr_expire run, it would keep just the cache records
that are actually current, so it would clean up over time)
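
A rough sketch of that self-healing read, in Python rather than Cyrus's C.
Everything here is hypothetical: the length-prefixed record layout, the
build_cache_payload helper (which just keeps the header block), and the
function names are made up to show the steps, not how Cyrus does or would
implement them.

```python
import struct
from dataclasses import dataclass


class CacheError(Exception):
    """Stands in for IOERROR: the file didn't contain what was expected."""


@dataclass
class IndexRecord:
    uid: int
    cache_offset: int  # the "index pointer" into the cache file


def read_cache_record(cache, offset: int) -> bytes:
    # Hypothetical layout: 4-byte big-endian length, then the payload.
    if offset < 0 or offset + 4 > len(cache):
        raise CacheError(f"bogus offset {offset} of {len(cache)}")
    (length,) = struct.unpack_from(">I", cache, offset)
    end = offset + 4 + length
    if end > len(cache):
        raise CacheError(f"record at {offset} runs past end of file")
    return bytes(cache[offset + 4:end])


def build_cache_payload(message: bytes) -> bytes:
    # Toy stand-in for parsing the message: keep only the header block.
    headers, _, _ = message.partition(b"\r\n\r\n")
    return headers


def read_cache_healing(rec: IndexRecord, cache: bytearray, message: bytes) -> bytes:
    """Read rec's cache record, rebuilding it from the message on failure."""
    try:
        return read_cache_record(cache, rec.cache_offset)
    except CacheError:
        payload = build_cache_payload(message)      # create a new cache record
        new_offset = len(cache)
        cache.extend(struct.pack(">I", len(payload)) + payload)  # append it
        rec.cache_offset = new_offset               # update the index pointer
        return payload                              # and keep right on going
```

The old, unreadable bytes stay in the cache as dead space. In this model
they would go away whenever the cache is next rewritten with only the
records the index still points at.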

But - that's a lot of work, and I'm not there yet.

Bron ( in fact, I haven't even committed my lazy cache work yet,
       though it's sitting around mostly complete in my git tree )


More information about the Info-cyrus mailing list