Another cache bug!
Patrick Boutilier
boutilpj at ednet.ns.ca
Sun Feb 8 20:01:47 EST 2009
Bron Gondwana wrote:
> On Sun, Feb 08, 2009 at 08:18:44AM -0400, Patrick Boutilier wrote:
>> Bron Gondwana wrote:
>>> Also note: in the event of cache corruption, mailbox_cache_size_detail returns
>>> zero bytes, which correctly makes append_copy re-parse the message file. It's
>>> all shiny :)
>>>
>>> Wes/Ken - please apply to CVS :)
>>>
>>> Thanks,
>>>
>>> Bron.
>>
>> I was wondering if this bug would account for the behaviour that I am
>> seeing. I am getting corrupt cyrus.expunge files that are causing
>> unexepunge to segfault.
>>
>> Looking at the metadata I have noticed that I am getting NEW files for
>> expunge,index, and cache on affected mail boxes.
>
> Unlikely. This bug caused extra junk (sometimes quite a lot of it!) to
> be padded into the cyrus.cache file, but the offsets were still correct
> in the index file, so it shouldn't have caused any user-visible
> breakage.
>
>> [root at student2 boutilpj]# find -name \*.NEW\*
>> ./RSS Feeds/KDE-Apps-org Content/cyrus.expunge.NEW
>> ./RSS Feeds/KDE-Apps-org Content/cyrus.index.NEW
>> ./RSS Feeds/KDE-Apps-org Content/cyrus.cache.NEW
>> ./RSS Feeds/Secunia - Latest Secunia Security Advisories/cyrus.expunge.NEW
>> ./RSS Feeds/Secunia - Latest Secunia Security Advisories/cyrus.index.NEW
>> ./RSS Feeds/Secunia - Latest Secunia Security Advisories/cyrus.cache.NEW
>> ./RSS Feeds/Linux Today/cyrus.expunge.NEW
>> ./RSS Feeds/Linux Today/cyrus.index.NEW
>> ./RSS Feeds/Linux Today/cyrus.cache.NEW
>> ./RSS Feeds/Slashdot/cyrus.expunge.NEW
>> ./RSS Feeds/Slashdot/cyrus.index.NEW
>> ./RSS Feeds/Slashdot/cyrus.cache.NEW
>> ./RSS Feeds/Freshmeat/cyrus.expunge.NEW
>> ./RSS Feeds/Freshmeat/cyrus.index.NEW
>> ./RSS Feeds/Freshmeat/cyrus.cache.NEW
>> ./cyrus.expunge.NEW
>> ./Mailing Lists/IMP/cyrus.expunge.NEW
>> ./Mailing Lists/IMP/cyrus.index.NEW
>> ./Mailing Lists/IMP/cyrus.cache.NEW
>> ./cyrus.cache.NEW
>
> Yeah, so something is crashing. There are a huge variety of things
> that could have caused problems (including kernels 2.6.22 -> 2.6.25.7 or
> so on 64 bit processors, where there was an MMAP page fault bug. But you
> should have seen a LOT more corruption in that case. Maybe you just
> haven't noticed it yet!)
Cyrus 2.3.13 compiled from source
Kernel 2.6.27.7 , x86_64
CentOS 5.2
>
> What are you seeing in your syslog? You do read your syslog, right?
> (actually, I don't - it's bloody huge. We have auditlog as well, which
> more that doubles its size. On the other hand, we have a perl program
> which reads it in real time and emails us if it sees anything
> unexpected)
Lots of these:
Feb 8 13:33:59 student2 cyr_expire[29242]: IOERROR: reading cache
record for user.wooddm: got bogus offset 0 for 68/3; try reconstruct
Feb 8 13:33:59 student2 cyr_expire[29242]: failure expiring
user.wooddm: System I/O error
>
> Also, if it's segfaulting, you should be able to load the core file up
> in gdb and get a backtrace - any chance of emailing that to the list
> along with your imapd.conf and the exact version of cyrus that you're
> running.
I managed to "fix" things by lowering -X from 7 days to 1 day . For some
reason cyr_expire ran without errors with -X1
/usr/local/cyrus/bin/cyr_expire -E 77 -X 3 -v -a -p 'user.wooddm'
Which brings me to another question. Why does cyr_expire have to be run
with -E ? I put it up to 77 when running cyr_expire manually just so it
wouldn't spend 30 minutes cleaning up the duplicate delivery database. :-)
When the corruption comes back I will attempt to get a backtrace from
unexpunge, as it does segfault.
Feb 8 09:56:31 student2 kernel: unexpunge[21258]: segfault at
7fa403d862dc ip 0000000000405d64 sp 00007fff9b1abd70 error 4 in
unexpunge[400000+cb000]
>
> Thanks,
>
> Bron.
More information about the Info-cyrus
mailing list