Another cache bug!

Patrick Boutilier boutilpj at ednet.ns.ca
Sun Feb 8 20:01:47 EST 2009


Bron Gondwana wrote:
> On Sun, Feb 08, 2009 at 08:18:44AM -0400, Patrick Boutilier wrote:
>> Bron Gondwana wrote:
>>> Also note: in the event of cache corruption, mailbox_cache_size_detail returns
>>> zero bytes, which correctly makes append_copy re-parse the message file.  It's
>>> all shiny :)
>>>
>>> Wes/Ken - please apply to CVS :)
>>>
>>> Thanks,
>>>
>>> Bron.
>>
>> I was wondering if this bug would account for the behaviour that I am  
>> seeing. I am getting corrupt cyrus.expunge files that are causing  
>> unexepunge to segfault.
>>
>> Looking at the metadata I have noticed that I am getting NEW files for  
>> expunge,index, and cache on affected mail boxes.
> 
> Unlikely.  This bug caused extra junk (sometimes quite a lot of it!) to
> be padded into the cyrus.cache file, but the offsets were still correct
> in the index file, so it shouldn't have caused any user-visible
> breakage.
> 
>> [root at student2 boutilpj]# find -name \*.NEW\*
>> ./RSS Feeds/KDE-Apps-org Content/cyrus.expunge.NEW
>> ./RSS Feeds/KDE-Apps-org Content/cyrus.index.NEW
>> ./RSS Feeds/KDE-Apps-org Content/cyrus.cache.NEW
>> ./RSS Feeds/Secunia - Latest Secunia Security Advisories/cyrus.expunge.NEW
>> ./RSS Feeds/Secunia - Latest Secunia Security Advisories/cyrus.index.NEW
>> ./RSS Feeds/Secunia - Latest Secunia Security Advisories/cyrus.cache.NEW
>> ./RSS Feeds/Linux Today/cyrus.expunge.NEW
>> ./RSS Feeds/Linux Today/cyrus.index.NEW
>> ./RSS Feeds/Linux Today/cyrus.cache.NEW
>> ./RSS Feeds/Slashdot/cyrus.expunge.NEW
>> ./RSS Feeds/Slashdot/cyrus.index.NEW
>> ./RSS Feeds/Slashdot/cyrus.cache.NEW
>> ./RSS Feeds/Freshmeat/cyrus.expunge.NEW
>> ./RSS Feeds/Freshmeat/cyrus.index.NEW
>> ./RSS Feeds/Freshmeat/cyrus.cache.NEW
>> ./cyrus.expunge.NEW
>> ./Mailing Lists/IMP/cyrus.expunge.NEW
>> ./Mailing Lists/IMP/cyrus.index.NEW
>> ./Mailing Lists/IMP/cyrus.cache.NEW
>> ./cyrus.cache.NEW
> 
> Yeah, so something is crashing.  There are a huge variety of things
> that could have caused problems (including kernels 2.6.22 -> 2.6.25.7 or
> so on 64 bit processors, where there was an MMAP page fault bug.  But you
> should have seen a LOT more corruption in that case.  Maybe you just
> haven't noticed it yet!)

Cyrus 2.3.13 compiled from source
Kernel 2.6.27.7 , x86_64
CentOS 5.2

> 
> What are you seeing in your syslog?  You do read your syslog, right?
> (actually, I don't - it's bloody huge.  We have auditlog as well, which
> more that doubles its size.  On the other hand, we have a perl program
> which reads it in real time and emails us if it sees anything
> unexpected)

Lots of these:

Feb  8 13:33:59 student2 cyr_expire[29242]: IOERROR: reading cache 
record for user.wooddm: got bogus offset 0 for 68/3; try reconstruct
Feb  8 13:33:59 student2 cyr_expire[29242]: failure expiring 
user.wooddm: System I/O error


> 
> Also, if it's segfaulting, you should be able to load the core file up
> in gdb and get a backtrace - any chance of emailing that to the list
> along with your imapd.conf and the exact version of cyrus that you're
> running.

I managed to "fix" things by lowering -X from 7 days to 1 day . For some 
reason cyr_expire ran without errors with -X1

/usr/local/cyrus/bin/cyr_expire -E 77 -X 3 -v -a -p 'user.wooddm'


Which brings me to another question. Why does cyr_expire have to be run 
with -E ? I put it up to 77 when running cyr_expire manually just so it 
wouldn't spend 30 minutes cleaning up the duplicate delivery database. :-)

When the corruption comes back I will attempt to get a backtrace from 
unexpunge, as it does segfault.

Feb  8 09:56:31 student2 kernel: unexpunge[21258]: segfault at 
7fa403d862dc ip 0000000000405d64 sp 00007fff9b1abd70 error 4 in 
unexpunge[400000+cb000]


> 
> Thanks,
> 
> Bron.



More information about the Info-cyrus mailing list