another problem with conversations db

Tue Feb 12 08:39:10 EST 2019

Hi Bron,

Sorry, I had to rearrange some of the quotes to put my answers in a more
meaningful order.

Quoting Bron Gondwana <brong at fastmailteam.com>:

> On Mon, Feb 4, 2019, at 22:00, Michael Menge wrote:
>>
>> Quoting Bron Gondwana <brong at fastmail.fm>:
>>
>> > On Mon, Feb 4, 2019, at 20:21, Michael Menge wrote:
>> >>

>> >> Feb 4 01:10:55 mailserv03 be/cyr_expire[7626]: IOERROR: opening
>> >> /srv/cyrus-be/ssd-part/L/user/XXXX/2185.: No such file or directory
>> >> Feb 4 01:10:55 mailserv03 be/cyr_expire[7626]: IOERROR: opening
>> >> /srv/cyrus-be/ssd-part/L/user/XXXX/2185.: No such file or directory
>> >> Feb 4 01:10:55 mailserv03 be/cyr_expire[7626]: IOERROR archive
>> >> user.XXXX 2185 failed to copyfile
>> >> (/srv/cyrus-be/ssd-part/L/user/XXXX/2185. =>
>> >> /srv/cyrus-hdd-be/archive/ssd-part/L/user/XXXX/2185.): Unknown code
>> >> ____ 255
>> >
>> >
>> > Ouch. Yeah, that could have been caused by a bug in delivery, and
>> > would definitely cause conversations DB corruption if the index file
>> > was updated but the conversations DB wasn't or vice versa.
>> >
>> >> The file was already at
>> >> /srv/cyrus-hdd-be/archive/ssd-part/L/user/XXXX/2185.

I was able to fix these problems with reconstruct, and they haven't
reappeared so far.
There were also other accounts with IOERRORs regarding the conversations
DB but without any cyr_expire archive errors, so I believe these problems
are not related.

I tried rebuilding the conversations DB for the accounts with errors, but
other accounts show up with errors some time later. I couldn't find
anything they have in common yet.

>> >> > Anyway, I don't think that would break anything.
>> >> >
>> >> > metapartition-ssd: /srv/cyrus-ssd-be/meta/ssd-part
>> >> > metapartition_files: header index cache expunge squat annotations
>> >> > lock dav archivecache
>> >> >
>> >> > Ooh, I haven't tested having cache and archivecache on the same
>> >> > location. That's really interesting. Again, I'd be in favour of
>> >> > separation here, give them different paths. That might be tricky
>> >> > with ssd though, the way this is laid out. I assume you have some
>> >> > kind of symlink farm going on?
>> >> >
>> >>
>> >> I didn't know that there could be a problem with cache and archivecache.
>> >> At the time we decided on the configuration for cyrus 3.0 I looked at the
>> >> imapd.conf man page and for metapartition_files decided that I want all
>> >> meta files on the ssd storage. There was no indication in the man page
>> >> that there could be a problem.
>> >
>> > Fair. I'd have to test that to see if it works correctly. I would
>> > hope so, but I haven't tested that configuration. This is the
>> > downside with having lots of different ways to do things!
>> >
>> >> How do I separate location of archivecache from the other
>> >> metapartition path?
>> >> And fix the cache and archivecache files?
>> >
>> > This I don't know a good answer for. I will test if having the same
>> > path for cache and archivecache could fail. I THINK that I made the
>> > code safe for it, but I'm not sure that it's been tested.
>> >
>> >> No, there is no symlink farm. We have mounted different iSCSI volumes to
>> >> /srv/cyrus-ssd-be, /srv/cyrus-hdd-be and /srv/cyrus-be
>> >
>> > Right. That makes sense.

Did you have time to look into the cache/archivecache situation yet?

>> > Right! I do wonder if there are some bugs in 3.0.x which are fixed
>> > on master around delivery to archive partition. We definitely had
>> > bugs on master, but I thought they were newly introduced on master
>> > as well, which is why the fixes weren't backported. But if you're
>> > having files be in the wrong location, maybe there are bugs on 3.0.x
>> > as well.

Are all fixes from master backported to 3.0?

Are your new commits regarding CIDs related to the
"IOERROR: conversations_audit on load:" and
"IOERROR: conversations_audit on store" errors?

I will try your new commits on my test servers in the next few days to see
if they fix the endless loop in ctl_conversationsdb that I have seen for
some accounts.

Quoting myself (Re: problems rebuilding conversations db, Jan 24, 2019):

> The program loops in build_cid_cb (imap/ctl_conversationsdb.c:189)
>
> For the problematic mailbox that I tested, record->cid was
> NULLCONVERSATION for every message, so mailbox_cacherecord,
> message_update_conversations and mailbox_rewrite_index_record
> were called, and each returned 0.
>
> After iterating through all messages in the mailbox, count was > 0 and
> r == 0, so the while condition (!r && count) was true for the next run.
> At this point record->cid was still NULLCONVERSATION for every message,
> which I guess should not be the case.
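To make the termination problem concrete, here is a small C model of how I
read that loop. It is a simplified sketch, not the actual Cyrus code:
struct record, update_conversations(), build_cids() and the max_passes cap
are hypothetical names, and NULLCONVERSATION is modelled as 0. If the update
step reports success (r == 0) without ever setting a CID, every pass counts
the same records again and (!r && count) never becomes false.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: a message record is reduced to its CID and
 * NULLCONVERSATION is modelled as 0. These are not the real Cyrus types. */
#define NULLCONVERSATION 0ULL
typedef uint64_t conversation_id_t;

struct record { conversation_id_t cid; };

/* Hypothetical stand-in for the per-message update step. It always
 * returns 0 (success). If assigns_cid is 0 it models the suspected bug:
 * success is reported but rec->cid stays NULLCONVERSATION. */
static int update_conversations(struct record *rec, int assigns_cid,
                                conversation_id_t next_cid)
{
    if (assigns_cid)
        rec->cid = next_cid;
    return 0;
}

/* Models the while (!r && count) loop: each pass visits every record
 * that still has no CID and counts it. The loop only ends when a pass
 * counts nothing or an error is returned. max_passes is a safety cap
 * added for this sketch only. Returns the number of passes run. */
static int build_cids(struct record *recs, size_t n, int assigns_cid,
                      int max_passes)
{
    int r = 0;
    size_t count = 1;   /* force the first pass */
    int passes = 0;
    conversation_id_t next_cid = 1;

    assert(recs != NULL || n == 0);
    while (!r && count && passes < max_passes) {
        count = 0;
        passes++;
        for (size_t i = 0; i < n; i++) {
            if (recs[i].cid != NULLCONVERSATION)
                continue;
            r = update_conversations(&recs[i], assigns_cid, next_cid++);
            if (r)
                break;
            count++;
        }
    }
    return passes;
}
```

With assigns_cid = 1 the model stops on the second pass, because every
record got a CID in the first one. With assigns_cid = 0 it only stops at the
artificial cap; without a cap it would loop forever, which is the behaviour
I observed.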

Michael

--------------------------------------------------------------------------------
M.Menge                                Tel.: (49) 7071/29-70316
Universität Tübingen                   Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung          mail: michael.menge at zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen