another problem with conversations db

Bron Gondwana brong at fastmailteam.com
Wed Feb 13 04:37:22 EST 2019


On Wed, Feb 13, 2019, at 00:39, Michael Menge wrote:
> Hi Bron,
> 
> sorry, I had to rearrange some quotes to put my answers in a more 
> meaningful order.
> 
> 
> Quoting Bron Gondwana <brong at fastmailteam.com>:
> 
> >> >> The file was already at 
> >> /srv/cyrus-hdd-be/archive/ssd-part/L/user/XXXX/2185.
> 
> I was able to fix these problems with reconstruct, and they haven't 
> reappeared so far.
> Also there were other accounts which had IOERRORS regarding the 
> conversation db,
> with no cyr_expire archive errors, so I believe that these problems 
> are not related.
> 
> I tried rebuilding the conversation db for the accounts with errors, 
> but some other
> accounts will show up with errors some time later. I couldn't find 
> anything in
> common yet.

OK. That's hard to diagnose remotely :(

> 
> >> >> > Anyway, I don't think that would break anything.
> >> >> >
> >> >> > metapartition-ssd: /srv/cyrus-ssd-be/meta/ssd-part
> >> >> > metapartition_files: header index cache expunge squat annotations
> >> >> > lock dav archivecache
> >> >> >
> >> >> > Ooh, I haven't tested having cache and archivecache on the same
> >> >> > location. That's really interesting. Again, I'd be in favour of
> >> >> > separation here, give them different paths. That might be tricky
> >> >> > with ssd though, the way this is laid out. I assume you have some
> >> >> > kind of symlink farm going on?
> >> >> >
> >> >>
> >> >> I didn't know that there could be a problem with cache and archivecache.
> >> >> At the time we decided on the configuration for cyrus 3.0 I looked at the
> >> >> imapd.conf man page and for metapartition_files decided that I want all
> >> >> meta files on the ssd storage. There was no indication in the man page
> >> >> that there could be a problem.
> >> >
> >> > Fair. I'd have to test that to see if it works correctly. I would
> >> > hope so, but I haven't tested that configuration. This is the
> >> > downside with having lots of different ways to do things!
> >> >
> >> >> How do I separate location of archivecache from the other
> >> >> metapartition path?
> >> >> And fix the cache and archivecache files?
> >> >
> >> > This I don't know a good answer for. I will test if having the same
> >> > path for cache and archivecache could fail. I THINK that I made the
> >> > code safe for it, but I'm not sure that it's been tested.
> >> >
> >> >> No, there is no symlink farm. We have mounted different iSCSI volumes to
> >> >> /srv/cyrus-ssd-be, /srv/cyrus-hdd-be and /srv/cyrus-be
> >> >
> >> > Right. That makes sense.
> 
> Did you have time to look into the cache/archivecache situation yet?

Yes, I've looked at the code!

in mailbox_archive():

    /* got a new cache record to write */
    if (differentcache)
    {
        dirtycache = 1;
        copyrecord.cache_offset = 0;
        if (mailbox_append_cache(mailbox, &copyrecord))
            continue;
    }

And the code for differentcache is:

    char *spoolcache = xstrdup(mailbox_meta_fname(mailbox, META_CACHE));
    char *archivecache = xstrdup(mailbox_meta_fname(mailbox, META_ARCHIVECACHE));
    int differentcache = strcmp(spoolcache, archivecache);

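So it looks like the answer is that having cache and archivecache on the same path is fine: strcmp() returns 0 when the two filenames are identical, so differentcache is false and the extra cache append is skipped. It is not your problem.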
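
For illustration only, here is a minimal standalone sketch of that comparison. The helper name caches_differ() is made up for this example and is not part of Cyrus; the paths you would pass in are whatever mailbox_meta_fname() returns for your configuration.

```c
#include <string.h>

/* Hypothetical helper (not a Cyrus function): mirrors the strcmp()
 * test quoted above. Returns 1 when the spool cache and the archive
 * cache resolve to different filenames, 0 when they are the same file. */
static int caches_differ(const char *spoolcache, const char *archivecache)
{
    return strcmp(spoolcache, archivecache) != 0;
}
```

When both metadata files resolve to the same filename this returns 0, which corresponds to the case where the "if (differentcache)" block above is not entered.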

> 
> >> > Right! I do wonder if there are some bugs in 3.0.x which are fixed
> >> > on master around delivery to archive partition. We definitely had
> >> > bugs on master, but I thought they were newly introduced on master
> >> > as well, which is why the fixes weren't backported. But if you're
> >> > having files be in the wrong location, maybe there are bugs on 3.0.x
> >> > as well.
> 
> Are all fixes from master backported to 3.0?

Unfortunately it's hard to tell, because many of the fixes on master are for bugs that were only introduced on master, and for some bugs on 3.0 we just say "the fix is so invasive that it's basically just backporting 90% of master, which is pointless for a stable release".

> Is the new Commit "I will try your new commits regarding CID" related to the
> "IOERROR: conversations_audit on load:" and "IOERROR: 
> conversations_audit on store"?

Shouldn't be. It just means we store the G keys regardless of whether the record has a CID.

> I will try your new commits in the next days on my test servers to see 
> if they fix
> the endless loop in ctl_conversationsdb I have seen for some accounts.

I guess one more question - are you running the most recent index version? (reconstruct -V max)

> Quoting myself (Re: prblems rebuilding conversations db) Jan 24, 2019
> 
> > The program loops in build_cid_cb (imap/ctl_conversationsdb.c:189)
> >
> > For the problematic mailbox that I tested, for every message
> > record->cid was NULLCONVERSATION, so mailbox_cacherecord,
> > message_update_conversations and mailbox_rewrite_index_record
> > were called, each returned 0.
> >
> > After iterating through all messages in the mailbox count was > 0, and r==0,
> > so the while condition (!r && count) was true for the next run.
> > At this point record->cid was still NULLCONVERSATION for every message,
> > which I guess should not be the case.

ctl_conversationsdb -b should update the cid. BUT - if you're running old mailboxes which have a format which doesn't support saving the CID, that would for sure break things!
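
To make that failure mode concrete, here is a simplified model of the loop you describe. This is NOT the actual ctl_conversationsdb code: the struct, the function names, the format_stores_cid flag and the max_passes cap are all invented for this sketch, and NULLCONVERSATION is just a stand-in value. It only shows why "every call returns 0 but the CID is never saved" keeps the while (!r && count) condition true forever.

```c
#include <stddef.h>

#define NULLCONVERSATION 0ULL   /* stand-in value for this sketch */

struct record { unsigned long long cid; };

/* Hypothetical per-record update: always reports success (0), but the
 * CID is only persisted if the mailbox format is able to store it. */
static int update_record(struct record *rec, int format_stores_cid,
                         unsigned long long new_cid)
{
    if (format_stores_cid) rec->cid = new_cid;
    return 0;
}

/* One pass over the mailbox; *count is the number of records that
 * still had no CID when the pass visited them. */
static int one_pass(struct record *recs, size_t n, int format_stores_cid,
                    unsigned *count)
{
    *count = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].cid != NULLCONVERSATION) continue;
        int r = update_record(&recs[i], format_stores_cid, 1000 + i);
        if (r) return r;
        (*count)++;
    }
    return 0;
}

/* Repeats passes while (!r && count), as in the report. max_passes is
 * an artificial cap so the sketch terminates. Returns passes run. */
static unsigned rebuild(struct record *recs, size_t n, int format_stores_cid,
                        unsigned max_passes)
{
    unsigned passes = 0, count = 0;
    int r = 0;
    do {
        r = one_pass(recs, n, format_stores_cid, &count);
        passes++;
    } while (!r && count && passes < max_passes);
    return passes;
}
```

In this model, a format that stores the CID finishes after the second pass finds nothing left to do, while a format that cannot store it only stops because of the artificial cap.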

Bron.

--
 Bron Gondwana, CEO, FastMail Pty Ltd
 brong at fastmailteam.com


More information about the Info-cyrus mailing list