another problem with conversations db

Bron Gondwana brong at fastmailteam.com
Mon Feb 4 06:09:44 EST 2019


On Mon, Feb 4, 2019, at 22:00, Michael Menge wrote:
> 
> Quoting Bron Gondwana <brong at fastmail.fm>:
> 
> > On Mon, Feb 4, 2019, at 20:21, Michael Menge wrote:
> >> Hi,
> >>
> >> Quoting Bron Gondwana <brong at fastmailteam.com>:
> >>
> >> > Hi Michael,
> >> >
> >> > Sorry about the delay in looking at this - I was mad crazy busy
> >> > getting ready to go overseas. At Fosdem now, about to give a talk
> >> > about JMAP!
> >> >
> >> > OK, let's start with the things that give me a little bit of hives...
> >> >
> >> > configdirectory: /srv/cyrus-be
> >> > partition-default: /srv/cyrus-be
> >> > partition-ssd: /srv/cyrus-be/ssd-part
> >> >
> >> > Ouch. There's a couple of things I wouldn't do there - having the
> >> > partition be the same as the config directory, and having a separate
> >> > partition be a subdirectory of a partition. They're both asking for
> >> > trouble. I would probably lay my system out like:
> >> >
> >> > configdirectory: /srv/cyrus-be/conf
> >> > partition-default: /srv/cyrus-be/default-part
> >> > partition-ssd: /srv/cyrus-be/ssd-part
> >> >
> >>
> >> partition-default isn't used any more. To use the metapartition we moved
> >> all accounts from the default partition to the ssd partition, which is
> >> the new defaultpartition ("defaultpartition: ssd").
> >
> > Right - that makes sense.
> >
> >> > And then each tree would only have one type of thing in it.
> >> >
> >> > Anyway, I don't think that would break anything.
> >> >
> >> > metapartition-ssd: /srv/cyrus-ssd-be/meta/ssd-part
> >> > metapartition_files: header index cache expunge squat annotations
> >> > lock dav archivecache
> >> >
> >> > Ooh, I haven't tested having cache and archivecache on the same
> >> > location. That's really interesting. Again, I'd be in favour of
> >> > separation here, give them different paths. That might be tricky
> >> > with ssd though, the way this is laid out. I assume you have some
> >> > kind of symlink farm going on?
> >> >
> >>
> >> I didn't know that there could be a problem with cache and archivecache.
> >> At the time we decided on the configuration for cyrus 3.0 I looked at the
> >> imapd.conf man page and for metapartition_files decided that I want all
> >> meta files on the ssd storage. There was no indication in the man page
> >> that there could be a problem.
> >
> > Fair. I'd have to test that to see if it works correctly. I would 
> > hope so, but I haven't tested that configuration. This is the 
> > downside with having lots of different ways to do things!
> >
> >> How do I separate the location of archivecache from the other
> >> metapartition path? And how do I fix the cache and archivecache files?
> >
> > This I don't know a good answer for. I will test if having the same 
> > path for cache and archivecache could fail. I THINK that I made the 
> > code safe for it, but I'm not sure that it's been tested.
> >
> >> No, there is no symlink farm. We have mounted different iSCSI volumes to
> >> /srv/cyrus-ssd-be, /srv/cyrus-hdd-be and /srv/cyrus-be
> >
> > Right. That makes sense.
> >
> >> > Otherwise it all looks OK. Are you getting other IOERRORs in your
> >> > log files which could show things aborting? It really looks like
> >> > your conversations DB is getting out of sync due to other failures.
> >>
> >> I found a few instances of 3 related errors.
> >>
> >> Feb 4 01:10:55 mailserv03 be/cyr_expire[7626]: IOERROR: opening
> >> /srv/cyrus-be/ssd-part/L/user/XXXX/2185.: No such file or directory
> >> Feb 4 01:10:55 mailserv03 be/cyr_expire[7626]: IOERROR: opening
> >> /srv/cyrus-be/ssd-part/L/user/XXXX/2185.: No such file or directory
> >> Feb 4 01:10:55 mailserv03 be/cyr_expire[7626]: IOERROR archive
> >> user.XXXX 2185 failed to copyfile
> >> (/srv/cyrus-be/ssd-part/L/user/XXXX/2185. =>
> >> /srv/cyrus-hdd-be/archive/ssd-part/L/user/XXXX/2185.): Unknown code
> >> ____ 255
> >
> >
> > Ouch. Yeah, that could have been caused by a bug in delivery, and 
> > would definitely cause conversations DB corruption if the index file 
> > was updated but the conversations DB wasn't or vice versa.
> >
> >> The file was already at /srv/cyrus-hdd-be/archive/ssd-part/L/user/XXXX/2185.
> >
> > Right! I do wonder if there are some bugs in 3.0.x which are fixed 
> > on master around delivery to archive partition. We definitely had 
> > bugs on master, but I thought they were newly introduced on master 
> > as well, which is why the fixes weren't backported. But if you're 
> > having files be in the wrong location, maybe there are bugs on 3.0.x 
> > as well.
> >
> > Do you have the syslog lines at the time that email was delivered?
> 
> I don't have the log for that message, but I will search for a
> more recent example.

Great.

> 
> From the mail headers I know that it was not delivered to the archive
> partition but moved by cyr_expire. The conversations DB was not used at
> that time.

OK - that shouldn't matter then - because the conversations rebuild should have found it.

> PS: the timestamp of the file is not the internal date but the time
> the mail was moved to the archive partition. I was wondering about the reason.

Hmm, yeah:

r = cyrus_copyfile(srcname, destname, COPYFILE_MKDIR);

That's how the file is moved. It only does a hardlink if it's the same filesystem; across filesystems it copies the data into a new file. Interestingly, it does NOT set the timestamp correctly. This is clearly a bug.

https://github.com/cyrusimap/cyrus-imapd/issues/2641
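A minimal sketch of what preserving the timestamp on the copy path could look like. This is not the Cyrus implementation and not the fix from the issue above: copy_preserving_mtime is a hypothetical helper, and it assumes a POSIX.1-2008 system (futimens, st_atim/st_mtim as on Linux/glibc). It reads the source's times with fstat before copying and applies them to the destination afterwards.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, NOT part of Cyrus: copy src to dst and carry the
 * source's access and modification times over to the new file.
 * Returns 0 on success, -1 on failure. */
int copy_preserving_mtime(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    int in = open(src, O_RDONLY);
    if (in < 0) return -1;

    /* Remember the source timestamps before any data is copied. */
    struct stat sb;
    if (fstat(in, &sb) < 0) { close(in); return -1; }

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) { close(in); return -1; }

    char buf[65536];
    ssize_t n;
    int r = 0;

    while ((n = read(in, buf, sizeof(buf))) > 0) {
        char *p = buf;
        /* write() may be partial, so loop until the chunk is flushed. */
        while (n > 0) {
            ssize_t w = write(out, p, (size_t)n);
            if (w < 0) { r = -1; break; }
            p += w;
            n -= w;
        }
        if (r < 0) break;
    }
    if (n < 0) r = -1;   /* read() failed */

    if (r == 0) {
        /* Apply the source's atime and mtime to the freshly written copy. */
        struct timespec times[2] = { sb.st_atim, sb.st_mtim };
        if (futimens(out, times) < 0) r = -1;
    }

    close(in);
    if (close(out) < 0) r = -1;
    if (r < 0) unlink(dst);   /* don't leave a partial copy behind */
    return r;
}
```

The hardlink case needs no such step, since both names refer to the same inode and therefore share one set of timestamps. Only a copy across filesystems creates a new inode whose mtime defaults to the time of the copy.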

Bron.


--
 Bron Gondwana, CEO, FastMail Pty Ltd
 brong at fastmailteam.com


More information about the Info-cyrus mailing list