From dilyan.palauzov at aegee.org Thu May 2 12:31:53 2019
From: dilyan.palauzov at aegee.org (Дилян Палаузов)
Date: Thu, 02 May 2019 16:31:53 +0000
Subject: https://imapwiki.org/ImapTest/ServerStatus update for Cyrus Imap 3.0
Message-ID: <84f9bd10cb3742fd5831288bd9a61e6bbb4d6b18.camel@aegee.org>

Hello,

anybody with IMAP expertise willing to update https://imapwiki.org/ImapTest/ServerStatus for a current Cyrus Imap status?

It says right now Cyrus 2.3.12p2 is a non-compliant server.

Regards
Дилян

From ellie at fastmail.com Thu May 2 22:50:15 2019
From: ellie at fastmail.com (ellie timoney)
Date: Thu, 02 May 2019 22:50:15 -0400
Subject: Re: https://imapwiki.org/ImapTest/ServerStatus update for Cyrus Imap 3.0
In-Reply-To: <84f9bd10cb3742fd5831288bd9a61e6bbb4d6b18.camel@aegee.org>
References: <84f9bd10cb3742fd5831288bd9a61e6bbb4d6b18.camel@aegee.org>
Message-ID:

This ("ImapTest") looks like the thing that we can run from Cassandane as Cassandane::Cyrus::ImapTest. I haven't had it set up for a while (haven't gotten around to it on my new laptop yet...) but I think last time I looked we passed a lot of the tests on 3.0, and some of the ones we failed looked like bad tests.

Bron, are you still running this test suite regularly? How are we looking on master? I think I remember you were planning to get in touch with Timo to get some of the bogus tests fixed, but I don't remember if this ever happened.

It would be good to get this page updated, even if we're still a "non-compliant server", at least to have something more recent than 2.3 on there....(!)

It's a wiki so I'll try and create an account and see if I can just update it myself, but I'd like to refresh my understanding of where we're at before I embarrass myself!

Cheers,

ellie

On Fri, May 3, 2019, at 2:33 AM, Дилян Палаузов
wrote:
> Hello,
>
> anybody with IMAP expertise willing to update
> https://imapwiki.org/ImapTest/ServerStatus for a current Cyrus Imap
> status?
>
> It says right now Cyrus 2.3.12p2 is a non-compliant server.
>
> Regards
> Дилян

From brong at fastmailteam.com Fri May 3 09:41:25 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Fri, 03 May 2019 09:41:25 -0400
Subject: Re: https://imapwiki.org/ImapTest/ServerStatus update for Cyrus Imap 3.0
In-Reply-To:
References: <84f9bd10cb3742fd5831288bd9a61e6bbb4d6b18.camel@aegee.org>
Message-ID: <154f8a34-8e9b-4c6d-a2f5-94a3f90581c1@www.fastmail.com>

On Fri, May 3, 2019, at 12:50, ellie timoney wrote:
> This ("ImapTest") looks like the thing that we can run from Cassandane as Cassandane::Cyrus::ImapTest, I haven't had it set up for a while (haven't gotten around to it on new laptop yet...) but I think last time I looked we passed a lot of the tests on 3.0, and some of the ones we failed looked like bad tests

We fail two tests in vanilla upstream ImapTest last I checked, and there are pull requests that I made for them years ago, but maybe they fell through the cracks. They were cases where Cyrus is technically correct in rejecting something that you could also argue is OK to accept.

> Bron, are you still running this test suite regularly? How are we looking on master? I think I remember you were planning to get in touch with Timo to get some of the bogus tests fixed, but I don't remember if this ever happened

Yeah, I still run it regularly. It has run clean since 2.4.something.

> It would be good to get this page updated, even if we're still a "non-compliant server", at least to have something more recent than 2.3 on there....(!)

There are some things that ImapTest can test which we don't implement, but we're fully compliant on all the things we do implement (at least, if you don't turn on deliberately standards-breaking stuff like autoexpunge or search_fuzzy_always).
> It's a wiki so I'll try and create an account and see if I can just update it myself, but I'd like to refresh my understanding of where we're at before I embarrass myself!

Excellent. I don't remember having difficulty creating an account when I last did anything on that wiki.

Bron.

--
Bron Gondwana, CEO, FastMail Pty Ltd
brong at fastmailteam.com

From Dilyan.Palauzov at aegee.org Fri May 17 09:51:17 2019
From: Dilyan.Palauzov at aegee.org (Дилян Палаузов)
Date: Fri, 17 May 2019 16:51:17 +0300
Subject: Prepending Xapian Tiers
Message-ID: <64A4D697-25D0-4F6B-BFD1-A38102F38523@aegee.org>

Hello,

I set up a Cyrus system with one tier. I think it works. The .xapianactive files contain 'tiername: 0'.

How can I insert a second tier? Adding a XYZsearchpartition-default to imapd.conf, together with defaultsearchtier: XYZ does not utilize the new directory: it stays empty and the .xapianactive files do not get updated to mention the new tier.

Besides, if a message is MOVEd over IMAP, is any optimization utilized, to avoid reindexing the message, but just change the address of the document?

Regards
Дилян

From Dilyan.Palauzov at aegee.org Fri May 17 12:58:31 2019
From: Dilyan.Palauzov at aegee.org (Dilyan Palauzov)
Date: Fri, 17 May 2019 16:58:31 +0000
Subject: https://github.com/cyrusimap/cyrus-imapd/pull/2711 does not apply any more cleanly
Message-ID: <20190517165831.Horde.g3x-I6EuO1zjbgtfLmCRakl@webmail.aegee.org>

Hello,

https://github.com/cyrusimap/cyrus-imapd/pull/2711 does not apply cleanly any more; somebody has to work on it again.

Is it pure chance whether contributions are accepted within a reasonable time without a reminder, and how can one predict whether something will be ignored or handled?
Reasonable time expires within three months, or at the moment when a patch no longer applies cleanly, whichever comes first.

https://github.com/cyrusimap/cyrus-imapd/issues/1767 is another case, which took a lot of time and in the end was hibernated.

Is the handling of patches just random, or is there a process which can be used to foresee how and when things will be handled?

Regards
Дилян

From brong at fastmailteam.com Mon May 20 04:52:07 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Mon, 20 May 2019 18:52:07 +1000
Subject: Prepending Xapian Tiers
In-Reply-To: <64A4D697-25D0-4F6B-BFD1-A38102F38523@aegee.org>
References: <64A4D697-25D0-4F6B-BFD1-A38102F38523@aegee.org>
Message-ID:

On Fri, May 17, 2019, at 23:52, Дилян Палаузов wrote:
> Hello,
>
> I set up a Cyrus system with one tier. I think it works. The .xapianactive files contain 'tiername: 0'.
>
> How can I insert a second tier?

I have never tried this on a live server! Clearly the right thing to do is to build a cassandane search which implements doing this so that we can make sure it works.

> Adding a XYZsearchpartition-default to imapd.conf, together with defaultsearchtier: XYZ does not utilize the new directory: it stays empty and the .xapianactive files do not get updated to mention the new tier.

That looks like it should work. I assume you have restarted your cyrus since making the change? I'm not certain that a rolling squatter will discover a new config in the way that imapd does.

Also - you'll need to run squatter in compact mode in order to add a new xapianactive entry. The simplest could be:

squatter -z tiername -t tiername -o

I believe that given your current setup, this will just copy the entry from tiername:0 to tiername:1 and also create XYZ:0 in the xapianactive file at the same time.

> Besides, if a message is MOVEd over IMAP, is any optimization utilized, to avoid reindexing the message, but just change the address of the document?

Yes, both XAPINDEXED mode, where the GUID is read from xapian, and CONVINDEXED mode, where the GUID is looked up via user.conversations and then mapped into the cyrus.indexed.db files in each xapian tier, allow Xapian to skip reindexing when a message is already indexed. This works for both MOVE and for re-uploading of an identical message file via IMAP.

Cheers,

Bron.

--
Bron Gondwana, CEO, FastMail Pty Ltd
brong at fastmailteam.com

From brong at fastmailteam.com Mon May 20 07:36:03 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Mon, 20 May 2019 21:36:03 +1000
Subject: Notes 20 May
Message-ID: <4aa66104-55fa-464e-af63-3239b5bb88af@www.fastmail.com>

Present: Bron, Robert, Ken, ellie

Bron in Vienna with Robert!

ellie:
* now has proper internet after many months of unreliable provider issues!
* not much else to report

Ken:
* libical and multi-value parameters/properties discussion (there's a FastMail issue about it)
  - what's the expected behaviour on the client side?
  - not a blocker right now, but has been allocated over from Robert to Ken
* Mailbox/get handling of ACL changes
  - this is long and complex because it requires keeping modseq per ACL entry, so it's a mailboxes.db format change
  - will work on that this week
* did some bugfixing last week
* nearly finished with vacation support
  - might have to go back to forcing user to include the vacation script, because otherwise STOP rules will mean that vacation doesn't happen.
  - plan is to parse active script and check if our custom jmapvacation.script is included
  - if not, reject the vacation object.
  - we're using :fcc at FastMail, may need to add a custom extension to allow setting that via JMAP.
  - current spec doesn't have a way to specify how many days to suppress responses for -> will need to fall back to server default. Could also add this in FM extension, but it's odd that it's not supported in upstream.
  - addresses: how to know it was specifically addressed to you rather than a bulk email. Maybe need to add Identity support to Cyrus too! Right now it only returns authenticated ID. Maybe add a property to INBOX with a list of email addresses!
  - https://github.com/cyrusimap/cyrus-imapd/issues/2760 One per identity. Probably uuid as the key.
* promised to write a draft for discovery of shared calendars before next week.

Robert:
* Finally landed Xapian language indexing on upstream master
* Spent a lot of time on JSCalendar RFC based on feedback from the mailing list
* Now started working on annotations API changes. Will be working on this for the next week.

Bron:
* working on caching jmap objects in the dav database.

Bron won't be available at the standard time next week, but will be around working.

--
Bron Gondwana, CEO, FastMail Pty Ltd
brong at fastmailteam.com

From Dilyan.Palauzov at aegee.org Tue May 21 04:40:55 2019
From: Dilyan.Palauzov at aegee.org (Dilyan Palauzov)
Date: Tue, 21 May 2019 08:40:55 +0000
Subject: Prepending Xapian Tiers
In-Reply-To:
References: <64A4D697-25D0-4F6B-BFD1-A38102F38523@aegee.org>
Message-ID: <20190521084055.Horde.0kj0EVtXbXYitwEoo7qjIps@webmail.aegee.org>

Hello,

thanks, Bron, for your answer.

I gave it a try.

squatter does not remove .NEW directories when aborted (SIGINT); the directories have to be removed manually.

squatter -t X -z X -o recognizes, when the directory structure behind tier X exists, that nothing has to be done, prints "Skipping X for user.ABC, only one" and quits, without updating the .xapianactive files.

squatter -t Y -z Y -o, when the directory structure behind tier Y does not exist, prints "compressing Y:1,Y:0 to Y:2 for user... (active Y:1,Y:0)". As far as I remember this has not updated the xapianactive files.

squatter -t X -z Y -o does add the defaultsearchtier to the .xapianactive files, but first has to duplicate all existing files with rsync. This takes a while... But in the end it did what I wanted. Afterwards the directory structure for the new tier was not created. The directory structure was created once I started all the cyrus processes again.

squatter -t X -z Y -o emits the message "undefined search partition X,Ysearchpartition-default" and then "compressing X:0,X,Y:0 to Y:2 for ... (active Y:0,X:0,X,Y:0,Y:1)".

Does squatter -t X -z Y append X to Y, or does it delete Y and copy X to Y? In the latter case, is there any (performance) difference between "squatter -t X,Y -z Y" and "squatter -t Y,X -z Y"?

Can one xapian tier store a document, and another tier store the information that the address of the document has changed?

Regards
Дилян

From Dilyan.Palauzov at aegee.org Tue May 21 04:55:50 2019
From: Dilyan.Palauzov at aegee.org (Dilyan Palauzov)
Date: Tue, 21 May 2019 08:55:50 +0000
Subject: Does squatter retry on transient errors (xapian)?
Message-ID: <20190521085550.Horde.1A8Bx5VZjPKSf-WILqZf5ZJ@webmail.aegee.org>

Hello,

https://fastmail.blog/2014/12/01/email-search-system/ says:

The process is as follows:
5. if the xapianactive file has changed, discard all our work (we lock against this, but it's a sanity check) and exit

Does squatter exit with an error message in this case, so that it has to be started again, or does it retry automatically until it succeeds?

What does "indexing" mean? -T can be passed to squatter only in compact mode, and is a directory for temporary files. But the documentation says this directory is used when "indexing". How big is the data stored there? Let's say tier A allocates K MB for its data and tier B allocates L MB for its data: are approximately K+L MB used in the temporary directory while compacting A and B?

Regards
Дилян

From brong at fastmailteam.com Tue May 21 07:46:42 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Tue, 21 May 2019 21:46:42 +1000
Subject: Prepending Xapian Tiers
In-Reply-To: <20190521084055.Horde.0kj0EVtXbXYitwEoo7qjIps@webmail.aegee.org>
References: <64A4D697-25D0-4F6B-BFD1-A38102F38523@aegee.org> <20190521084055.Horde.0kj0EVtXbXYitwEoo7qjIps@webmail.aegee.org>
Message-ID: <19545f54-4f17-413d-b1c1-9f09a40f2710@www.fastmail.com>

On Tue, May 21, 2019, at 18:41, Dilyan Palauzov wrote:
> Hello,
>
> thanks, Bron, for your answer.
>
> I gave it a try.
>
> squatter does not remove .NEW directories when aborted (SIGINT), the
> directories have to be removed manually

https://github.com/cyrusimap/cyrus-imapd/issues/2765

> squatter -t X -z X -o recognizes, when the directory structure behind
> tier X exists, that nothing has to be done, prints "Skipping X for
> user.ABC, only one" and quits, without updating the .xapianactive files.

Yeah right, that won't work. Glad to know :)

> squatter -t Y -z Y -o, when the directory structure behind tier Y
> does not exist, prints "compressing Y:1,Y:0 to Y:2 for user... (active
> Y:1,Y:0)". As far as I remember this has not updated the xapianactive
> files.

Yeah right, it won't add a new target unless you are compressing the current first item in xapianactive.

> squatter -t X -z Y -o does add to the .xapianactive files the
> defaultsearchtier, but first has to duplicate with rsync all existing
> files. This takes a while... But at the end did what I wanted.
> Afterwards the directory structure for the new tier was not created.
> The directory structure was created once I started all the cyrus
> processes again.

That makes sense. We don't create a directory structure until a document gets created in there.

> squatter -t X -z Y -o emits the message "undefined search partition
> X,Ysearchpartition-default" and then "compressing X:0,X,Y:0 to Y:2 for
> ... (active Y:0,X:0,X,Y:0,Y:1)".

That sounds like a sanity checking failure! Good catch: https://github.com/cyrusimap/cyrus-imapd/issues/2764

> Does squatter -t X -z Y append X to Y, or it deletes Y and copies X to
> Y? In the latter case, is there any (performance) difference between
> "squatter -t X,Y -z Y" and "squatter -t Y,X -z Y"?

There's no difference in what order you add items to -t. -t is a comma separated list of selectors for source items. You can even explicitly say:

squatter -t X:0,X:2,Y:45 -z Y

and it will compact just those three sources into a new target in Y.

What it does under the hood is create a new database and copy all the documents over from the source databases, then compress the end result into the most compact and fastest xapian format, which is designed never to be written to again. This compressed file is then stored under the target database name, and in an exclusively locked operation the new database is moved into place and the old tiers are removed from the xapianactive, such that all new searches look into the single destination database instead of the multiple source databases.

> Can one xapian tier store a document, and another tier store the
> information, that the address of the document has changed?

It doesn't work like that. The addresses of the documents never change (they are the sha1 of the document contents, and Cyrus documents are all immutable). The xapian engine searches across the full set of databases listed in xapianactive in order to find document ids, then maps them through the conversations.db file to find the actual emails. A copy/move of an email updates the conversations.db lookups, so the next search will find the new location without anything changing in xapian.

The cyrus.indexed.db file is just a convenience to allow rolling squatter to avoid having to re-scan records that it knows are already indexed.

Bron.

> Regards
> Дилян
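
The content-addressing point above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Cyrus code: the class ToySearchIndex, its methods and the whitespace tokenizer are all hypothetical, with one dict standing in for the Xapian tiers and another for the conversations.db lookup. It only shows why a MOVE or an identical re-upload does not need a new indexing pass when documents are keyed by the SHA-1 of their contents.

```python
import hashlib


def guid(raw_message: bytes) -> str:
    """Content address of a message: SHA-1 over the raw bytes."""
    return hashlib.sha1(raw_message).hexdigest()


class ToySearchIndex:
    """Hypothetical stand-in for the search tiers plus the location lookup."""

    def __init__(self):
        self.indexed = {}     # guid -> set of terms (stands in for the Xapian tiers)
        self.locations = {}   # guid -> set of (mailbox, uid) (stands in for conversations.db)
        self.index_runs = 0   # how many times a message body was actually tokenized

    def deliver(self, mailbox: str, uid: int, raw: bytes) -> str:
        g = guid(raw)
        if g not in self.indexed:
            # Only unseen content is parsed; the tokenizer here is a naive assumption.
            self.indexed[g] = set(raw.decode("utf-8", "replace").lower().split())
            self.index_runs += 1
        self.locations.setdefault(g, set()).add((mailbox, uid))
        return g

    def move(self, g: str, src, dst) -> None:
        # A move touches only the location map; the term index is left alone.
        self.locations[g].discard(src)
        self.locations[g].add(dst)

    def search(self, term: str):
        hits = []
        for g, terms in self.indexed.items():
            if term.lower() in terms:
                hits.extend(sorted(self.locations.get(g, ())))
        return hits


idx = ToySearchIndex()
g = idx.deliver("INBOX", 1, b"Subject: hi\n\nhello xapian")
idx.move(g, ("INBOX", 1), ("Archive", 7))
print(idx.search("xapian"), idx.index_runs)
```

After the move, the search reports the new location while index_runs is unchanged, which mirrors the statement that nothing changes in xapian when a message is copied or moved.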

--
Bron Gondwana, CEO, FastMail Pty Ltd
brong at fastmailteam.com

From brong at fastmailteam.com Tue May 21 08:10:17 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Tue, 21 May 2019 22:10:17 +1000
Subject: Does squatter retry on transient errors (xapian)?
In-Reply-To: <20190521085550.Horde.1A8Bx5VZjPKSf-WILqZf5ZJ@webmail.aegee.org>
References: <20190521085550.Horde.1A8Bx5VZjPKSf-WILqZf5ZJ@webmail.aegee.org>
Message-ID: <3ec8ef6a-1fa2-4a2f-90fe-fc7ec47825b2@www.fastmail.com>

On Tue, May 21, 2019, at 18:56, Dilyan Palauzov wrote:
> Hello,
>
> https://fastmail.blog/2014/12/01/email-search-system/ says:
>
> The process is as follows:
> 5. if the xapianactive file has changed, discard all our work (we lock
> against this, but it's a sanity check) and exit
>
> Does squatter exit with an error message in this case, and has to be
> started again, or does it retry automatically, until it succeeds?

No, it just keeps going. It will syslog something about the failure, but a compact failure is generally considered to be not a bad thing.

> What does "indexing" mean? -T can be passed to squatter only in compact
> mode, and is a directory for temporary files. But the documentation
> says, this directory is used when "indexing". How big is the data
> stored there? Let's say if tier A allocates K MB for its data, and
> tier B allocates L MB for the data, is during compacting of A and B
> approximately K+L MBs used in the temporary directory?

It doesn't work like that - you don't allocate a particular amount of space for each tier. The space used in -T during a single compact is roughly the sum of the database sizes of the databases which are being compacted together. I expect the documentation could be made more clear here.

Bron.
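
As a rough way to apply that rule of thumb before running a compact, the Python sketch below adds up the on-disk sizes of the source database directories. It is only an estimate under stated assumptions: the function names are hypothetical, the directory paths have to be supplied by the administrator, and the real usage in -T is described above only as "roughly" this sum.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Total size of the regular files below path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def estimate_compact_temp_bytes(source_db_dirs) -> int:
    """Rough upper estimate of temporary space: sum of the source database sizes."""
    return sum(dir_size_bytes(p) for p in source_db_dirs)
```

A call such as estimate_compact_temp_bytes([dir_a, dir_b]), with the two paths pointing at the databases that will be compacted together, gives a number to compare against the free space in the directory passed to -T.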

--
Bron Gondwana, CEO, FastMail Pty Ltd
brong at fastmailteam.com

From Dilyan.Palauzov at aegee.org Sat May 25 08:18:12 2019
From: Dilyan.Palauzov at aegee.org (Dilyan Palauzov)
Date: Sat, 25 May 2019 12:18:12 +0000
Subject: Prepending Xapian Tiers
In-Reply-To: <19545f54-4f17-413d-b1c1-9f09a40f2710@www.fastmail.com>
References: <64A4D697-25D0-4F6B-BFD1-A38102F38523@aegee.org> <20190521084055.Horde.0kj0EVtXbXYitwEoo7qjIps@webmail.aegee.org> <19545f54-4f17-413d-b1c1-9f09a40f2710@www.fastmail.com>
Message-ID: <20190525121812.Horde.3MgWS2uzMBgxzYXe9Eqb5Bg@webmail.aegee.org>

Hello Bron,

For me it is still not absolutely clear how things work with the Xapian search backend.

Has search_batchsize any impact during compacting? Does this setting say how many new messages have to arrive, before indexing them together in Xapian?

What are the use-cases to call "squatter -F" (In compact mode, filter the resulting database to only include messages which are not expunged in mailboxes with existing name/uidvalidity.) and "squatter -X" (Reindex all messages before compacting. This mode reads all the lists of messages indexed by the listed tiers, and re-indexes them into a temporary database before compacting that into place)?

Why should one keep the index of deleted and expunged messages, and how can one delete references to messages that are both expunged and expired (after cyr_expire -X0, so removed from the hard disk), but keep the index to messages that are still on the hard disk but which the user expunged (double-deleted)?

How does re-compacting (as in https://fastmail.blog/2014/12/01/email-search-system/) differ from re-indexing (as in the manual page of master/squatter)?

What gets indexed? For a mailbox receiving only reports (dkim, dmarc, mta-sts, arf, mtatls), some of which are archived (zip, gzip), the Xapian index increases very fast.

How can I remove a tier that contains no data, but is mentioned in the .xapianactive files?

How can I rename a tier?

How can I efficiently prepend a new tier in the .xapianactive file? "squatter -t X -z Y -o" does add the defaultsearchtier to the .xapianactive files, but first has to duplicate all existing files with rsync. This is not efficient, as big files have to be copied.

> What it does under the hood is creates a new database and copy all
> the documents over from the source databases, then compress the end
> result into the most compact and fastest xapian format which is
> designed to never write again. This compressed file is then stored
> into the target database name, and in an exclusively locked
> operation the new database is moved into place and the old tiers are
> removed from the xapianactive, such that all new searches look into
> the single destination database instead of the multiple source
> databases.

I do not get this. The number of tiers to check does not reduce after merging, and with three tiers the number of databases is three most of the time.

What happens if squatter is terminated during Xapian compacting, apart from leaving temporary files? Will rerunning it just start from the beginning?

Is the idea to have three tiers like this:

At run time, new messages are indexed by Xapian in squatter-rolling mode on tmpfs/RAM, say on tier T1.

Regularly, the RAM database is compacted to hard disk (tier T2), say T1 and T2 are merged into T2. The database on the hard disk is read-only and search in it is accelerated, as the database is "compact".

Only if two compactions of the same sources or destination happen in parallel does the merge fail, and it is skipped for that user. The merge is retried whenever merging T1 and T2 is rescheduled.

As the databases in T2 get bigger, merging T1 and T2 takes more and more time. So one more Xapian tier is created, T3. Less regularly, T2 and T3 are merged into T3. This process takes a while.
But afterwards, T2 is again small, so merging T1 and T2 into T2 is fast.

How many tiers make sense, apart from having one more for power-off events?

Regards
Дилян

From brong at fastmailteam.com Tue May 28 04:20:32 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Tue, 28 May 2019 18:20:32 +1000
Subject: Prepending Xapian Tiers
In-Reply-To: <20190525121812.Horde.3MgWS2uzMBgxzYXe9Eqb5Bg@webmail.aegee.org>
References: <64A4D697-25D0-4F6B-BFD1-A38102F38523@aegee.org> <20190521084055.Horde.0kj0EVtXbXYitwEoo7qjIps@webmail.aegee.org> <19545f54-4f17-413d-b1c1-9f09a40f2710@www.fastmail.com> <20190525121812.Horde.3MgWS2uzMBgxzYXe9Eqb5Bg@webmail.aegee.org>
Message-ID: <677cc867-3608-46c5-970b-6fb71c7cded8@www.fastmail.com>

On Sat, May 25, 2019, at 22:19, Dilyan Palauzov wrote:
> Hello Bron,
>
> For me it is still not absolutely clear how things work with the
> Xapian search backend.
>
> Has search_batchsize any impact during compacting? Does this setting
> say how many new messages have to arrive, before indexing them
> together in Xapian?

No, search_batchsize just means that when you're indexing a brand new mailbox with millions of emails, it will put that many emails at a time into a single transactional write to the search index. During compacting, this value is not used.

> What are the use-cases to call "squatter -F" (In compact mode, filter
> the resulting database to only include messages which are not expunged
> in mailboxes with existing name/uidvalidity.) and "squatter -X"
> (Reindex all messages before compacting. This mode reads all the
> lists of messages indexed by the listed tiers, and re-indexes them
> into a temporary database before compacting that into place)?

-F is useful to run occasionally so that your search indexes don't grow forever.
When emails are expunged, their matching terms aren't removed from the xapian indexes, so the database will be bigger than necessary and when you search for a term which is in deleted emails, it will cause extra IO and conversations DB lookups on the document id. > Why shall one keep index of deleted and expunged messages and how to > delete references from messages that are both expunged and expired > (after cyr_expire -X0, so removed from the hard disk), but keep the > index to messages that are still on the hard disk, but the user > expunged (double-deleted) them. I'm not sure I understand your question here. Deleting from xapian databases is slow, and particularly with the compacted form, it's designed to be efficient if you don't write to it. Finally, since we're de-duplicating by GUID, you would need to do a conversations db lookup for every deleted email to check the refcount before cleaning up the associated record. > How does re-compacting (as in > https://fastmail.blog/2014/12/01/email-search-system/) differ from > re-indexing (as in the manual page of master/squatter)? "re-compacting" - just means combining multiple databases together into a single compacted database - so the terms in all the source databases are compacted together into a destination database. I used "re-compacting" because the databases are already all compacted, so it's just combining them rather than gaining the initial space saving of the first compact. "re-indexing" involves parsing the email again and creating terms from the source document. When you "reindex" a set of xapian directories, the squatter reads the cyrus.indexed.db for each of the source directories to know which emails it claims to cover, and reads each of those emails in order to index them again. > What gets indexed? For a mailbox receiving only reports (dkim, dmarc, > mta-sts, arf, mtatls), some of which are archived (zip, gzip) the > Xapian index increases very fast. 
This would be because these emails often contain unique identifiers, which do indeed take a lot of space. We have had lots of debates over what exactly should be indexed - for example should you index sha1 values (e.g. git commit identifiers)? They're completely random, and hence all 40 characters need to be indexed each time! But - it's very handy to be able to search your email for a known identifier and see where it was referenced... so we decided to include them.

We try not to index GPG parts or other opaque blobs where nobody will be interested in searching for the phrase. Likewise we don't index MIME boundaries, because they're substructure, not something a user would know to search for.

We have a work in progress on the master branch to index attachments using an external tool to extract text from the attachment where possible, which will increase index sizes even more if enabled!

> How can I remove a tier, that contains no data, but is mentioned in
> the .xapianactive files?

If you run a compact which includes that tier as a source and not as a destination, then it should remove that tier from every .xapianactive file, at which point you can remove it from your imapd.conf.

> How can I rename a tier?

The whole point of tier names not being paths on disk is so you can change the disk path without having to rename the tier. Tier names are IDs, so you're not supposed to rename them.

Having said that, you could add a new tier, compact everything across to that tier, then remove the old tier.

> How can I efficiently prepend a new tier in the .xapianactive file?
> "squatter -t X -z Y -o" does add to the .xapianactive files the
> defaultsearchtier, but first has to duplicate with rsync all existing
> files. This is not efficient, as big files have to be copied.

I'm afraid that's what we have right now. Again, tiers are supposed to be set up at the start and not fiddled with afterwards, so the system isn't designed to allow you to quickly add a new tier.
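As an illustration of the tier-removal advice above, here is a minimal Python sketch for checking whether a tier is still referenced before dropping it from imapd.conf. It is not part of Cyrus. It assumes the content of a .xapianactive file is a whitespace-separated list of tier:generation entries, as in the examples given in this thread (e.g. "temp:66 data:52 data:51 archive:2"); the function names are hypothetical.

```python
from collections import Counter


def parse_xapianactive(text):
    """Split 'temp:66 data:52' style content into (tier, generation) pairs.

    Assumption: entries are whitespace-separated 'tier:number' items.
    An entry without a number gets generation None.
    """
    entries = []
    for item in text.split():
        tier, _, generation = item.partition(":")
        entries.append((tier, int(generation) if generation else None))
    return entries


def databases_per_tier(text):
    """Count how many databases each tier currently contributes."""
    return Counter(tier for tier, _ in parse_xapianactive(text))


def tier_in_use(text, tier):
    """True if at least one entry still references the given tier."""
    return databases_per_tier(text)[tier] > 0
```

For example, tier_in_use("temp:74 archive:3", "data") is False, which would suggest that no database on the "data" tier is referenced for that user any more.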
> > What it does under the hood is creates a new database and copy all
> > the documents over from the source databases, then compress the end
> > result into the most compact and fastest xapian format which is
> > designed to never write again. This compressed file is then stored
> > into the target database name, and in an exclusively locked
> > operation the new database is moved into place and the old tiers are
> > removed from the xapianactive, such that all new searches look into
> > the single destination database instead of the multiple source
> > databases.
>
> I do not get this. The number of tiers to check does not reduce after
> doing merging and with three tiers the number of databases is most of
> the time three.

Not if you're compacting frequently. We do the following:

* hourly
  - check if tmpfs is > 50% full - quit if not.
  - run squatter -a -o -t temp -z data
* daily
  - regardless of tmpfs size, compact everything on temp and meta down to data
  - squatter -a -t temp,meta -z data
* weekly on Sunday - re-compact all data partitions together
  - squatter -a -t temp,meta,data -z data
* And finally, once per week once the re-compact is done, check if we need to filter and recompact the archive, if so:
  - squatter -a -t data,archive -z archive -F

Since today is Monday, most users will have two data databases, so the xapianactive might be something like:

temp:66 data:52 data:51 archive:2

Later in the week, it might be:

temp:70 data:66 data:55 data:54 data:53 data:52 data:51 archive:2

And then maybe it will re-compact on Sunday and the user will have

temp:74 archive:3

> What happens, if squatter is terminated during Xapian-compacting,
> apart from leaving temporary files? Will rerunning it, just start
> from beginning?

The source databases will still be in xapianactive, so yes - a new compact run will take those same source databases and start again.
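The hourly step of the schedule above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the actual cron task: the function name, the tmpfs path argument and the injectable disk_usage parameter are hypothetical. Only the 50% threshold and the squatter arguments are taken from the schedule. The function builds the command line and leaves running it to the caller.

```python
import shutil


def hourly_compact_command(tmpfs_path, threshold=0.5, disk_usage=shutil.disk_usage):
    """Return the squatter argv for the hourly step, or None to quit.

    The step only proceeds when the tmpfs holding the 'temp' tier is more
    than `threshold` full, mirroring "check if tmpfs is > 50% full".
    `disk_usage` is injectable so the check can be exercised without a
    real tmpfs (an assumption made for this sketch).
    """
    usage = disk_usage(tmpfs_path)
    if usage.total == 0 or usage.used / usage.total <= threshold:
        return None  # not full enough: quit without compacting
    # Command as given in the schedule: compact temp down to data.
    return ["squatter", "-a", "-o", "-t", "temp", "-z", "data"]
```

A caller could pass the returned list to subprocess.run() while holding whatever lock the cron task uses, so that two compactions of the same tiers do not run in parallel.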
> Is the idea to have three tiers like this:
>
> At run time, new messages are indexed by Xapian in squatter-rolling
> mode on tmpfs/RAM, say on tier T1.

That's certainly what we do, since indexing is too IO-intensive otherwise.

> Regularly, the RAM database is compacted to hard disk (tier T2), say
> T1 and T2 are merged into T2. The database on the hard disk is
> read-only and search in it is accelerated, as the database is "compact".

As above - during the week we don't even merge T2 back together, we compact from T1 to a single small database on T2 - leading to multiple databases on T2 existing at once.

> Only if two compactions happen in parallel of the same sources or
> destination, the merge fails and is skipped for that user. The merge
> is retried whenever merging T1 and T2 is rescheduled.

Yes - though that's pretty rare on our systems because we use a lock around the cron task, so the only time this would happen is if you ran a manual compaction at the same time as the cron job.

> As the databases in T2 get bigger, merging T1 and T2 takes more and
> more time. So one more Xapian tier is created, T3. Less regularly,
> T2 and T3 are merged into T3. This process takes a while. But
> afterwards, T2 is again small, so merging T1 and T2 into T2 is fast.

Yes, that's what we do. This is also the time that we filter the DB, so the T3 database only contains emails which were still alive at the time of compaction.

> How many tiers make sense, apart from having one more for power-off events?

Having another one for power-off events doesn't make heaps of sense unless you have a fast disk. That's kind of what our "meta" partition is: it's an SSD RAID1 that's faster than the "data" partition, which is a SATA spinning RAID1 set.

When we power off a server, we run a task to compact all the temp partitions down - it used to be to meta, but we found that compacting straight to data was plenty fast, so we just do that now!
If you power off a server without copying the indexes off tmpfs, they are of course lost. This means that you need to run squatter -i on the server after reboot to index all the recent messages again! So we always run a squatter -i after a crash or power outage before bringing that server back into production.

Cheers,

Bron.

--
Bron Gondwana, CEO, FastMail Pty Ltd
brong at fastmailteam.com

From Dilyan.Palauzov at aegee.org Tue May 28 16:38:48 2019
From: Dilyan.Palauzov at aegee.org (Dilyan Palauzov)
Date: Tue, 28 May 2019 20:38:48 +0000
Subject: Prepending Xapian Tiers
In-Reply-To: <677cc867-3608-46c5-970b-6fb71c7cded8@www.fastmail.com>
References: <64A4D697-25D0-4F6B-BFD1-A38102F38523@aegee.org> <20190521084055.Horde.0kj0EVtXbXYitwEoo7qjIps@webmail.aegee.org> <19545f54-4f17-413d-b1c1-9f09a40f2710@www.fastmail.com> <20190525121812.Horde.3MgWS2uzMBgxzYXe9Eqb5Bg@webmail.aegee.org> <677cc867-3608-46c5-970b-6fb71c7cded8@www.fastmail.com>
Message-ID: <20190528203848.Horde.PCqVYQTy-YIKf1pKHFOE1sg@webmail.aegee.org>

Hello,

so the .conversations database does, apart from the descriptions at https://www.cyrusimap.org/imap/concepts/deployment/databases.html#conversations-userid-conversations, also store per user a G record for each message, mapping the mailboxes where the message is located, and the results from a Xapian search return G records.

Are a G record, GUID and a conversation ID the same thing?

When a message is expunged, are its records removed from .conversations?

When a message is unexpunged, is it again inserted in .conversations and referenced in the sync_log_channels: squatter?

squatter has the modes: indexer, search, rolling, synclog, compact, indexfrom (deprecated) and audit. Is search_batchsize used only in the indexer mode; in particular, is it not used when squatter -t ? -z -X is called (compact and reindex simultaneously)?
What is the application for squatter -X (Reindex all messages before compacting. This mode reads all the lists of messages indexed by the listed tiers, and re-indexes them into a temporary database before compacting that into place)?

Does it index messages that were not indexed yet for any reason, or does it delete the database, scan each message again and create a compact Xapian database?

In the case I described, a mailbox receiving reports and having an index that grows very fast, the cause was a mail loop - a lot of emails arriving in a short time. Once the loop stopped, the index no longer expanded faster than for other mailboxes.

So by default for now, unless some extra setup is performed, only words in text/plain and text/html get indexed, possibly with headers, and attachments are ignored?

Regards
Дилян

From ellie at fastmail.com Wed May 29 03:04:05 2019
From: ellie at fastmail.com (ellie timoney)
Date: Wed, 29 May 2019 17:04:05 +1000
Subject: Re: https://imapwiki.org/ImapTest/ServerStatus update for Cyrus Imap 3.0
In-Reply-To: <154f8a34-8e9b-4c6d-a2f5-94a3f90581c1@www.fastmail.com>
References: <84f9bd10cb3742fd5831288bd9a61e6bbb4d6b18.camel@aegee.org> <154f8a34-8e9b-4c6d-a2f5-94a3f90581c1@www.fastmail.com>
Message-ID:

I've updated this with the results from the scripted tests (i.e. all the 0/n numbers, based on the src/tests/* stuff), but from my experimentation so far it hasn't yet become apparent exactly how to test the checkpoint/recent/etc columns and/or how to interpret the results, so I've left those columns as they were previously.

From brong at fastmailteam.com Wed May 29 17:54:25 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Thu, 30 May 2019 07:54:25 +1000
Subject: Prepending Xapian Tiers
In-Reply-To: <20190528203848.Horde.PCqVYQTy-YIKf1pKHFOE1sg@webmail.aegee.org>
References: <64A4D697-25D0-4F6B-BFD1-A38102F38523@aegee.org> <20190521084055.Horde.0kj0EVtXbXYitwEoo7qjIps@webmail.aegee.org> <19545f54-4f17-413d-b1c1-9f09a40f2710@www.fastmail.com> <20190525121812.Horde.3MgWS2uzMBgxzYXe9Eqb5Bg@webmail.aegee.org> <677cc867-3608-46c5-970b-6fb71c7cded8@www.fastmail.com> <20190528203848.Horde.PCqVYQTy-YIKf1pKHFOE1sg@webmail.aegee.org>
Message-ID: <3e19da71-cff3-4909-964a-2e4c47e900cd@www.fastmail.com>

On Wed, May 29, 2019, at 06:39, Dilyan Palauzov wrote:
> Hello,
>
> so the .conversations database does, apart from the descriptions
> at
> https://www.cyrusimap.org/imap/concepts/deployment/databases.html#conversations-userid-conversations, also store per user a G record for each message, mapping the mailboxes where the message is
located and the results from Xapian search return G > records. > > Are a G record, GUID and a conversation ID the same thing? G records are identical to GUIDs. There are also G records (in latest master at least) for sub parts of message, which map to a blobId in JMAP and allow direct addressing of every part by a content-based ID. conversation ID is something different, it's based on a permutation of the GUID of the first message that arrived within that thread - and was the original point of the conversations database. Sadly this has all evolved over time. I would like to migrate Cyrus towards using the terminology in JMAP, which has EmailId (which is a prefix on the GUID in JMAP) and ThreadId (which is the conversation ID from Cyrus with 'T' as a prefix). As well as MailboxId which was previously known in Cyrus as UniqueId on mailboxes. > When a message is expunged, are its records from > .conversations removed? They are removed when it is UNLINKED, which may be at the same time depending on your expunge_mode setting. > When a message is unexpunged, is it again inserted in > .conversations and referenced in the sync_log_channels: > squatter? Yes, unexpunge is treated as a new APPEND, and since the bytes are the same, the GUID will be the same. > squatter has the modes: indexer, search, rolling, synclog, compact, > indexfrom (deprecated) and audit. Is search_batchsize used only in the > indexer mode, in particular it is not used when squatter -t ? -z -X is > called (compact and reindex simultaneously)? Hmm.... let me check! Nope, when you run with -X it reindexes all the messages in an entire mailbox in a single batch, ignoring search_batchsize. > What is the application for squatter -X (Reindex all messages before > compacting. This mode reads all the lists of messages indexed by the > listed tiers, and re-indexes them into a temporary database before > compacting that into place)? 
It is very useful when index formats have changed over time and you want to reindex all emails with the latest format, or when you believe a search database might be corrupted and want to rebuild it from source.

> Does it index messages, that were not indexed yet for any reason, or
> it deletes the database, scans each message again and creates a
> compact Xapian database?

It uses the cyrus.indexed.db of each of the source databases (selected by -t) to know which range of UIDs in each mailbox were claimed to be indexed by those databases, and then scans over those same ranges of UIDs again and indexes the contents of those messages if they are not yet expunged.

> In the case I described, mailbox receiving reports, having an index
> grow very fast, the cause was a mail loop - a lot of emails arriving
> in short time. Once the loop stopped, the index does not expand faster
> than other mailboxes.

That makes sense.

> So by default for now, unless some extra setup is performed, only
> words in text/plain and text/html get indexed, possibly with headers,
> and attachments are ignored?

Yes, that's correct. In fact, it's all text types. text/calendar and text/vcard are processed specially. Other text/* types are treated the same as text/plain for indexing purposes.

Bron.

--
Bron Gondwana, CEO, FastMail Pty Ltd
brong at fastmailteam.com

From Dilyan.Palauzov at aegee.org Fri May 31 14:33:43 2019
From: Dilyan.Palauzov at aegee.org (Dilyan Palauzov)
Date: Fri, 31 May 2019 18:33:43 +0000
Subject: squatter -F increases the index size
Message-ID: <20190531183343.Horde.NqDuNlWbxJXSqZCJ2rWmPGG@webmail.aegee.org>

Hello,

I gave squatter -F a try. Before I ran it for a user, tier T1 was not compacted and occupied 3.4 MB, and T2 was compacted and contained 3.7 GB. After removing the records of the deleted messages, that is, after running squatter -F, T2 was 5.7 GB and squatter printed "filtering" instead of "compacting". Then I ran "squatter -t T1,T2 -z T2" again, without -F and without -X, and squatter reindexed all messages, creating a 3.0 GB index.

I expected that with -F the resulting database would be compacted and that on the second call there would be no reindexing. When does squatter decide on its own to reindex?

What do G records in conversations.db contain?

My reading is that the way to create a Xapian index of an indexed mailbox is to run squatter first in INDEX mode and then in COMPACT mode. In particular, it is not possible to create a compacted database in one step.

Does squatter -R -S sleep after each mailbox or after each message indexed?

When compacting, squatter deals just with messages, and on search or reindex the conversations.db is used to map the messages to mailboxes. How does squatter -S sleep after each mailbox during compacting, if it knows nothing about mailboxes?

What does a tier name without a number mean in a xapianactive file?

What are XAPIAN_DBW_CONVINDEXED and _XAPINDEXED?

What is the file sync/squatter for?

squatter can print "Xapian: truncating text from message mailbox user.... uid 7309". When are messages truncated for the purposes of indexing?

Do I understand correctly that for a xapianactive file with "A B C D E", to remove C one has to call "squatter -t C,D -z D"? But A cannot be removed if it is the defaultsearchtier. If the defaultsearchtier is missing from the xapianactive file, is it always added whenever the file is modified (and is the only way to modify it to call squatter in COMPACT mode)?

Regards
Дилян