squatter -F increases the index size

Bron Gondwana brong at fastmailteam.com
Fri Jun 7 15:11:39 EDT 2019


I saw your ticket about that; I'll check it out soon. Sorry, I've been busy at the CalConnect conference in the UK this week.

I also did this:

https://github.com/cyrusimap/cyrus-imapd/commit/27513a9bc3f217f388bac163820f9879178071fb

I believe this means that if you change the defaultsearchtier, squatter will immediately start indexing to the new location. You'll definitely want to restart the server when changing that config option, though. Also make sure the option doesn't differ between invocations of squatter, or you'll wind up creating a lot of xapianactive entries!
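To show why an inconsistent defaultsearchtier keeps adding entries, here is a toy model in C. It assumes the behaviour works as follows: if the first (writable) entry in the active list is not on the default tier, a new `<tier>:<n>` entry is prepended, with `n` one past the highest number already used on that tier. `struct active_list` and `ensure_writable_on_tier()` are hypothetical names, and the numbering rule is an assumption. This is not the code in the commit above.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 32
#define NAME_LEN 32

/* Hypothetical in-memory model of a .xapianactive list: each entry is
 * "tier:generation" and the first entry is the writable database. */
struct active_list {
    char entry[MAX_ENTRIES][NAME_LEN];
    int count;
};

/* Generation number of ENTRY if it lives on TIER, otherwise -1. */
static int generation_on_tier(const char *entry, const char *tier)
{
    size_t len = strlen(tier);
    int gen = 0;

    if (strncmp(entry, tier, len) != 0 || entry[len] != ':')
        return -1;
    if (sscanf(entry + len + 1, "%d", &gen) != 1)
        return -1;
    return gen;
}

/* Assumed behaviour: if the writable (first) entry is not on TIER, prepend
 * a fresh "<tier>:<n>" entry, n being one past the highest generation
 * already used on that tier. Returns 1 if an entry was added. */
static int ensure_writable_on_tier(struct active_list *al, const char *tier)
{
    int next = 0;

    if (al->count > 0 && generation_on_tier(al->entry[0], tier) >= 0)
        return 0;                       /* already writing to TIER */
    if (al->count >= MAX_ENTRIES)
        return 0;                       /* the toy model is full */
    for (int i = 0; i < al->count; i++) {
        int g = generation_on_tier(al->entry[i], tier);
        if (g >= next)
            next = g + 1;
    }
    /* shift existing entries down by one and put the new one first */
    memmove(al->entry[1], al->entry[0], sizeof al->entry[0] * al->count);
    snprintf(al->entry[0], NAME_LEN, "%s:%d", tier, next);
    al->count++;
    return 1;
}
```

Start from `t1:0` and alternate the default between `t2` and `t1` across invocations. Under this model every switch prepends another entry, giving `t2:1,t1:1,t2:0,t1:0` after three switches. A consistent setting adds nothing.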

Bron.

On Tue, Jun 4, 2019, at 04:19, Dilyan Palauzov wrote:
> Hello Bron,
> 
> imap/squatter.c:do_compact() does call `if (sleepmicroseconds) 
> usleep(sleepmicroseconds);` so -S number is honoured with `squatter 
> -t… -z…`.
> 
> Will `squatter -F -z… -t…` be fixed on the stable branch, or should 
> calling `squatter -F -t… -z` be discouraged with 3.0?
> 
> Given that currently, after `squatter -F -t… -z…`, calling `squatter 
> -t… -z` reindexes all messages and therefore creates a new Xapian 
> index, it must be possible to create a compacted database directly, 
> without first creating a bloated index.
> 
> My understanding of rolling mode is that once a new message 
> appears/arrives/is APPENDed or delivered, it is added to the sync 
> log and then indexed in rolling mode. Then a message arrives at a 
> different place; it is added to the log and then indexed. Whether the 
> first and second messages are in the same mailbox is completely 
> random. Why should whether squatter sleeps between the two messages 
> depend on whether they are in the same mailbox, that is, on random 
> circumstances?
> 
> https://wiki.dovecot.org/Plugins/FTS/Squat says, for Dovecot, that IMAP 
> requires SEARCH to match substrings as well, that no IMAP server 
> implements this requirement, and that Dovecot implements it only when 
> Squat indices are used. Does the same hold for Cyrus IMAP (a Squat 
> index offers substring search, a Xapian index does not)?
> 
> Running squatter once printed “compressing X:0,X,Y:0 to Y:3 for … 
> (active Y:0,X:0,X,Y:0,Y:1,Y:2)” 
> (https://github.com/cyrusimap/cyrus-imapd/issues/2764), so I suspect a 
> tier name without a number was in the .xapianactive file.
> 
> If I do any compact (-o, -F, -X, or just -t -z) where the first tier is 
> not referenced, does squatter ensure that the default tier according 
> to imapd.conf is inserted into the xapianactive file? Or, asked another 
> way: if I change imapd.conf, create a new tier T6 and declare T5 to be 
> the default tier, which of the following will insert a reference to 
> T5:0 into .xapianactive and which will not:
> 
> squatter -t T2 -o -z T2
> squatter -t T5,T2 -z T2
> squatter -t T5 -o T4
> squatter -t T2 -F T3
> squatter -t T2 -X T3
> or something else? (The name T5 is declared and its root directory 
> exists, but there is no data in the directory yet, and T5 does not yet 
> appear in any .xapianactive file.)
> 
> Regards
>  Дилян
> ----- Message from Bron Gondwana <brong at fastmailteam.com> ---------
>  Date: Tue, 04 Jun 2019 01:53:23 +1000
>  From: Bron Gondwana <brong at fastmailteam.com>
> Subject: Re: squatter -F increases the index size
>  To: Cyrus Devel <cyrus-devel at lists.andrew.cmu.edu>
> 
> 
> > On Sat, Jun 1, 2019, at 04:34, Dilyan Palauzov wrote:
> >> Hello,
> >>
> >> I gave squatter -F a try.
> >>
> >> Before I ran it, for one user, tier T1 was not compacted and occupied
> >> 3.4 MB, and T2 was compacted and contained 3.7 GB. After removing the
> >> records of the deleted messages, i.e. running squatter -F, T2 was
> >> 5.7 GB and squatter printed “filtering” instead of “compacting”.
> >> Then I ran “squatter -t T1,T2 -z T2” again, without -F or -X, and
> >> squatter reindexed all messages, creating a 3.0 GB index.
> >>
> >> I expected that with -F the resulting database would be compacted and
> >> that the second call would not reindex.
> >
> > I discovered some bad bugs in -F recently, so I suspect that's why. 
> > They should be fixed on master now.
> >
> >> When does squatter decide on its own to reindex?
> >
> > When the DB version is too old (which is one of the -F bugs - it 
> > wasn't setting the DB version, so it assumed the data was all 
> > version zero!)
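Here is a minimal sketch of the reindex decision Bron describes. If the stored DB version is missing it reads as zero, and zero counts as "too old", so everything gets reindexed. The constant `HYPOTHETICAL_MIN_DB_VERSION` and both function names are invented for illustration. The real metadata key and version numbers are not reproduced here.

```c
#include <stdlib.h>

/* Hypothetical: the lowest index format version this build accepts. */
#define HYPOTHETICAL_MIN_DB_VERSION 1

/* Parse a stored version string; a missing value is read as version 0,
 * which is what data written without a version ends up looking like. */
static int stored_db_version(const char *metadata_value)
{
    if (metadata_value == NULL || *metadata_value == '\0')
        return 0;
    return atoi(metadata_value);
}

/* Reindex when the stored version is older than the minimum supported. */
static int needs_reindex(const char *metadata_value)
{
    return stored_db_version(metadata_value) < HYPOTHETICAL_MIN_DB_VERSION;
}
```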
> >
> >> What do G records in conversations.db contain?
> >
> > G records contain a mapping from GUID to folder number (offset into 
> > the $FOLDER_NAMES key) and UID and optionally IMAP part number as 
> > the key - mapping to values which contain some keywords and modseq 
> > from the original record as well.
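To make the shape of a G record concrete, here is a hypothetical in-memory view in C. The key is (GUID, folder number, UID, optional part), and the value holds some keywords plus the modseq. The struct names, the keyword bitmask and the `G<guid>:<folder>:<uid>[:<part>]` text layout are all illustrative assumptions, not the actual on-disk encoding used by conversations.db.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical key of one G record (illustrative, not the on-disk format). */
struct g_record_key {
    char guid[41];      /* message GUID as hex (SHA1 is 40 hex chars) */
    int folder;         /* offset into the $FOLDER_NAMES list */
    uint32_t uid;       /* IMAP UID within that folder */
    const char *part;   /* optional IMAP part number, NULL if absent */
};

/* Hypothetical value: some keywords and the modseq of the original record. */
struct g_record_value {
    uint32_t keywords;  /* assumed bitmask of a few keywords */
    uint64_t modseq;
};

/* Format an illustrative text key "G<guid>:<folder>:<uid>[:<part>]".
 * Returns the snprintf() result (length of the full key). */
static int format_g_key(const struct g_record_key *k, char *buf, size_t len)
{
    if (k->part)
        return snprintf(buf, len, "G%s:%d:%u:%s",
                        k->guid, k->folder, (unsigned)k->uid, k->part);
    return snprintf(buf, len, "G%s:%d:%u",
                    k->guid, k->folder, (unsigned)k->uid);
}
```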
> >
> >> My reading is that to create a Xapian index of an indexed mailbox,
> >> squatter first has to be run in INDEX mode and then in COMPACT mode.
> >> In particular, it is not possible to create a compacted database in
> >> one step.
> >
> > No, it's not, due to the way the compact API works. At least, I 
> > haven't figured out how.
> >
> >> Does squatter -R -S sleep after each mailbox or after each message indexed?
> >
> > It sleeps after each mailbox.
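Here is a small sketch of "sleeps after each mailbox". The unit of work is a mailbox name, so two new messages in one mailbox are handled in a single pass followed by one pause. The pause uses `usleep(sleepmicroseconds)`, mirroring the call Dilyan quotes from `do_compact()`. `index_mailbox()` and `process_queue()` are hypothetical stand-ins, not squatter's functions.

```c
#define _DEFAULT_SOURCE          /* expose usleep() on glibc */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for indexing all new messages in one mailbox. */
static int index_mailbox(const char *mboxname)
{
    printf("indexing %s\n", mboxname);
    return 0;
}

/* Process a queue whose unit of work is a mailbox, pausing after each
 * mailbox rather than after each message. Returns mailboxes indexed. */
static int process_queue(const char *const *mailboxes, int n,
                         unsigned int sleepmicroseconds)
{
    int done = 0;

    for (int i = 0; i < n; i++) {
        if (index_mailbox(mailboxes[i]) == 0)
            done++;
        if (sleepmicroseconds)
            usleep(sleepmicroseconds);
    }
    return done;
}
```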
> >
> >> When compacting, squatter deals just with messages and on search or
> >> reindex the conversations.db is used to map the messages to mailboxes.
> >> How does squatter -S sleep after each mailbox during compacting, if
> >> it knows nothing about mailboxes?
> >
> > -S is not used when compacting.
> >
> >> What does a tier name without a number mean in a xapianactive file?
> >
> > that shouldn't happen. It will be parsed as the same as tier:0 I believe.
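Here is a minimal parser for one .xapianactive item. It assumes, as Bron believes, that a bare tier name is read the same as `tier:0`. `parse_active_item()` is a hypothetical helper, not the parser in the Cyrus source.

```c
#include <stdlib.h>
#include <string.h>

/* Split one .xapianactive item ("tier:N") into tier name and number.
 * A bare tier name is treated like "tier:0". Returns -1 on bad input. */
static int parse_active_item(const char *item, char *tier, size_t tierlen,
                             int *num)
{
    size_t n = strcspn(item, ":");      /* length of the tier part */

    if (n == 0 || n >= tierlen)
        return -1;
    memcpy(tier, item, n);
    tier[n] = '\0';
    *num = (item[n] == ':') ? atoi(item + n + 1) : 0;
    return 0;
}
```

With this reading, the `X` in “active Y:0,X:0,X,Y:0,…” would parse as `X:0`, which would duplicate the preceding `X:0` entry.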
> >
> >> What are XAPIAN_DBW_CONVINDEXED and _XAPINDEXED?
> >
> > Two different ways to know if a document is indexed. CONVINDEXED 
> > uses the conversations DB to look up mailbox and uid and then the 
> > cyrus.indexed.db databases to see if the message has already been 
> > seen.
> >
> > XAPINDEXED uses the metadata inside the Xapian databases to know if 
> > a particular message has been indexed based on the cyrusid.*G* 
> > metadata values which are identical to the GUIDs themselves.
> >
> >> What is the file sync/squatter for?
> >
> > It's a sync/$channel directory which squatter watches. It provides 
> > a queue of mailboxes to look at, based on the APPEND sync_log 
> > statements.
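Here is a simplified sketch of turning APPEND log lines into a queue of mailboxes, with duplicates collapsed so several appends to one mailbox become one unit of work. It assumes each line looks like `APPEND <mailbox>` and ignores other lines. The real sync_log format (quoting and other record types) may differ, and all names here are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_QUEUE 64
#define MBOX_LEN 256

/* Hypothetical queue of distinct mailbox names. */
struct mbox_queue {
    char name[MAX_QUEUE][MBOX_LEN];
    int count;
};

/* Add NAME unless it is already queued. */
static void queue_add(struct mbox_queue *q, const char *name)
{
    for (int i = 0; i < q->count; i++)
        if (strcmp(q->name[i], name) == 0)
            return;
    if (q->count < MAX_QUEUE)
        snprintf(q->name[q->count++], MBOX_LEN, "%s", name);
}

/* Read log lines from FP and queue the mailboxes named by APPEND lines.
 * Assumed line format: "APPEND <mailbox>". Returns the queue length. */
static int read_append_log(FILE *fp, struct mbox_queue *q)
{
    char line[512];

    while (fgets(line, sizeof line, fp)) {
        line[strcspn(line, "\r\n")] = '\0';     /* strip the newline */
        if (strncmp(line, "APPEND ", 7) == 0 && line[7] != '\0')
            queue_add(q, line + 7);
    }
    return q->count;
}
```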
> >
> >> squatter can print “Xapian: truncating text from message mailbox
> >> user.... uid 7309”. When are messages truncated for the purposes of
> >> indexing?
> >
> > When they are too long! The comment in the source code says this:
> >
> > /* Maximum size of a query, determined empirically, is a little bit
> > * under 8MB. That seems like more than enough, so let's limit the
> > * total amount of parts text to 4 MB. */
> > #define MAX_PARTS_SIZE (4*1024*1024)
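Here is a sketch of how a per-message budget like MAX_PARTS_SIZE can be applied. Parts are added until the running total would exceed 4 MB, only what fits is kept, and a flag records that truncation happened (the point at which the “truncating text” message would be logged). `add_part_text()` and `budget_parts()` are hypothetical names; the actual Cyrus code may count bytes differently.

```c
#include <stddef.h>

#define MAX_PARTS_SIZE (4*1024*1024)

/* Bytes of a part of length PARTLEN that still fit after USED bytes;
 * sets *truncated when the part has to be cut short. */
static size_t add_part_text(size_t used, size_t partlen, int *truncated)
{
    size_t room = used < (size_t)MAX_PARTS_SIZE
                  ? (size_t)MAX_PARTS_SIZE - used : 0;

    if (partlen > room) {
        *truncated = 1;
        return room;
    }
    return partlen;
}

/* Total bytes indexed for N parts with the given lengths. */
static size_t budget_parts(const size_t *lens, int n, int *truncated)
{
    size_t used = 0;

    *truncated = 0;
    for (int i = 0; i < n; i++)
        used += add_part_text(used, lens[i], truncated);
    return used;
}
```

Making this configurable, as suggested, would amount to replacing the constant with a value read from imapd.conf.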
> >
> > This is a holdover from when Greg was working on it. We could switch 
> > this to be a configurable option.
> >
> >> Do I understand correctly that, for a xapianactive file with "A B C D
> >> E", to remove C one has to call "squatter -t C,D -z D"? But A cannot
> >> be removed if it is the defaultsearchtier. Is the defaultsearchtier
> >> always added to the xapianactive file when it is missing, whenever
> >> the file is modified (the only way to modify it being to call
> >> squatter in COMPACT mode)?
> >
> > When you do any compact, if it includes the first item (the writable 
> > database) then a new writable database will be created on the 
> > default tier. So if you try to compact the default tier away, a new 
> > default tier item will be created.
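Here is a toy model of the two cases discussed: removing C from "A B C D E" with `-t C,D -z D`, and compacting away the first (writable) entry. Entries on the source tiers are merged into one new DEST entry at the position of the first merged entry. If the writable entry was merged, a fresh entry on the default tier is prepended. `model_compact()` and its helpers are hypothetical, and the generation numbering is an illustrative assumption, not squatter's.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ENTRIES 32
#define NAME_LEN 32

/* Does ENTRY ("tier:gen", or a bare "tier") live on TIER? */
static int on_tier(const char *entry, const char *tier)
{
    size_t len = strcspn(entry, ":");
    return strlen(tier) == len && strncmp(entry, tier, len) == 0;
}

/* Generation of ENTRY; a bare tier name counts as 0. */
static int generation(const char *entry)
{
    const char *colon = strchr(entry, ':');
    return colon ? atoi(colon + 1) : 0;
}

/* One past the highest generation already used on TIER. */
static int next_generation(char in[][NAME_LEN], int n, const char *tier)
{
    int next = 0;
    for (int i = 0; i < n; i++)
        if (on_tier(in[i], tier) && generation(in[i]) >= next)
            next = generation(in[i]) + 1;
    return next;
}

/* Is ENTRY on one of the NSRC tiers passed to -t? */
static int is_source(const char *entry, const char *const *src, int nsrc)
{
    for (int i = 0; i < nsrc; i++)
        if (on_tier(entry, src[i]))
            return 1;
    return 0;
}

/* Toy model of "squatter -t SRC... -z DEST". Returns entries in OUT. */
static int model_compact(char in[][NAME_LEN], int n,
                         const char *const *src, int nsrc,
                         const char *dest, const char *deftier,
                         char out[][NAME_LEN])
{
    int m = 0, placed = 0;
    int new_writable = n > 0 && is_source(in[0], src, nsrc);
    int defgen = next_generation(in, n, deftier);
    int destgen = next_generation(in, n, dest);

    if (new_writable && strcmp(deftier, dest) == 0)
        destgen = defgen + 1;           /* keep the two new names distinct */
    if (new_writable)                   /* writable DB merged: start a new one */
        snprintf(out[m++], NAME_LEN, "%s:%d", deftier, defgen);
    for (int i = 0; i < n && m < MAX_ENTRIES; i++) {
        if (!is_source(in[i], src, nsrc))
            snprintf(out[m++], NAME_LEN, "%s", in[i]);
        else if (!placed) {             /* merged result replaces the sources */
            snprintf(out[m++], NAME_LEN, "%s:%d", dest, destgen);
            placed = 1;
        }
    }
    return m;
}
```

In this model, `-t C,D -z D` on `A:0,B:0,C:0,D:0,E:0` yields `A:0,B:0,D:1,E:0`. Compacting A and B into B with default tier A yields `A:1,B:1,C:0,D:0,E:0`: the default tier reappears as a new writable entry.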
> >
> > Bron.
> >
> > --
> > Bron Gondwana, CEO, FastMail Pty Ltd
> > brong at fastmailteam.com
> 
> 
> ----- End message from Bron Gondwana <brong at fastmailteam.com> -----
> 
> 
> 

--
 Bron Gondwana, CEO, FastMail Pty Ltd
 brong at fastmailteam.com


