squatter -F increases the index size
brong at fastmailteam.com
Fri Jun 7 15:11:39 EDT 2019
I saw your ticket about that - I'll check it out soon. Sorry, I've been busy at the CalConnect conference in the UK this week.
I also did this:
I believe this means that if you change the defaultsearchtier, squatter will immediately start indexing to the new location. You'll definitely want to restart the server when changing that config option, though, and not let it differ between invocations of squatter, or you'll wind up creating a lot of xapianactive entries!
On Tue, Jun 4, 2019, at 04:19, Dilyan Palauzov wrote:
> Hello Bron,
> imap/squatter.c:do_compact() does call `if (sleepmicroseconds)
> usleep(sleepmicroseconds);` so -S number is honoured with `squatter
> -t… -z…`.
> Will `squatter -F -z… -t…` be fixed on the stable branch, or should
> calling `squatter -F -t… -z` be discouraged with 3.0?
> Given that currently, after `squatter -F -t… -z…`, calling `squatter
> -t… -z` reindexes all messages and therefore creates a new Xapian
> index, it must be possible to create a compacted database directly,
> without creating a bloated index first.
> My understanding of rolling mode is that once a new message
> appears/arrives/is APPENDed or deliver(ed), it is added to the sync
> log and then indexed in rolling mode. Then a message arrives in a
> different place; it is added to the log and then indexed. Whether the
> first and second messages are in the same mailbox is completely
> random. Why does squatter sleep or not depending on whether the two
> messages happen to be in the same mailbox? In other words, why does
> its sleeping depend on random circumstances?
> https://wiki.dovecot.org/Plugins/FTS/Squat says, for Dovecot, that IMAP
> requires SEARCH to match substrings as well, that no IMAP server
> implements this requirement, and that Dovecot implements it only when
> Squat indices are used. Is the same true for Cyrus IMAP (Squat index
> offers substring search, Xapian index does not)?
> Running squatter once printed “compressing X:0,X,Y:0 to Y:3 for …
> (active Y:0,X:0,X,Y:0,Y:1,Y:2)”
> (https://github.com/cyrusimap/cyrus-imapd/issues/2764), so I suspect a
> tier name without a number was in the .xapianactive file.
> If I do any compact (-o, -F, -X, or just -t -z) where the first tier is
> not referenced, does squatter ensure that the default tier according
> to imapd.conf is inserted in the xapianactive file? Or, to ask it another
> way: if I change imapd.conf, create a new tier T6 and
> declare T5 to be the default tier, which of the following will insert
> a reference to T5:0 in .xapianactive and which will not:
> squatter -t T2 -o -z T2
> squatter -t T5,T2 -z T2
> squatter -t T5 -o T4
> squatter -t T2 -F T3
> squatter -t T2 -X T3
> or something else? (The name T5 is declared and its root directory
> exists, but there is no data in the directory and T5 is not yet in any
> .xapianactive file.)
> ----- Message from Bron Gondwana <brong at fastmailteam.com> ---------
> Date: Tue, 04 Jun 2019 01:53:23 +1000
> From: Bron Gondwana <brong at fastmailteam.com>
> Subject: Re: squatter -F increases the index size
> To: Cyrus Devel <cyrus-devel at lists.andrew.cmu.edu>
> > On Sat, Jun 1, 2019, at 04:34, Dilyan Palauzov wrote:
> >> Hello,
> >> I gave squatter -F a try.
> >> Before I ran it for a user, tier T1 was not compacted and took up 3.4
> >> MB, and T2 was compacted and took up 3.7 GB. After
> >> removing the records of the deleted messages, i.e. after running squatter -F,
> >> T2 was 5.7 GB and squatter printed “filtering” instead of “compacting”.
> >> Then I ran “squatter -t T1,T2 -z T2” again, without -F and without -X,
> >> and squatter reindexed all messages, creating a 3.0 GB index.
> >> I expected that with -F the resulting database would be compacted and
> >> that the second call would not reindex.
> > I discovered some bad bugs in -F recently, so I suspect that's why.
> > They should be fixed on master now.
> >> When does squatter decide on its own to reindex?
> > When the DB version is too old (which is one of the -F bugs - it
> > wasn't setting the DB version, so it assumed the data was all
> > version zero!)
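To illustrate the version check described above, here is a minimal C sketch. It assumes the DB version is stored as a decimal string in the database metadata and that a missing value reads as 0, which is how the -F bug led to a full reindex. `HYPOTHETICAL_MIN_DB_VERSION` and both function names are made up for illustration and are not squatter's actual code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical minimum DB version usable without reindexing; the real
 * constant and its value are not given in this thread. */
#define HYPOTHETICAL_MIN_DB_VERSION 2

/* Version recorded in a database's metadata. A missing or empty entry
 * is treated as version 0, as happened when -F did not set it. */
static int stored_db_version(const char *version_metadata)
{
    if (version_metadata == NULL || *version_metadata == '\0')
        return 0;
    return atoi(version_metadata);
}

/* A database whose version is too old forces a full reindex. */
static bool needs_reindex(const char *version_metadata)
{
    return stored_db_version(version_metadata) < HYPOTHETICAL_MIN_DB_VERSION;
}
```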
> >> What do G records in conversations.db contain?
> > The key of a G record combines the GUID with a folder number (an
> > offset into the $FOLDER_NAMES key), a UID and, optionally, an IMAP
> > part number; it maps to a value that also contains some keywords and
> > the modseq from the original record.
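A hypothetical C view of the logical contents of a G record as described above, plus resolving the folder number through an already-split $FOLDER_NAMES list. The struct, its field types and `g_record_folder_name()` are assumptions for illustration; the real key encoding in conversations.db is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of a G record; not the on-disk format. */
struct g_record {
    /* key fields */
    const char *guid;      /* message GUID */
    uint32_t folder;       /* offset into the $FOLDER_NAMES list */
    uint32_t uid;          /* IMAP UID within that folder */
    int part;              /* optional IMAP part number, -1 if absent */
    /* value fields */
    uint64_t modseq;       /* modseq from the original record */
    const char *keywords;  /* some of the record's keywords */
};

/* Map the folder number back to a mailbox name, assuming $FOLDER_NAMES
 * has been split into an array. Returns NULL if out of range. */
static const char *g_record_folder_name(const struct g_record *rec,
                                        const char *const *folder_names,
                                        size_t nfolders)
{
    if (rec->folder >= nfolders)
        return NULL;
    return folder_names[rec->folder];
}
```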
> >> My reading is that to create a Xapian index of a mailbox,
> >> squatter first has to be run in INDEX mode and then
> >> in COMPACT mode. In particular, it is not possible to create a
> >> compacted database in one step.
> > No, it's not - due to the way the compact API works. At least, I
> > haven't figured out how.
> >> Does squatter -R -S sleep after each mailbox or after each message indexed?
> > It sleeps after each mailbox.
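The following sketch shows why sleeping once per mailbox means a pause falls between two messages only when they belong to different mailboxes. It assumes pending work has already been grouped so that messages from the same mailbox are adjacent; `struct pending` and `index_batch()` are hypothetical, and only the `if (sleepmicroseconds) usleep(sleepmicroseconds);` pattern comes from the thread.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical unit of work: one message waiting to be indexed. */
struct pending {
    const char *mailbox;
    unsigned uid;
};

/* Index messages mailbox by mailbox, pausing once after each mailbox
 * (as with -S) rather than after each message. Returns the number of
 * mailboxes processed, which equals the number of pauses taken when
 * sleepmicroseconds is non-zero. */
static int index_batch(const struct pending *items, size_t n,
                       unsigned int sleepmicroseconds)
{
    int mailboxes = 0;
    size_t i = 0;
    while (i < n) {
        const char *mbox = items[i].mailbox;
        /* Index every adjacent message of this mailbox without pausing. */
        while (i < n && strcmp(items[i].mailbox, mbox) == 0) {
            /* index_message(&items[i]) would go here (hypothetical) */
            i++;
        }
        if (sleepmicroseconds)
            usleep(sleepmicroseconds);
        mailboxes++;
    }
    return mailboxes;
}
```

For example, messages from user.a, user.a and user.b make two mailboxes and therefore two pauses, however many messages each mailbox holds.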
> >> When compacting, squatter deals only with messages, and on search or
> >> reindex conversations.db is used to map the messages to mailboxes.
> >> How does squatter -S sleep after each mailbox during compacting if
> >> it knows nothing about mailboxes?
> > -S is not used when compacting.
> >> What does a tier name without a number in a xapianactive file mean?
> > That shouldn't happen. I believe it will be parsed the same as tier:0.
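A minimal C sketch of parsing a single xapianactive entry under the behaviour described above, where an entry without `:N` is treated as generation 0. The struct and function names are hypothetical, and the sketch handles one whitespace-separated entry at a time.

```c
#include <stdlib.h>
#include <string.h>

/* One xapianactive entry, e.g. "T2:3" -> {"T2", 3}. */
struct tier_entry {
    char name[64];
    int generation;
};

/* Parse "name:N" into *out; a bare "name" counts as generation 0.
 * Returns 0 on success, -1 on a malformed or over-long entry. */
static int parse_tier_entry(const char *s, struct tier_entry *out)
{
    const char *colon = strchr(s, ':');
    size_t len = colon ? (size_t)(colon - s) : strlen(s);
    if (len == 0 || len >= sizeof(out->name))
        return -1;
    memcpy(out->name, s, len);
    out->name[len] = '\0';
    if (!colon) {
        out->generation = 0;  /* no number: same as tier:0 */
        return 0;
    }
    char *end;
    long gen = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || gen < 0)
        return -1;
    out->generation = (int)gen;
    return 0;
}
```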
> >> What are XAPIAN_DBW_CONVINDEXED and _XAPINDEXED?
> > Two different ways to know whether a document is indexed. CONVINDEXED
> > uses the conversations DB to look up the mailbox and UID, and then the
> > cyrus.indexed.db databases to see whether the message has already been
> > seen.
> > XAPINDEXED uses the metadata inside the Xapian databases to know whether
> > a particular message has been indexed, based on the cyrusid.*G*
> > metadata values, which are identical to the GUIDs themselves.
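A simplified C sketch contrasting the two checks. The in-memory arrays stand in for cyrus.indexed.db and for the GUIDs recorded in the Xapian metadata; `conv_indexed()` and `xap_indexed()` are hypothetical names, and the real lookups go through on-disk databases.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for cyrus.indexed.db: (mailbox, uid) pairs
 * already indexed. */
struct indexed_entry {
    const char *mailbox;
    unsigned uid;
};

/* CONVINDEXED-style: the caller has used the conversations DB to turn
 * the message into a mailbox and UID; look that pair up here. */
static bool conv_indexed(const struct indexed_entry *db, size_t n,
                         const char *mailbox, unsigned uid)
{
    for (size_t i = 0; i < n; i++)
        if (db[i].uid == uid && strcmp(db[i].mailbox, mailbox) == 0)
            return true;
    return false;
}

/* XAPINDEXED-style: look for the message GUID among the GUIDs stored
 * in the Xapian databases' metadata. */
static bool xap_indexed(const char *const *guids, size_t n, const char *guid)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(guids[i], guid) == 0)
            return true;
    return false;
}
```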
> >> What is sync/squatter for?
> > It's a sync/$channel directory that squatter watches. This is a
> > method for providing a queue of mailboxes to look at, based on the
> > APPEND sync_log statements.
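A rough C sketch of turning sync log lines into a queue of distinct mailboxes. It assumes a simplified line format of `APPEND <mailbox>` (the real sync_log syntax may differ, for example by quoting names), and the queue type and `queue_log_line()` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_QUEUE 128
#define MAX_NAME 256

/* Hypothetical queue of distinct mailbox names waiting to be indexed. */
struct mbox_queue {
    char names[MAX_QUEUE][MAX_NAME];
    size_t count;
};

/* Feed one log line of the assumed form "APPEND <mailbox>". A mailbox
 * already queued is not added again, so several appends to one mailbox
 * lead to a single indexing pass. Returns 1 if a name was queued. */
static int queue_log_line(struct mbox_queue *q, const char *line)
{
    static const char prefix[] = "APPEND ";
    if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
        return 0;
    const char *name = line + sizeof(prefix) - 1;
    size_t len = strcspn(name, "\r\n");  /* drop a trailing newline */
    if (len == 0 || len >= MAX_NAME || q->count >= MAX_QUEUE)
        return 0;
    for (size_t i = 0; i < q->count; i++)
        if (strlen(q->names[i]) == len && strncmp(q->names[i], name, len) == 0)
            return 0;
    memcpy(q->names[q->count], name, len);
    q->names[q->count][len] = '\0';
    q->count++;
    return 1;
}
```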
> >> squatter can print “Xapian: truncating text from message mailbox
> >> user.... uid 7309”. When are messages truncated for the purposes of
> >> indexing?
> > When they are too long! The comment in the source code says this:
> > /* Maximum size of a query, determined empirically, is a little bit
> > * under 8MB. That seems like more than enough, so let's limit the
> > * total amount of parts text to 4 MB. */
> > #define MAX_PARTS_SIZE (4*1024*1024)
> > This is a holdover from when Greg was working on it. We could make
> > this a configurable option.
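A hedged C sketch of how a cap like MAX_PARTS_SIZE can be applied: each part's text is appended until the running total reaches the limit, and anything beyond it is dropped. `append_part_text()` is hypothetical and is not the Cyrus implementation; it only reuses the constant quoted above.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PARTS_SIZE (4*1024*1024)

/* Append one part's text to the indexing buffer, never letting the
 * total exceed MAX_PARTS_SIZE. Returns the bytes actually copied; a
 * short copy is where a "truncating text" warning would be logged.
 * buf must hold MAX_PARTS_SIZE bytes and *used must not exceed it. */
static size_t append_part_text(char *buf, size_t *used,
                               const char *text, size_t len)
{
    size_t room = MAX_PARTS_SIZE - *used;
    size_t n = len < room ? len : room;
    memcpy(buf + *used, text, n);
    *used += n;
    return n;
}
```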
> >> Do I understand correctly that, for a xapianactive file with "A B C D
> >> E", to remove C one has to call "squatter -t C,D -z D"? But A cannot
> >> be removed if it is the defaultsearchtier. Is the defaultsearchtier
> >> always added to the xapianactive file, if the tier is missing,
> >> whenever the file is modified (and the only way to modify it is to
> >> call squatter in COMPACT mode)?
> > When you do any compact, if it includes the first item (the writable
> > database) then a new writable database will be created on the
> > default tier. So if you try to compact the default tier away, a new
> > default tier item will be created.
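A minimal C sketch of the rule stated above: only whether the first (writable) item is among the compacted items matters, and if it is, a new writable database is planned on the default tier. How squatter maps `-t` tier names to items, and which generation number it really picks, are not covered in this thread; the "one more than the highest listed generation" choice below is an assumption, as are all names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One xapianactive item, e.g. {"T2", 1} for "T2:1"; item 0 is the
 * writable database. */
struct tier_item {
    const char *tier;
    int gen;
};

/* Assumption: a new database gets one more than the highest generation
 * of its tier currently listed, or 0 if the tier is not listed yet. */
static int next_generation(const struct tier_item *active, size_t n,
                           const char *tier)
{
    int max = -1;
    for (size_t i = 0; i < n; i++)
        if (strcmp(active[i].tier, tier) == 0 && active[i].gen > max)
            max = active[i].gen;
    return max + 1;
}

/* If the compacted items (indexes into `active`) include item 0, fill
 * *out with a new writable database on `defaulttier` and return true;
 * otherwise the writable database survives and we return false. */
static bool plan_new_writable(const struct tier_item *active, size_t n,
                              const size_t *selected, size_t nsel,
                              const char *defaulttier,
                              struct tier_item *out)
{
    for (size_t i = 0; i < nsel; i++) {
        if (selected[i] == 0) {
            out->tier = defaulttier;
            out->gen = next_generation(active, n, defaulttier);
            return true;
        }
    }
    return false;
}
```

With a default tier T5 that has no entries yet, compacting a set that includes the first item plans T5:0 under these assumptions.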
> > Bron.
> > --
> > Bron Gondwana, CEO, FastMail Pty Ltd
> > brong at fastmailteam.com
> ----- End message from Bron Gondwana <brong at fastmailteam.com> -----