squatter -F increases the index size
Дилян Палаузов
dilyan.palauzov at aegee.org
Wed Jul 17 04:49:15 EDT 2019
Hello,
it has been known for more than a month that “squatter -F” increases the index size on the stable branch, but not on the
unstable branch.
Will the fix be backported, or should “squatter -F” be removed from the documentation on the stable branch?
Regards
Дилян
On Tue, 2019-06-04 at 01:53 +1000, Bron Gondwana wrote:
> On Sat, Jun 1, 2019, at 04:34, Dilyan Palauzov wrote:
> > Hello,
> >
> > I gave squatter -F a try.
> >
> > Before I ran it for a user, tier T1 was not compacted and occupied 3.4
> > MB (mega), while T2 was compacted and contained 3.7 GB (giga). After
> > removing the records of the deleted messages, i.e. running squatter -F,
> > T2 was 5.7 GB and squatter printed “filtering” instead of “compacting”.
> > Then I ran “squatter -t T1,T2 -z T2” again, without -F and without -X,
> > and squatter reindexed all messages, creating a 3.0 GB index.
> >
> > I expected that with -F the resulting database would be compacted and
> > that the second call would not reindex anything.
>
> I discovered some bad bugs in -F recently, so I suspect that's why. They should be fixed on master now.
>
> > When does squatter decide on its own to reindex?
>
> When the DB version is too old (which is one of the -F bugs - it wasn't setting the DB version, so it assumed the data was all version zero!)
>
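
The version check described above can be sketched roughly as follows. This is only an illustration of why missing version metadata triggers a full reindex: CURRENT_DB_VERSION, parse_db_version and needs_reindex are hypothetical names, not identifiers from the Cyrus source, and the version value 1 is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical value; the real current version lives in the Cyrus source. */
#define CURRENT_DB_VERSION 1

/* stored_version is the raw metadata string, or NULL/"" if none was written
 * (the situation the buggy -F left behind, according to the reply above). */
int parse_db_version(const char *stored_version)
{
    if (stored_version == NULL || *stored_version == '\0')
        return 0;               /* missing metadata reads as version zero */
    return atoi(stored_version);
}

/* A database older than the current version has to be reindexed. */
bool needs_reindex(const char *stored_version)
{
    return parse_db_version(stored_version) < CURRENT_DB_VERSION;
}
```

With this sketch, a database written without a version (NULL or an empty string) always compares as older than the current version, which would match the reindexing seen on the second squatter call.
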
> > What do G records in conversations.db contain?
>
> G records contain a mapping from GUID to folder number (offset into the $FOLDER_NAMES key) and UID and optionally IMAP part number as the key - mapping to values which contain some keywords and modseq from the original record as well.
>
> > My reading is that, to create a Xapian index of a mailbox, squatter
> > first has to be run in INDEX mode and then in COMPACT mode. In
> > particular, it is not possible to create a compacted database in one
> > step.
>
> No, it's not - due to the way the compact API works. At least, I haven't figured out how.
>
> > Does squatter -R -S sleep after each mailbox or after each message indexed?
>
> It sleeps after each mailbox.
>
> > When compacting, squatter deals just with messages and on search or
> > reindex the conversations.db is used to map the messages to mailboxes.
> > How does squatter -S sleep after each mailbox during compacting, if
> > it knows nothing about mailboxes?
>
> -S is not used when compacting.
>
> > What does a tier name without a number mean in a xapianactive file?
>
> That shouldn't happen. It will be parsed the same as tier:0, I believe.
>
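
A minimal sketch of the fallback described above, assuming entries have the form "tier:number". The function parse_tier_entry and its signature are hypothetical and not taken from the Cyrus source; the default of 0 follows the "I believe" in the reply and is not verified against the real parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "tier:number" into a tier name and a number.
 * An entry without ":number" is treated like "tier:0".
 * Returns 0 on success, -1 if the name is empty or does not fit in name. */
int parse_tier_entry(const char *entry, char *name, size_t name_len, int *number)
{
    const char *colon = strchr(entry, ':');
    size_t n = colon ? (size_t)(colon - entry) : strlen(entry);

    if (n == 0 || n >= name_len)
        return -1;

    memcpy(name, entry, n);
    name[n] = '\0';
    *number = colon ? atoi(colon + 1) : 0;   /* missing number defaults to 0 */
    return 0;
}
```

Under these assumptions, "T2" and "T2:0" yield the same name and number, while "T1:3" yields the name "T1" and the number 3.
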
> > What are XAPIAN_DBW_CONVINDEXED and _XAPINDEXED?
>
> Two different ways to know if a document is indexed. CONVINDEXED uses the conversations DB to look up mailbox and uid and then the cyrus.indexed.db databases to see if the message has already been seen.
>
> XAPINDEXED uses the metadata inside the Xapian databases to know if a particular message has been indexed based on the cyrusid.*G* metadata values which are identical to the GUIDs themselves.
>
> > What does the file sync/squatter do?
>
> It's a sync/$channel directory which squatter watches. This is a method for providing a queue of mailboxes to look at based on the APPEND sync_log statements.
>
> > squatter can print “Xapian: truncating text from message mailbox
> > user.... uid 7309”. When are messages truncated for the purposes of
> > indexing?
>
> When they are too long! The comment in the source code says this:
>
> /* Maximum size of a query, determined empirically, is a little bit
> * under 8MB. That seems like more than enough, so let's limit the
> * total amount of parts text to 4 MB. */
> #define MAX_PARTS_SIZE (4*1024*1024)
>
> This is a holdover from when Greg was working on it. We could switch this to be a configurable option.
>
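
How a 4 MB budget like the one in the quoted comment could be applied is sketched below. Only MAX_PARTS_SIZE comes from the quoted source; clamp_part_len is a hypothetical helper, and whether the real code accounts for text exactly this way is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARTS_SIZE (4*1024*1024)

/* Hypothetical helper: given the number of bytes of part text already
 * collected for a message (used) and the length of the next part
 * (part_len), return how many bytes of that part still fit within the
 * budget. *truncated is set to 1 if the part had to be cut, else 0. */
size_t clamp_part_len(size_t used, size_t part_len, int *truncated)
{
    const size_t limit = MAX_PARTS_SIZE;
    size_t room = used < limit ? limit - used : 0;

    *truncated = part_len > room;
    return *truncated ? room : part_len;
}
```

A caller would index only the returned number of bytes and could log a "truncating text" message whenever *truncated is set.
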
> > Do I understand correctly that, for a xapianactive file with "A B C D
> > E", to remove C one has to call "squatter -t C,D -z D", but A cannot
> > be removed if it is the defaultsearchtier? Is the defaultsearchtier
> > always added to the xapianactive file, if that tier is missing,
> > whenever the file is modified (and the only way to modify it is to
> > call squatter in COMPACT mode)?
>
> When you do any compact, if it includes the first item (the writable database) then a new writable database will be created on the default tier. So if you try to compact the default tier away, a new default tier item will be created.
>
> Bron.
>
> --
> Bron Gondwana, CEO, FastMail Pty Ltd
> brong at fastmailteam.com
>
>