squatter -F increases the index size

Bron Gondwana brong at fastmailteam.com
Mon Jun 3 11:53:23 EDT 2019


On Sat, Jun 1, 2019, at 04:34, Dilyan Palauzov wrote:
> Hello,
> 
> I gave squatter -F a try.
> 
> Before I ran it for a user, tier T1 was not compacted and occupied 3.4 
> MB (mega), and T2 was compacted and contained 3.7 GB (giga). After 
> removing the records of the deleted messages, that is, running squatter -F, 
> T2 was 5.7 GB and squatter printed “filtering” instead of “compacting”. 
> Then I ran “squatter -t T1,T2 -z T2” again, without -F and without -X, 
> and squatter reindexed all messages to create a 3.0 GB index.
> 
> I expected that with -F the resulting database would be compacted and 
> that the second call would not reindex anything.

I discovered some bad bugs in -F recently, so I suspect that's why. They should be fixed on master now.

> When does squatter decide on its own to reindex?

When the DB version is too old. That is one of the -F bugs: it wasn't setting the DB version, so squatter assumed the data was all version zero!
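
As a rough sketch of that decision (the constant and the function name are hypothetical and are not taken from the Cyrus source), a database whose version was never written reads back as zero and so always looks too old:

```c
#include <stdbool.h>

/* Hypothetical threshold: the oldest index version that can be kept
 * without reindexing. The real value lives in the Cyrus source. */
#define MIN_INDEX_VERSION 1

/* A database that never had its version recorded reads back as 0,
 * which is below any real threshold, so it is always reindexed. */
bool needs_reindex(int db_version)
{
    return db_version < MIN_INDEX_VERSION;
}
```

With a check of this shape, a filtered database that forgot to record its version is reindexed from scratch on the next run, which matches the behaviour reported above.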

> What do G records in conversations.db contain?

G records are keyed by GUID, folder number (an offset into the $FOLDER_NAMES key), UID and, optionally, IMAP part number. Each key maps to a value that contains some keywords and the modseq from the original record.
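
To make the key fields concrete, here is an illustrative struct. The type, the field names and the debug output format are all assumptions for this sketch; it does not reproduce the on-disk encoding used by conversations.db.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory view of the fields that make up a G record key. */
struct g_record_key {
    const char *guid;      /* message GUID as a hex string */
    unsigned folder_num;   /* offset into the $FOLDER_NAMES list */
    uint32_t uid;          /* UID of the message in that folder */
    const char *part;      /* IMAP part number, or NULL if absent */
};

/* Render the key for debugging. This output format is invented for
 * the example and is NOT the real key encoding. Returns the value
 * returned by snprintf. */
int g_key_describe(const struct g_record_key *k, char *buf, size_t size)
{
    if (k->part)
        return snprintf(buf, size, "guid=%s folder=%u uid=%u part=%s",
                        k->guid, k->folder_num, (unsigned)k->uid, k->part);
    return snprintf(buf, size, "guid=%s folder=%u uid=%u",
                    k->guid, k->folder_num, (unsigned)k->uid);
}
```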

> My reading is that to create a Xapian index of an indexed 
> mailbox, squatter first has to be run in INDEX mode and then 
> in COMPACT mode. In particular, it is not possible to create a 
> compacted database in one step.

No, it's not, due to the way the compact API works. At least, I haven't figured out how.

> Does squatter -R -S sleep after each mailbox or after each message indexed?

It sleeps after each mailbox.

> When compacting, squatter deals just with messages and on search or 
> reindex the conversations.db is used to map the messages to mailboxes. 
>  How does squatter -S sleep after each mailbox during compacting, if 
> it knows nothing about mailboxes?

-S is not used when compacting.

> What does a tier name in a xapianactive file without a number mean?

That shouldn't happen. I believe it will be parsed the same as tier:0.
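
A minimal sketch of such a parser, assuming items have the form "tier:number" and that a missing number falls back to zero. The function name and signature are hypothetical; this is not the parser squatter actually uses.

```c
#include <stdlib.h>
#include <string.h>

/* Split one xapianactive item of the form "tier:number" into its parts.
 * Assumption: an item with no ":number" suffix is treated as number 0.
 * Returns 0 on success, or -1 if the tier name is empty or does not
 * fit in the caller's buffer. */
int parse_active_item(const char *item, char *tier, size_t tiersize,
                      unsigned long *num)
{
    const char *colon = strrchr(item, ':');
    size_t len = colon ? (size_t)(colon - item) : strlen(item);

    if (len == 0 || len >= tiersize)
        return -1;

    memcpy(tier, item, len);
    tier[len] = '\0';

    /* strtoul yields 0 when no digits follow, so "T2:" also gives 0 */
    *num = colon ? strtoul(colon + 1, NULL, 10) : 0;
    return 0;
}
```

Under these assumptions "T2" and "T2:0" parse to the same tier and number.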

> What are XAPIAN_DBW_CONVINDEXED and _XAPINDEXED?

Two different ways to know if a document is indexed. CONVINDEXED uses the conversations DB to look up mailbox and uid and then the cyrus.indexed.db databases to see if the message has already been seen.

XAPINDEXED uses the metadata inside the Xapian databases to know if a particular message has been indexed based on the cyrusid.*G* metadata values which are identical to the GUIDs themselves.

> What does the file sync/squatter do?

It's a sync/$channel directory that squatter watches. This is a way of providing a queue of mailboxes to look at, based on the APPEND sync_log statements.

> squatter can print “Xapian: truncating text from message mailbox 
> user.... uid 7309”. When are messages truncated for the purposes of 
> indexing?

When they are too long! The comment in the source code says this:

/* Maximum size of a query, determined empirically, is a little bit
 * under 8MB. That seems like more than enough, so let's limit the
 * total amount of parts text to 4 MB. */
#define MAX_PARTS_SIZE (4*1024*1024)

This is a holdover from when Greg was working on it. We could switch this to be a configurable option.
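
To illustrate how a cap on the total parts text leads to truncation, here is a small sketch. Only MAX_PARTS_SIZE comes from the source; the helper function is hypothetical, and the real code may apply the limit differently.

```c
#include <stddef.h>

#define MAX_PARTS_SIZE (4*1024*1024)

/* Given how many bytes of part text are already queued for a message
 * and the length of the next part, return how many bytes of that part
 * still fit under the cap. A result smaller than part_len means the
 * text would be truncated. */
size_t parts_text_to_keep(size_t already_queued, size_t part_len)
{
    if (already_queued >= MAX_PARTS_SIZE)
        return 0;

    size_t room = MAX_PARTS_SIZE - already_queued;
    return part_len < room ? part_len : room;
}
```

A message whose text parts add up to less than 4 MB is indexed in full; anything beyond that point is dropped from the index, which is when the "truncating text" message would be expected.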

> Do I understand correctly that for a xapianactive file with "A B C D 
> E", to remove C one has to call "squatter -t C,D -z D"? But A cannot 
> be removed if it is the defaultsearchtier. Is the defaultsearchtier 
> always added to the xapianactive file, if the tier is missing, 
> whenever the file is modified (and is the only way to modify it to 
> call squatter in COMPACT mode)?

When you do any compact, if it includes the first item (the writable database) then a new writable database will be created on the default tier. So if you try to compact the default tier away, a new default tier item will be created.
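
The rule can be modelled on a plain list of tier names. This is only a sketch under stated assumptions: items are bare tier names without numbers, all source items merge into a single item on the destination tier at the position of the first source, and the function names are hypothetical. It is not how squatter manipulates the xapianactive file internally.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if tier is one of the nsrc source tiers, else 0. */
int is_source_tier(const char *tier, const char *const *src, size_t nsrc)
{
    for (size_t i = 0; i < nsrc; i++)
        if (strcmp(tier, src[i]) == 0)
            return 1;
    return 0;
}

/* Model one compact run over an active list whose first item is the
 * writable database. Writes the resulting list to out, which must
 * have room for n + 1 entries, and returns the new length. */
size_t compact_active(const char *const *active, size_t n,
                      const char *const *src, size_t nsrc,
                      const char *dest, const char *default_tier,
                      const char **out)
{
    size_t count = 0;
    int dest_written = 0;

    /* Compacting the writable database forces a fresh writable
     * database on the default tier at the front of the list. */
    if (n > 0 && is_source_tier(active[0], src, nsrc))
        out[count++] = default_tier;

    for (size_t i = 0; i < n; i++) {
        if (!is_source_tier(active[i], src, nsrc)) {
            out[count++] = active[i];   /* untouched item */
        } else if (!dest_written) {
            out[count++] = dest;        /* sources merge into one item */
            dest_written = 1;
        }
    }
    return count;
}
```

In this model, compacting C and D into D turns "A B C D E" into "A B D E", while compacting A and B into B with default tier A gives "A B C D E" again, where the leading A is a new, empty writable database.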

Bron.

--
 Bron Gondwana, CEO, FastMail Pty Ltd
 brong at fastmailteam.com



More information about the Cyrus-devel mailing list