squatter -F increases the index size

Dilyan Palauzov Dilyan.Palauzov at aegee.org
Mon Jun 3 14:15:43 EDT 2019


Hello Bron,

imap/squatter.c:do_compact() does call `if (sleepmicroseconds)  
usleep(sleepmicroseconds);` so -S number is honoured with `squatter  
-t… -z…`.

Will `squatter -F -z… -t…` be fixed on the stable branch, or shall  
calling `squatter -F -t… -z` be discouraged with 3.0?

Given that, currently, calling `squatter -t… -z` after `squatter -F
-t… -z…` reindexes all messages and thereby creates a new Xapian
index, it must be possible to create a compacted database directly,
without creating a bloated index first.

My understanding of the rolling mode is that once a new message
appears/arrives/is APPENDed or deliver(ed), it is added to the sync
log and then indexed in rolling mode.  Then a message arrives at a
different place, is added to the log, and is indexed.  Whether the
first and second messages are in the same mailbox is completely
random.  Why does squatter not sleep if the two messages are in the
same mailbox, and work non-stop otherwise; that is, why does it sleep
depending on random circumstances?

https://wiki.dovecot.org/Plugins/FTS/Squat says, for Dovecot, that IMAP
requires SEARCH to match substrings as well, that no IMAP server
implements this requirement, and that Dovecot implements it only when
Squat indices are used.  Is the same true for Cyrus IMAP (the Squat
index offers substring search, the Xapian index does not)?

Running squatter once printed “compressing X:0,X,Y:0 to Y:3 for …
(active Y:0,X:0,X,Y:0,Y:1,Y:2)”  
(https://github.com/cyrusimap/cyrus-imapd/issues/2764) so I suspect a  
tiername without a number was in the .xapianactive file.

If I do any compact (-o, -F, -X, just -t -z) where the first tier is
not referenced, does squatter ensure that the default tier according
to imapd.conf is inserted in the xapianactive file?  Or, asking
another way: if I change imapd.conf, create a new tier T6, and
declare T5 to be the default tier, which of the following will insert
a reference to T5:0 in .xapianactive and which will not:

squatter -t T2 -o -z T2
squatter -t T5,T2 -z T2
squatter -t T5 -o T4
squatter -t T2 -F T3
squatter -t T2 -X T3
or what else?  (The name T5 is declared and its root directory exists,
but there is no data in the directory yet, and T5 is not yet in any
.xapianactive file.)

Regards
   Дилян
----- Message from Bron Gondwana <brong at fastmailteam.com> ---------
    Date: Tue, 04 Jun 2019 01:53:23 +1000
    From: Bron Gondwana <brong at fastmailteam.com>
Subject: Re: squatter -F increases the index size
      To: Cyrus Devel <cyrus-devel at lists.andrew.cmu.edu>


> On Sat, Jun 1, 2019, at 04:34, Dilyan Palauzov wrote:
>> Hello,
>>
>> I gave squatter -F a try.
>>
>> Before I ran it for a user, tier T1 was not compacted and occupied 3.4
>> MB, and T2 was compacted and contained 3.7 GB. After removing the
>> records of the deleted messages, that is, after running squatter -F,
>> T2 was 5.7 GB and squatter printed “filtering” instead of “compacting”.
>> Then I ran “squatter -t T1,T2 -z T2” again, without -F and without -X,
>> and squatter reindexed all messages to create a 3.0 GB index.
>>
>> I expected that with -F the resulting database would be compacted and
>> that the second call would not reindex anything.
>
> I discovered some bad bugs in -F recently, so I suspect that's why.  
> They should be fixed on master now.
>
>> When does squatter decide on its own to reindex?
>
> When the DB version is too old (which is one of the -F bugs - it  
> wasn't setting the DB version, so it assumed the data was all  
> version zero!)
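
The reindex decision described above can be sketched roughly as follows. This is only an illustration, not the Cyrus code: the metadata key "version" and the constant MIN_SUPPORTED_VERSION are hypothetical names, and the real minimum version is whatever the Cyrus source defines.

```python
# Hypothetical value; the real minimum is defined in the Cyrus source.
MIN_SUPPORTED_VERSION = 1


def read_db_version(metadata):
    """Return the stored version, treating a missing entry as version 0."""
    return int(metadata.get("version", 0))


def needs_reindex(metadata, minimum=MIN_SUPPORTED_VERSION):
    """A database older than the minimum version is reindexed from scratch."""
    return read_db_version(metadata) < minimum


# A database written without any version entry (the -F bug described above)
# looks like version 0 and therefore triggers a full reindex.
filtered_db = {}
compacted_db = {"version": "1"}
print(needs_reindex(filtered_db), needs_reindex(compacted_db))
```

Under these assumptions, an index produced by the buggy -F is indistinguishable from a version-zero index, which would explain the reindexing on the next compact.
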
>
>> What do G records in conversations.db contain?
>
> G records contain a mapping from GUID to folder number (offset into  
> the $FOLDER_NAMES key) and UID and optionally IMAP part number as  
> the key - mapping to values which contain some keywords and modseq  
> from the original record as well.
>
>> My reading is that the way to create a Xapian index of an indexed
>> mailbox, is that first squatter has to be run in INDEX mode and then
>> in COMPACT mode. In particular it is not possible to create in one
>> step a compacted database.
>
> No, it's not - due to the way the compact API works. At least, I
> haven't figured out how.
>
>> Does squatter -R -S sleep after each mailbox or after each message indexed?
>
> It sleeps after each mailbox.
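
A minimal sketch of "sleep after each mailbox" in rolling mode, assuming the sync log yields a queue of mailbox names and that repeated names in one batch are handled once. Both assumptions, and every name below, are hypothetical; this is not how squatter is actually structured.

```python
import time


def index_rolling(queue, index_mailbox, sleep_seconds=0.0, sleep=time.sleep):
    """Index each queued mailbox once, pausing after every mailbox."""
    processed = []
    # dict.fromkeys drops repeated mailbox names while keeping their order
    # (assumption: one batch handles a mailbox only once).
    for mailbox in dict.fromkeys(queue):
        index_mailbox(mailbox)
        processed.append(mailbox)
        if sleep_seconds:
            sleep(sleep_seconds)  # the pause is per mailbox, not per message
    return processed


# Two messages in user.a cause one pause; a message in user.b causes another.
pauses = []
done = index_rolling(["user.a", "user.a", "user.b"], lambda m: None,
                     0.5, pauses.append)
print(done, pauses)
```

With a per-mailbox pause, the number of sleeps depends on how many distinct mailboxes appear in the queue, not on how many messages arrived.
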
>
>> When compacting, squatter deals just with messages and on search or
>> reindex the conversations.db is used to map the messages to mailboxes.
>>  How does squatter -S sleep after each mailbox during compacting, if
>> it knows nothing about mailboxes?
>
> -S is not used when compacting.
>
>> What does a tier name without a number in a xapianactive file mean?
>
> That shouldn't happen. It will be parsed the same as tier:0, I believe.
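
To illustrate that reading, here is a small sketch of parsing space-separated xapianactive items, where a bare tier name is taken as number 0. The function names are hypothetical, and the fallback to 0 is only what the answer above believes happens, not something verified against the source.

```python
def parse_active_item(item):
    """Split "tier:number" into (tier, number); a bare "tier" gets number 0."""
    tier, _sep, number = item.partition(":")
    return (tier, int(number) if number else 0)


def parse_active(line):
    """Parse one xapianactive line, writable database first."""
    return [parse_active_item(item) for item in line.split()]


print(parse_active("Y:0 X:0 X Y:1"))
```

Under this reading, "X" and "X:0" name the same database, which would match the duplicated entry in the log line from issue 2764.
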
>
>> What are XAPIAN_DBW_CONVINDEXED and _XAPINDEXED?
>
> Two different ways to know if a document is indexed. CONVINDEXED  
> uses the conversations DB to look up mailbox and uid and then the  
> cyrus.indexed.db databases to see if the message has already been  
> seen.
>
> XAPINDEXED uses the metadata inside the Xapian databases to know if  
> a particular message has been indexed based on the cyrusid.*G*  
> metadata values which are identical to the GUIDs themselves.
>
>> What does the file sync/squatter do?
>
> It's a sync/$channel directory which squatter watches. This is a
> method for providing a queue of mailboxes to look at based on the
> APPEND sync_log statements.
>
>> squatter can print “Xapian: truncating text from message mailbox
>> user.... uid 7309”. When are messages truncated for the purposes of
>> indexing?
>
> When they are too long! The comment in the source code says this:
>
> /* Maximum size of a query, determined empirically, is a little bit
> * under 8MB. That seems like more than enough, so let's limit the
> * total amount of parts text to 4 MB. */
> #define MAX_PARTS_SIZE (4*1024*1024)
>
> This is a holdover from when Greg was working on it. We could switch  
> this to be a configurable option.
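
A rough sketch of such a cap, using the same 4 MB value as the macro quoted above. Whether Cyrus counts bytes or characters, and exactly where it cuts, is not stated in this thread; the helper below is hypothetical and counts bytes.

```python
MAX_PARTS_SIZE = 4 * 1024 * 1024  # same value as the C macro quoted above


def append_part(buffer, part, limit=MAX_PARTS_SIZE):
    """Append part to buffer (a bytearray) without exceeding limit.

    Returns True if the part had to be truncated.
    """
    room = limit - len(buffer)
    if len(part) > room:
        buffer.extend(part[:max(room, 0)])  # keep only what still fits
        return True
    buffer.extend(part)
    return False


text = bytearray()
for part in (b"a" * 3, b"b" * 3):
    if append_part(text, part, limit=4):
        print("truncating text from message")
print(bytes(text))
```
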
>
>> Do I understand correctly that, for a xapianactive file with "A B C D
>> E", one has to call "squatter -t C,D -z D" to remove C, but A cannot
>> be removed if it is the defaultsearchtier? Is the defaultsearchtier
>> always included in the xapianactive file, if the tier is missing,
>> whenever the file is modified (and the only way to modify it is to
>> call squatter in COMPACT mode)?
>
> When you do any compact, if it includes the first item (the writable  
> database) then a new writable database will be created on the  
> default tier. So if you try to compact the default tier away, a new  
> default tier item will be created.
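
The rule above can be modelled on a plain list of "tier:number" items. This is a toy model under assumptions that the thread does not confirm: the compacted result takes the position of the first source item, and new items get the next unused number for their tier. All function names are hypothetical.

```python
def tier_of(item):
    return item.partition(":")[0]


def number_of(item):
    return int(item.partition(":")[2] or 0)


def next_item(tier, items):
    """Next unused "tier:number" for the given tier (assumed numbering)."""
    used = [number_of(i) for i in items if tier_of(i) == tier]
    return f"{tier}:{max(used) + 1 if used else 0}"


def compact(active, source_tiers, dest_tier, default_tier):
    """Model `squatter -t <source_tiers> -z <dest_tier>` on an active list."""
    result = next_item(dest_tier, active)
    new_active, placed = [], False
    for item in active:
        if tier_of(item) in source_tiers:
            if not placed:  # the compacted result replaces the sources
                new_active.append(result)
                placed = True
        else:
            new_active.append(item)
    # If the writable (first) database was compacted away, a fresh writable
    # database is created on the default tier.
    if active and tier_of(active[0]) in source_tiers:
        new_active.insert(0, next_item(default_tier, active + [result]))
    return new_active


print(compact("A:0 B:0 C:0 D:0 E:0".split(), {"C", "D"}, "D", "A"))
print(compact(["A:0", "B:0"], {"A"}, "B", "A"))
```

In this model the first call leaves A:0 untouched, while the second call, which compacts the writable database, ends up with a new default-tier item at the front of the list.
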
>
> Bron.
>
> --
>  Bron Gondwana, CEO, FastMail Pty Ltd
>  brong at fastmailteam.com


----- End message from Bron Gondwana <brong at fastmailteam.com> -----



