squatter -F increases the index size

Дилян Палаузов dilyan.palauzov at aegee.org
Wed Jul 17 04:49:15 EDT 2019


Hello,

It has been known for more than a month that on the stable branch “squatter -F” increases the index size, whereas on the
unstable branch it does not.

Will the fix be backported, or should “squatter -F” be removed from the documentation on the stable branch?

Regards
  Дилян

On Tue, 2019-06-04 at 01:53 +1000, Bron Gondwana wrote:
> On Sat, Jun 1, 2019, at 04:34, Dilyan Palauzov wrote:
> > Hello,
> > 
> > I gave squatter -F a try.
> > 
> > Before I ran it for a user, tier T1 was not compacted and occupied 3.4
> > MB (mega), and T2 was compacted and contained 3.7 GB (giga).  After
> > removing the records of the deleted messages, i.e. running squatter -F,
> > T2 was 5.7 GB and squatter printed “filtering” instead of “compacting”.
> > Then I ran “squatter -t T1,T2 -z T2” again, without -F and without -X,
> > and squatter reindexed all messages, creating a 3.0 GB index.
> > 
> > I expected that with -F the resulting database would be compacted and
> > that the second call would not reindex anything.
> 
> I discovered some bad bugs in -F recently, so I suspect that's why.  They should be fixed on master now.
> 
> > When does squatter decide on its own to reindex?
> 
> When the DB version is too old (which is one of the -F bugs - it wasn't setting the DB version, so it assumed the data was all version zero!)
> 
> > What do G records in conversations.db contain?
> 
> G records contain a mapping from GUID to folder number (offset into the $FOLDER_NAMES key) and UID and optionally IMAP part number as the key - mapping to values which contain some keywords and modseq from the original record as well.
> 
> > My reading is that, to create a Xapian index of an indexed
> > mailbox, squatter first has to be run in INDEX mode and then
> > in COMPACT mode.  In particular, it is not possible to create a
> > compacted database in one step.
> 
> No, it's not - due to the way the compact API works.  At least, I haven't figured out how.
> 
> > Does squatter -R -S sleep after each mailbox or after each message indexed?
> 
> It sleeps after each mailbox.
> 
> > When compacting, squatter deals just with messages and on search or  
> > reindex the conversations.db is used to map the messages to mailboxes.  
> >   How does squatter -S sleep after each mailbox during compacting, if  
> > it knows nothing about mailboxes?
> 
> -S is not used when compacting.
> 
> > What does a tier name without a number mean in a xapianactive file?
> 
> That shouldn't happen.  It will be parsed the same as tier:0, I believe.
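
The fallback described above could be sketched roughly as follows. This is only an illustration, not the Cyrus parser: the function name parse_active_item and its signature are made up here, and the "missing number means 0" behaviour is taken from the "I believe" in the reply.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the Cyrus parser: split "tier:number" into
 * its parts.  A missing ":number" suffix is treated as number 0.
 * Returns 0 on success, -1 if the tier name is empty or does not fit. */
int parse_active_item(const char *item, char *tier, size_t tiersize,
                      unsigned long *num)
{
    const char *colon = strchr(item, ':');
    size_t namelen = colon ? (size_t)(colon - item) : strlen(item);

    if (namelen == 0 || namelen >= tiersize)
        return -1;

    memcpy(tier, item, namelen);
    tier[namelen] = '\0';

    /* no number given: fall back to 0 */
    *num = colon ? strtoul(colon + 1, NULL, 10) : 0;
    return 0;
}
```

With this sketch, "T2:3" yields tier "T2" and number 3, while a bare "T1" yields tier "T1" and number 0.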
> 
> > What are XAPIAN_DBW_CONVINDEXED and _XAPINDEXED?
> 
> Two different ways to know if a document is indexed.  CONVINDEXED uses the conversations DB to look up mailbox and uid and then the cyrus.indexed.db databases to see if the message has already been seen.
> 
> XAPINDEXED uses the metadata inside the Xapian databases to know if a particular message has been indexed based on the cyrusid.*G* metadata values which are identical to the GUIDs themselves.
> 
> > What is the file sync/squatter for?
> 
> It's a sync/$channel directory which squatter watches.  This is a method for providing a queue of mailboxes to look at based on the APPEND sync_log statements.
> 
> > squatter can print “Xapian: truncating text from message mailbox  
> > user.... uid 7309”.  When are messages truncated for the purposes of  
> > indexing?
> 
> When they are too long!  The comment in the source code says this:
> 
> /* Maximum size of a query, determined empirically, is a little bit
> * under 8MB.  That seems like more than enough, so let's limit the
> * total amount of parts text to 4 MB. */
> #define MAX_PARTS_SIZE      (4*1024*1024)
> 
> This is a holdover from when Greg was working on it.  We could switch this to be a configurable option.
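
A cap like the one quoted above could be enforced along these lines. This is a minimal sketch under assumptions: only MAX_PARTS_SIZE comes from the quoted source, while struct parts_budget and parts_budget_take are hypothetical names, and the real squatter code may decide what to drop differently.

```c
#include <stddef.h>

#define MAX_PARTS_SIZE      (4*1024*1024)

/* Hypothetical accumulator: tracks how much part text has been
 * accepted for the message currently being indexed. */
struct parts_budget {
    size_t used;        /* bytes accepted so far */
    int truncated;      /* set once any text has been dropped */
};

/* Return how many of the len bytes offered may still be indexed.
 * Anything beyond the remaining room is dropped and flagged. */
size_t parts_budget_take(struct parts_budget *b, size_t len)
{
    size_t room = MAX_PARTS_SIZE - b->used;

    if (len > room) {
        len = room;
        b->truncated = 1;
    }
    b->used += len;
    return len;
}
```

A caller would start with a zeroed struct per message, ask for each part's length in turn, index only the returned number of bytes, and log a "truncating text" style message once the truncated flag is set.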
> 
> > Do I understand correctly that, for a xapianactive file with "A B C D
> > E", to remove C one has to call "squatter -t C,D -z D", but A cannot
> > be removed if it is the defaultsearchtier?  Is the defaultsearchtier
> > always added to the xapianactive file, if the tier is missing,
> > whenever the file is modified (and the only way to modify it is to
> > call squatter in COMPACT mode)?
> 
> When you do any compact, if it includes the first item (the writable database) then a new writable database will be created on the default tier.  So if you try to compact the default tier away, a new default tier item will be created.
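
The effect of a compact run on the list of tiers, as described above, can be modelled like this. It is a toy model and an assumption throughout: it works on bare tier names only (real xapianactive items also carry a number), the function model_compact is hypothetical, and the "A B C D E" to "A B D E" result reflects the understanding stated in the question rather than a confirmed behaviour.

```c
#include <string.h>

/* Is name one of the n entries in list? */
static int in_list(const char *name, const char **list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0)
            return 1;
    return 0;
}

/* Hypothetical model of one compact run, on tier names only.
 * active/nactive: current list, writable database first.
 * src/nsrc: tiers given with -t; dest: tier given with -z.
 * out must have room for nactive + 1 entries.
 * Returns the number of entries written to out. */
size_t model_compact(const char **active, size_t nactive,
                     const char **src, size_t nsrc,
                     const char *dest, const char *default_tier,
                     const char **out)
{
    size_t n = 0;
    int placed = 0;

    /* compacting the writable database forces a new one on the default tier */
    if (nactive > 0 && in_list(active[0], src, nsrc))
        out[n++] = default_tier;

    for (size_t i = 0; i < nactive; i++) {
        if (!in_list(active[i], src, nsrc)) {
            out[n++] = active[i];       /* untouched tier stays in place */
        } else if (!placed) {
            out[n++] = dest;            /* all sources collapse into one item */
            placed = 1;
        }
    }
    return n;
}
```

In this model, compacting C,D into D on "A B C D E" gives "A B D E", while compacting A,B into B with A as the default tier gives "A B C D E" again, because a new A item is created in front.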
> 
> Bron.
> 
> --
>   Bron Gondwana, CEO, FastMail Pty Ltd
>   brong at fastmailteam.com
> 
> 



More information about the Cyrus-devel mailing list