Prepending Xapian Tiers

Bron Gondwana brong at fastmailteam.com
Wed May 29 17:54:25 EDT 2019



On Wed, May 29, 2019, at 06:39, Dilyan Palauzov wrote:
> Hello,
> 
> so the <userid>.conversations database, apart from what is described at
> https://www.cyrusimap.org/imap/concepts/deployment/databases.html#conversations-userid-conversations,
> also stores a G record for each of the user's messages, mapping the
> message to the mailboxes where it is located, and Xapian search
> results return G records.
> 
> Are a G record, GUID and a conversation ID the same thing?

G records are identical to GUIDs. There are also G records (in latest master at least) for sub-parts of messages, which map to a blobId in JMAP and allow direct addressing of every part by a content-based ID.

The conversation ID is something different: it's based on a permutation of the GUID of the first message that arrived within that thread, and was the original purpose of the conversations database.

Sadly this has all evolved over time. I would like to migrate Cyrus towards the terminology used in JMAP, which has EmailId (which in JMAP is the GUID with a prefix), ThreadId (the conversation ID from Cyrus with 'T' as a prefix) and MailboxId (previously known in Cyrus as the UniqueId of a mailbox).
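A minimal Python sketch of how these identifiers relate, under assumptions not stated above: that the GUID is the hex SHA-1 of the raw message bytes, that EmailId is 'M' plus the first 24 hex characters of the GUID, that a blobId is 'G' plus the full GUID, and that ThreadId is 'T' plus the 64-bit conversation ID as 16 hex digits. These layouts match my reading of recent Cyrus code but should be checked against the source; the function names are hypothetical.

```python
import hashlib

def guid_hex(raw: bytes) -> str:
    # Assumption: GUID = SHA-1 of the raw message bytes, hex-encoded (40 chars).
    return hashlib.sha1(raw).hexdigest()

def email_id(guid: str) -> str:
    # Assumed layout: 'M' followed by the first 24 hex chars of the GUID.
    return "M" + guid[:24]

def blob_id(guid: str) -> str:
    # Assumed layout: 'G' followed by the full 40-char GUID (content-addressed part).
    return "G" + guid

def thread_id(conversation_id: int) -> str:
    # Conversation IDs are 64-bit; rendered as 16 lowercase hex digits after 'T'.
    return "T" + format(conversation_id, "016x")

g = guid_hex(b"Subject: hi\r\n\r\nhello\r\n")
print(email_id(g), blob_id(g), thread_id(0x1234))
```

Note that the ThreadId is not derived from the EmailId here: the conversation ID comes from the first message of the thread, so it is passed in separately.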

> When a message is expunged, are its records from 
> <userid>.conversations removed?

They are removed when the message is UNLINKED, which may happen at the same time as the expunge, depending on your expunge_mode setting.
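To make the expunge/unlink distinction concrete, here is a toy Python model, not Cyrus code. It assumes expunge_mode can be reduced to two values: "immediate", where the unlink (and removal of the G record) happens during the expunge, and "delayed", where the message is only flagged and a later cleanup pass, standing in for something like cyr_expire, does the unlink. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ToyMailbox:
    # Hypothetical model: expunge_mode simplified to "immediate" or "delayed".
    expunge_mode: str = "delayed"
    messages: dict = field(default_factory=dict)       # uid -> guid
    expunged: set = field(default_factory=set)         # flagged, not yet unlinked
    conversations: dict = field(default_factory=dict)  # guid -> set of uids ("G records")

    def append(self, uid, guid):
        self.messages[uid] = guid
        self.conversations.setdefault(guid, set()).add(uid)

    def expunge(self, uid):
        self.expunged.add(uid)
        if self.expunge_mode == "immediate":
            self.unlink(uid)  # record removed together with the expunge

    def unlink(self, uid):
        # Only here does the conversations record go away.
        guid = self.messages.pop(uid)
        self.expunged.discard(uid)
        uids = self.conversations[guid]
        uids.discard(uid)
        if not uids:
            del self.conversations[guid]

    def cleanup(self):
        # Stand-in for the later cleanup pass under delayed expunge.
        for uid in list(self.expunged):
            self.unlink(uid)

mb = ToyMailbox("delayed")
mb.append(1, "abc")
mb.expunge(1)
assert "abc" in mb.conversations      # still present until unlinked
mb.cleanup()
assert "abc" not in mb.conversations
```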

> When a message is unexpunged, is it again inserted in 
> <userid>.conversations and referenced in the sync_log_channels: 
> squatter?

Yes, unexpunge is treated as a new APPEND, and since the bytes are the same, the GUID will be the same.
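A short Python illustration of why the GUID survives an unexpunge, assuming (as the content-based-ID design implies, though the exact hash is not named above) that the GUID is the SHA-1 of the exact message bytes. The helper unexpunge_as_append is hypothetical: it gives the message a fresh UID, as an APPEND would, while the GUID is recomputed from unchanged bytes.

```python
import hashlib
from typing import Dict, Tuple

def message_guid(raw: bytes) -> str:
    # Assumption: GUID = hex SHA-1 of the exact message bytes,
    # so identical bytes always give an identical GUID.
    return hashlib.sha1(raw).hexdigest()

def unexpunge_as_append(mailbox: Dict[int, str], raw: bytes, next_uid: int) -> Tuple[int, str]:
    # Hypothetical helper: treat unexpunge like APPEND, assigning a new UID
    # while the GUID (and hence the G record key) stays the same.
    guid = message_guid(raw)
    mailbox[next_uid] = guid
    return next_uid, guid

raw = b"From: a@example.com\r\nSubject: test\r\n\r\nbody\r\n"
original_guid = message_guid(raw)
uid, guid = unexpunge_as_append({}, raw, next_uid=42)
assert uid == 42 and guid == original_guid
```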

> squatter has the modes: indexer, search, rolling, synclog, compact, 
> indexfrom (deprecated) and audit. Is search_batchsize used only in the 
> indexer mode, in particular it is not used when squatter -t … -z -X is 
> called (compact and reindex simultaneously)?

Hmm... let me check! Right, it isn't used: when you run with -X, it reindexes all the messages in an entire mailbox in a single batch, ignoring search_batchsize.
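A hypothetical Python model of the batching difference described above, not squatter's actual code: in normal indexer mode the mailbox's UIDs are split into chunks of search_batchsize, while in -X reindex mode the whole mailbox goes into one batch regardless of that setting.

```python
from typing import List

def plan_batches(uids: List[int], search_batchsize: int, reindex_mode: bool) -> List[List[int]]:
    # Hypothetical model of how UIDs might be grouped for indexing.
    if reindex_mode:
        # -X: the entire mailbox is one batch; search_batchsize is ignored.
        return [list(uids)] if uids else []
    if search_batchsize <= 0:
        raise ValueError("search_batchsize must be positive")
    # Indexer mode: fixed-size chunks, the last one possibly shorter.
    return [uids[i:i + search_batchsize] for i in range(0, len(uids), search_batchsize)]

uids = list(range(1, 11))
print(plan_batches(uids, 4, reindex_mode=False))  # three batches
print(plan_batches(uids, 4, reindex_mode=True))   # one batch
```

One consequence of the single-batch approach is that memory use during -X scales with the size of the largest mailbox rather than with search_batchsize.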

> What is the application for squatter -X (Reindex all messages before 
> compacting. This mode reads all the lists of messages indexed by the 
> listed tiers, and re-indexes them into a temporary database before 
> compacting that into place)?

It is very useful when index formats have changed over time and you want to reindex all emails with the latest format, or when you believe a search database might be corrupted and want to rebuild it from source.

> Does it index messages that were not indexed yet for any reason, or
> does it delete the database, scan each message again and create a
> compact Xapian database?

It uses the cyrus.indexed.db of each of the source databases (selected by -t) to find which ranges of UIDs in each mailbox those databases claim to have indexed, then scans over those same UID ranges again and indexes the contents of any messages that are not yet expunged.
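A Python sketch of that procedure, under the assumption that each tier's cyrus.indexed.db can be viewed as a mapping from mailbox name to a list of inclusive UID ranges; the real on-disk format is not described here, and the function names are hypothetical. Ranges claimed by all selected tiers are merged per mailbox, then every UID in them is revisited unless it has since been expunged.

```python
from typing import Dict, List, Set, Tuple

Range = Tuple[int, int]  # inclusive UID range, e.g. (1, 500)

def merge_ranges(ranges: List[Range]) -> List[Range]:
    # Merge overlapping or adjacent ranges so no UID is visited twice.
    merged: List[Range] = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def uids_to_reindex(tiers: List[Dict[str, List[Range]]],
                    expunged: Dict[str, Set[int]]) -> Dict[str, List[int]]:
    # tiers: one dict per source tier selected with -t (stand-in for its cyrus.indexed.db).
    claimed: Dict[str, List[Range]] = {}
    for tier in tiers:
        for mbox, ranges in tier.items():
            claimed.setdefault(mbox, []).extend(ranges)
    result: Dict[str, List[int]] = {}
    for mbox, ranges in claimed.items():
        gone = expunged.get(mbox, set())
        result[mbox] = [uid for lo, hi in merge_ranges(ranges)
                        for uid in range(lo, hi + 1) if uid not in gone]
    return result

tiers = [{"INBOX": [(1, 3)]}, {"INBOX": [(3, 5)], "Sent": [(1, 2)]}]
print(uids_to_reindex(tiers, {"INBOX": {2}}))
```

This also answers the follow-up question in a sense: a message outside every claimed range would not be picked up by this pass.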

> In the case I described, a mailbox receiving reports whose index
> grew very fast, the cause was a mail loop: a lot of emails arriving
> in a short time. Once the loop stopped, the index no longer expanded
> faster than those of other mailboxes.

That makes sense.

> So by default for now, unless some extra setup is performed, only 
> words in text/plain and text/html get indexed, possibly with headers, 
> and attachments are ignored?

Yes, that is correct. In fact, it's all text types: text/calendar and text/vcard are processed specially, and other text/* types are treated the same as text/plain for indexing purposes.
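A hypothetical Python classifier mirroring the behaviour described above; the return labels are invented for illustration and do not correspond to Cyrus identifiers. It strips MIME parameters, special-cases text/calendar and text/vcard, treats any other text/* type like text/plain, and skips non-text parts such as attachments, as in a default setup.

```python
def indexing_strategy(content_type: str) -> str:
    # Hypothetical labels; only the grouping follows the description above.
    ctype = content_type.split(";", 1)[0].strip().lower()
    if ctype == "text/calendar":
        return "calendar"  # processed specially
    if ctype == "text/vcard":
        return "vcard"     # processed specially
    if ctype.startswith("text/"):
        return "plain"     # any other text/* type, indexed like text/plain
    return "skip"          # non-text parts are not indexed by default

for ct in ("text/plain; charset=utf-8", "TEXT/HTML", "text/calendar", "application/pdf"):
    print(ct, "->", indexing_strategy(ct))
```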

Bron.
--
 Bron Gondwana, CEO, FastMail Pty Ltd
 brong at fastmailteam.com



More information about the Cyrus-devel mailing list