Truncated text during Xapian indexing

Sebastian Hagedorn Hagedorn at uni-koeln.de
Tue Feb 20 05:08:42 EST 2018


Thanks for your reply, that was very interesting and helpful!

--On 15 February 2018 at 16:12:23 +0100 Robert Stepanek
<rsto at fastmailteam.com> wrote:

>> Just out of curiosity, how is the mapping between a Xapian docid and a
>> message file on disk achieved? I played around with xapian-delve and the
>> Perl example simplesearch.pl. When I search a term, I get a list of
>> docids, but how do I know which message each one is?
>
> In 3.x, Cyrus search stores an internal unique message id, called guid,
> as docid in Xapian. The guid currently is a SHA-1 hash of the raw
> message, allowing for deduplication and to avoid re-indexing already seen
> messages. The conversations.db of a user maps this guid to a list of
> mailbox:UID pairs.
>
> Off the top of my head, there currently isn't an "official" way in Cyrus
> to retrieve the mailbox:UID list for a given guid outside the Cyrus
> process. Depending on your use case, you could either: 1.) build your
> custom mapper on imap/conversations.h, 2.) use cvt_cyrusdb to dump the
> contents of a conversations.db into plain text.

FWIW, the plain-text dump produced by cvt_cyrusdb is so "lossy" as to be
useless. But I was only asking out of curiosity, so it doesn't matter.
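
As an illustration of the guid described in the quoted reply, here is a
minimal sketch in Python. It assumes only what is stated above: that the
guid is a SHA-1 hash of the raw message bytes and that one guid can map to
several mailbox:UID pairs. The function names, the in-memory index and the
sample mailbox names are hypothetical; this is not the on-disk format of
conversations.db, nor necessarily how Cyrus encodes the guid in Xapian.

```python
import hashlib
from collections import defaultdict


def message_guid(raw_message: bytes) -> str:
    """Hex SHA-1 of the raw message bytes (assumed guid definition)."""
    return hashlib.sha1(raw_message).hexdigest()


def message_guid_from_file(path: str) -> str:
    """Same hash, computed in chunks from a message file on disk."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical stand-in for the guid -> mailbox:UID mapping; the real
# conversations.db layout is not shown here.
guid_index = defaultdict(list)


def record_message(mailbox: str, uid: int, raw_message: bytes) -> str:
    """Register one copy of a message and return its guid."""
    guid = message_guid(raw_message)
    guid_index[guid].append((mailbox, uid))
    return guid


raw = b"From: a@example.org\r\nSubject: test\r\n\r\nhello\r\n"
g1 = record_message("user.alice", 17, raw)
g2 = record_message("user.alice.Archive", 3, raw)
# Both copies share one guid, so the body would be indexed only once.
print(g1 == g2, guid_index[g1])
```

Identical raw bytes always hash to the same value, which is what makes the
deduplication mentioned above possible: a second copy of a message in
another mailbox adds a mailbox:UID pair but no new document to index.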

Cheers,
Sebastian
-- 
    .:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
                 .:.Regionales Rechenzentrum (RRZK).:.
   .:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.