Truncated text during Xapian indexing
Sebastian Hagedorn
Hagedorn at uni-koeln.de
Tue Feb 20 05:08:42 EST 2018
Thanks for your reply, that was very interesting and helpful!
--On 15 February 2018 at 16:12:23 +0100 Robert Stepanek
<rsto at fastmailteam.com> wrote:
>> Just out of curiosity, how is the mapping between a Xapian docid and a
>> message file on disk achieved? I played around with xapian-delve and the
>> Perl example simplesearch.pl. When I search for a term, I get a list of
>> docids, but how do I know which message each one refers to?
>
> In 3.x, Cyrus search stores an internal unique message id, called guid,
> as docid in Xapian. The guid currently is a SHA-1 hash of the raw
> message, allowing for deduplication and to avoid re-indexing already seen
> messages. The conversations.db of a user maps this guid to a list of
> mailbox:UID pairs.
>
> Off the top of my head, there currently isn't an "official" way in Cyrus
> to retrieve the mailbox:UID list for a given guid outside the Cyrus
> process. Depending on your use case, you could either (1) build your
> custom mapper on imap/conversations.h, or (2) use cvt_cyrusdb to dump
> the contents of a conversations.db into plain text.
FWIW, the plain-text dump that cvt_cyrusdb produces is so "lossy" as to be
useless for this purpose. But it was really only curiosity, so it doesn't
matter.
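For anyone following along, the guid scheme described in the quoted text can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hex encoding of the SHA-1 digest, the in-memory dictionary, and the names message_guid and GuidIndex are all hypothetical, and nothing here reads or reproduces the actual conversations.db format.

```python
import hashlib
from collections import defaultdict


def message_guid(raw_message: bytes) -> str:
    """Return the SHA-1 hex digest of the raw message bytes.

    Assumption: the guid is shown here as a 40-character hex string;
    how Cyrus encodes it internally is not covered by this sketch.
    """
    return hashlib.sha1(raw_message).hexdigest()


class GuidIndex:
    """Hypothetical in-memory stand-in for the guid -> mailbox:UID mapping."""

    def __init__(self):
        # guid -> list of (mailbox, uid) pairs
        self._by_guid = defaultdict(list)

    def add(self, mailbox: str, uid: int, raw_message: bytes):
        """Record one copy of a message.

        Returns (guid, first_seen). first_seen is False when identical
        content was already recorded, which is the point at which an
        indexer could skip re-indexing.
        """
        guid = message_guid(raw_message)
        first_seen = guid not in self._by_guid
        self._by_guid[guid].append((mailbox, uid))
        return guid, first_seen

    def lookup(self, guid: str):
        """Return all (mailbox, uid) pairs recorded for a guid."""
        return list(self._by_guid.get(guid, []))


if __name__ == "__main__":
    index = GuidIndex()
    msg = b"Subject: hello\r\n\r\nSame bytes, two mailboxes.\r\n"
    guid, first = index.add("INBOX", 1, msg)
    _, again = index.add("Archive", 7, msg)
    print(guid, first, again)
    print(index.lookup(guid))
```

Because the guid depends only on the message bytes, two copies of the same message in different mailboxes share one guid, so a search hit on that guid resolves to every mailbox:UID pair recorded for it.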
Cheers,
Sebastian
--
.:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.