Future Ideas wiki page

Bron Gondwana brong at fastmail.fm
Fri Jan 8 20:05:20 EST 2010



On Fri, 08 Jan 2010 09:56 -0800, "David Lang" <david.lang at digitalinsight.com> wrote:
> one thing that I saw mentioned elsewhere as a limitation of IMAP (and
> therefore I don't know if there is a way to address it reasonably) is the
> lack of a fuzzy search capability.

Without a specification document, it's hard to add anything that you expect
clients to actually use.

> the IMAP search is an exact-match search, it would be useful to have the
> hooks to be able to use a search-engine-like search capability as well
> (not just exact matches, but matches with only some of the search terms,
> matches with plural versions of the search terms, etc)

Yes, that would be lovely to have.  You'd probably run a separate search-engine
process and have the IMAP server just send out a request and map the document
IDs back to folder/uid on response.
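
As a rough sketch of that mapping step, assuming the IMAP server keeps a
table sorted by the engine's document ID (every name below is hypothetical,
this is not existing Cyrus code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mapping entry: one search-engine document per message. */
struct doc_ref {
    unsigned long doc_id;   /* ID assigned by the external search engine */
    const char *folder;     /* mailbox name */
    unsigned long uid;      /* IMAP UID within that mailbox */
};

/* Order entries by ascending doc_id, as bsearch() expects. */
static int cmp_doc_ref(const void *a, const void *b)
{
    const struct doc_ref *x = a, *y = b;
    return (x->doc_id > y->doc_id) - (x->doc_id < y->doc_id);
}

/* Map a document ID from the engine's response back to folder/uid.
 * The table must be sorted by doc_id.  Returns NULL for an unknown ID. */
const struct doc_ref *doc_lookup(const struct doc_ref *table, size_t n,
                                 unsigned long doc_id)
{
    struct doc_ref key = { doc_id, NULL, 0 };

    assert(table != NULL || n == 0);
    if (n == 0) return NULL;
    return bsearch(&key, table, n, sizeof(*table), cmp_doc_ref);
}
```

An ID the table doesn't know about (say, a message expunged since it was
indexed) comes back as NULL and can simply be dropped from the result.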

> As I understand it this would require a slight variation of the search
> request to indicate that you want the fuzzy match, and a variation of the
> search response to be able to indicate the quality of each match returned.

It would require a brand new spec for the search result - an ordered list of
UIDs wouldn't cut it any more! 
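
To illustrate why a plain UID list isn't enough, here is a minimal sketch of
what each hit would have to carry once match quality is involved.  The
struct, the 0..100 score range and the function names are all assumptions
made for the example, not part of any spec:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fuzzy-search hit: a UID plus a match-quality value. */
struct scored_hit {
    unsigned long uid;
    unsigned score;         /* assumed relevance, 0 (worst) to 100 (best) */
};

/* Best score first; ties are broken by ascending UID. */
static int cmp_hit(const void *a, const void *b)
{
    const struct scored_hit *x = a, *y = b;

    if (x->score != y->score)
        return (x->score < y->score) - (x->score > y->score);
    return (x->uid > y->uid) - (x->uid < y->uid);
}

/* Sort hits into the order a client would probably want to display. */
void rank_hits(struct scored_hit *hits, size_t n)
{
    assert(hits != NULL || n == 0);
    if (n > 1) qsort(hits, n, sizeof(*hits), cmp_hit);
}
```

The response then has to transmit the (uid, score) pairs, or at least the
ranked order, rather than UIDs alone.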

While we're at it, I'm much more interested in cross-folder searching with sort
order that doesn't require folder as the first item, but that's significantly more
complex!

Thankfully, this is all pretty orthogonal to everything that I'm doing, so it's not
a consideration I need to give much thought to at the moment.  Someone else
who considers it worth putting effort into could do it pretty independently.

The charset changes would allow an initial pre-processing pass to emit the
"document" as UTF-8 rather than in its original MIME encoding for processing by
the search engine, but that's the only interaction it would have.  If the search
engine supports chunked input, it would probably be worth embedding that
target into lib/charset.c as a character filter sink, and chaining the documents
into it rather than building an entire buffer at once.  There's already code that
does this using a standard buffer: it sends the buffer to the squatter callback
whenever it reaches a fixed size, then resets it.  Easy enough to do.
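
A minimal sketch of that fixed-size buffer pattern, for anyone who hasn't
seen it.  This is not the existing Cyrus code; the names, the tiny chunk
size and the counting callback are all made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 8        /* deliberately tiny; a real sink would use KBs */

/* Callback that receives each chunk; "rock" is opaque caller context. */
typedef void (*chunk_cb)(const char *data, size_t len, void *rock);

struct chunk_sink {
    char buf[CHUNK_SIZE];
    size_t len;             /* bytes currently held in buf */
    chunk_cb cb;
    void *rock;
};

void sink_init(struct chunk_sink *s, chunk_cb cb, void *rock)
{
    assert(s != NULL && cb != NULL);
    s->len = 0;
    s->cb = cb;
    s->rock = rock;
}

/* Hand whatever is buffered to the callback, then reset the buffer. */
void sink_flush(struct chunk_sink *s)
{
    if (s->len > 0) {
        s->cb(s->buf, s->len, s->rock);
        s->len = 0;
    }
}

/* Append data, flushing each time the buffer reaches CHUNK_SIZE. */
void sink_write(struct chunk_sink *s, const char *data, size_t len)
{
    while (len > 0) {
        size_t room = CHUNK_SIZE - s->len;
        size_t n = len < room ? len : room;

        memcpy(s->buf + s->len, data, n);
        s->len += n;
        data += n;
        len -= n;
        if (s->len == CHUNK_SIZE) sink_flush(s);
    }
}

/* Example callback: just count the chunks and bytes delivered. */
struct chunk_stats { size_t chunks; size_t bytes; };

void count_chunk(const char *data, size_t len, void *rock)
{
    struct chunk_stats *st = rock;

    (void)data;
    st->chunks++;
    st->bytes += len;
}
```

The caller does one last sink_flush() at end of document to push out the
partial final chunk; memory use stays at CHUNK_SIZE however large the
message is.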

Regards,

Bron.
-- 
  Bron Gondwana
  brong at fastmail.fm



More information about the Cyrus-devel mailing list