How to use non-ASCII charsets with Sieve?

Lawrence Greenfield leg+ at andrew.cmu.edu
Tue Dec 10 14:52:29 EST 2002


   Date: Tue, 10 Dec 2002 19:07:55 +0900 (JST)
   From: Mark Keasling <mark at air.co.jp>
[...]
   I'm in the process of trying to figure out how this stuff works...
   Is it possible to separate the charset to utf-8 conversion from the text to
   search data transformation?

It would be technically possible. It's probably not the easiest thing
to do in the Cyrus code base.

Currently mkchartable.c does case mapping, character decomposition, and
whitespace elimination. It also applies some mappings
(charset/unifix.txt) that help with a language-independent match but
may not be appropriate for collation or for all UTF-8 comparators.
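
To make the combined transformation concrete, here is a minimal Python
sketch of a single step that folds case, decomposes characters, and
drops whitespace all at once. It is only an illustration, not the
Cyrus implementation: to_search_form is a hypothetical name, Python's
unicodedata module stands in for the generated tables, and the
unifix.txt mappings are not modelled at all.

```python
import unicodedata


def to_search_form(text: str) -> str:
    """Hypothetical all-in-one transformation (not the Cyrus tables)."""
    # Case mapping, approximated here with Unicode case folding.
    folded = text.casefold()
    # Character decomposition, approximated with canonical decomposition (NFD).
    decomposed = unicodedata.normalize("NFD", folded)
    # Whitespace elimination.
    return "".join(ch for ch in decomposed if not ch.isspace())


if __name__ == "__main__":
    # Print the result in escaped form so the combining accent is visible.
    print(ascii(to_search_form("Caf\u00c9  Menu")))
```

Because the original case and spacing are gone after this step, the
output suits a loose substring match but cannot serve a comparator
that needs to see those distinctions.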

To make the chartable stuff work for Sieve & our current SEARCH, we
probably should build tables that just output decomposed (or fully
composed) UTF-8 characters.

We can then write a UTF-8 comparator library that, during comparison,
does the canonicalization.
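
A rough Python sketch of that split, under stated assumptions: the
function names are hypothetical, Python's codecs and unicodedata
stand in for the generated chartables, and case folding plus NFD
stands in for whatever canonicalization a given comparator would
actually need.

```python
import unicodedata


def charset_to_utf8(raw: bytes, charset: str) -> bytes:
    """Stage 1 (hypothetical): charset conversion only.

    Produces decomposed UTF-8 and leaves case and whitespace untouched.
    """
    text = raw.decode(charset)
    return unicodedata.normalize("NFD", text).encode("utf-8")


def canonicalize(utf8: bytes) -> str:
    """Stage 2 helper (hypothetical): canonicalize at comparison time."""
    text = utf8.decode("utf-8")
    return unicodedata.normalize("NFD", text.casefold())


def equal_octets(a: bytes, b: bytes) -> bool:
    # Exact comparator: compares the stage 1 output byte for byte.
    return a == b


def equal_ignoring_case(a: bytes, b: bytes) -> bool:
    # Case-insensitive comparator built on the same stage 1 output.
    return canonicalize(a) == canonicalize(b)


def contains_ignoring_case(haystack: bytes, needle: bytes) -> bool:
    # Substring match of the kind a SEARCH-style test might use.
    return canonicalize(needle) in canonicalize(haystack)


if __name__ == "__main__":
    latin1 = charset_to_utf8(b"Caf\xc9", "iso-8859-1")
    utf8 = charset_to_utf8("caf\u00e9".encode("utf-8"), "utf-8")
    print(equal_octets(latin1, utf8))
    print(equal_ignoring_case(latin1, utf8))
```

The point of the split is that stage 1 loses nothing a stricter
comparator might need, so each comparator can decide for itself how
much to fold.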

The easier path to make Sieve work would be to just build two
completely separate tables. I'd prefer to see the more general
solution.

While none of this is rocket science, it is heavily detail-oriented
and requires concentration.

Larry





