How to use non-ascii charsets with sieve?

Mon Dec 9 21:52:24 EST 2002

Hi Larry,

We are considering a modification like this to fill_cache(message_data_t *)
in cyrus-imapd-2.1.11/sieve/test.c

----%<--------%<----SNIP---->%-------->%----
void fill_cache(message_data_t *m)
{
    rewind(m->data);

    /* let's fill that header cache */
    for (;;) {
	char *name, *body;
	int cl, clinit;

	if (parseheader(m->data, &name, &body) < 0) {
	    break;
	}

#ifdef DECODE_SUBJECT
	/* decode mime encoded subjects */
	if( name && * name && ! strcmp( name, "subject" )
	    && body && * body && strstr( body, "=?" ))
	{
	    char * decoded = charset_decode1522( body, NULL, 0 ) ;
	    if( decoded && * decoded )
	    {
		free( body ) ;
		body = decoded ;
	    }
	}
#endif /* DECODE_SUBJECT */

----%<--------%<----SNIP---->%-------->%----

This hasn't been tested yet since I stuck it in yesterday before going
home and have only just returned to the office.  It should decode subjects
into UTF-8, but it may have "interesting" unintended side effects.  So far
we are only interested in decoded subjects, but decoding the comment part
of addresses will very likely be wanted as well, depending on the feedback
we get from users.

Will charset_decode1522( ) strip the whitespace?
Someone else found the function and I have only given it a cursory glance.

On Mon, 9 Dec 2002 15:59:38 -0500, Lawrence Greenfield <leg+ at andrew.cmu.edu> wrote...
> You bring up good questions.
> 
> First, our Sieve implementation currently doesn't deal with RFC 2047
> encoded headers---or rather, it just compares the undecoded headers
> against the UTF-8 string. This is obviously a bug which sadly isn't in
> bugzilla.
> 
> Ken and I talked (a long time ago) about this. The main issue is that
> Cyrus's character comparison routines remove whitespace and always
> perform casemapping, and this is probably inappropriate for Sieve's
> use. Fixing this is probably not difficult, but I'd prefer not to have
> multiple different canonicalization tables.
> 
> The "fileinto" problem is more straightforward and should be fixed in
> lmtpd.c:sieve_fileinto().
> 
> I would add a function to mboxname.[ch] of mboxname_utf8tomutf7() and
> then make sieve_fileinto() call it.
> 
> Larry

Thank you!  This is very nice, as I hadn't gotten far enough along to look
at this yet.  The obvious workaround (an ugly hack) for fileinto is to have
the client do the MUTF-7 conversion before submitting the script.  We're
working on the client, so such a hack isn't out of the question; but it
probably won't work well if some other client were to access the server.
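
For reference, here is a minimal sketch of the UTF-8 to modified UTF-7
conversion (the IMAP mailbox-name encoding) that either the client hack or
a server-side fix would need.  The names utf8_to_mutf7() and
next_codepoint() are hypothetical and are not existing Cyrus functions;
this is not the mboxname_utf8tomutf7() Larry proposes, only an illustration
of the encoding rules.  It assumes well-formed UTF-8 and does not reject
overlong sequences or encoded surrogates.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Modified base64 alphabet: ',' replaces '/' and no '=' padding is used. */
static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/* Decode one UTF-8 sequence at *p and advance *p.
 * Returns the code point, or -1 on a malformed sequence. */
static long next_codepoint(const unsigned char **p)
{
    const unsigned char *s = *p;
    long cp;
    int n, i;

    if (s[0] < 0x80)                { cp = s[0];        n = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; n = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; n = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; n = 3; }
    else return -1;

    for (i = 1; i <= n; i++) {
        if ((s[i] & 0xC0) != 0x80) return -1;  /* also stops at a NUL */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp > 0x10FFFF) return -1;
    *p = s + n + 1;
    return cp;
}

/* Hypothetical helper: returns a malloc'd modified UTF-7 string,
 * or NULL on malformed input or allocation failure. */
char *utf8_to_mutf7(const char *utf8)
{
    const unsigned char *p = (const unsigned char *)utf8;
    char *out, *o;
    unsigned long bits = 0;   /* leftover bits not yet emitted */
    int nbits = 0;            /* how many leftover bits are valid */
    int shifted = 0;          /* inside an "&...-" base64 run? */

    assert(utf8 != NULL);
    /* Worst case: a lone non-printable byte becomes 5 bytes ("&XXX-"). */
    out = o = malloc(strlen(utf8) * 5 + 1);
    if (!out) return NULL;

    while (*p) {
        long cp = next_codepoint(&p);
        if (cp < 0) { free(out); return NULL; }

        if (cp >= 0x20 && cp <= 0x7E) {
            if (shifted) {    /* close the base64 run, padding with zeros */
                if (nbits) *o++ = b64[(bits << (6 - nbits)) & 0x3F];
                *o++ = '-';
                shifted = 0; nbits = 0; bits = 0;
            }
            *o++ = (char)cp;
            if (cp == '&') *o++ = '-';        /* literal '&' is "&-" */
        } else {
            unsigned units[2];                /* UTF-16 code units */
            int n = 0, i;
            if (cp >= 0x10000) {              /* needs a surrogate pair */
                cp -= 0x10000;
                units[n++] = (unsigned)(0xD800 | (cp >> 10));
                units[n++] = (unsigned)(0xDC00 | (cp & 0x3FF));
            } else {
                units[n++] = (unsigned)cp;
            }
            if (!shifted) { *o++ = '&'; shifted = 1; }
            for (i = 0; i < n; i++) {
                bits = (bits << 16) | units[i];
                nbits += 16;
                while (nbits >= 6) {
                    nbits -= 6;
                    *o++ = b64[(bits >> nbits) & 0x3F];
                }
                bits &= (1UL << nbits) - 1;   /* keep only leftover bits */
            }
        }
    }
    if (shifted) {
        if (nbits) *o++ = b64[(bits << (6 - nbits)) & 0x3F];
        *o++ = '-';
    }
    *o = '\0';
    return out;
}
```

With this sketch, the mailbox-name example from the IMAP specification
comes out as expected: the UTF-8 for "台北" encodes to "&U,BTFw-", and a
literal "&" becomes "&-".  The caller owns the returned buffer and must
free() it.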

> 
>    Date: Mon, 9 Dec 2002 19:53:37 +0900 (JST)
>    From: Mark Keasling <mark at air.co.jp>
> [...]
>    <script language="sieve" version="RFC-3028">
>      # pretend this is encoded in UTF-8
> 
>      require ["reject","fileinto"];
> 
>      if header :contains "Subject" "セミナー報告"
>      {
>        fileinto "セミナー報告" ;
>      }
>    </script>
> 
>    I don't know how to make timsieved decode MIME headers or
>    MUTF-7 encode mailbox names.

Regards,
Mark Keasling <mark at air.co.jp>