How to make cyrus not change non US-ASCII characters to "X"

Mark Keasling mark at air.co.jp
Thu Nov 21 22:20:21 EST 2002


Hi,

On Thu, 21 Nov 2002 16:36:31 -0500, Lawrence Greenfield <leg+ at andrew.cmu.edu> wrote...
> --On Thursday, November 21, 2002 2:03 PM -0200 Alessandro Oliveira 
> <alessandro.o at nunoferreira.com.br> wrote:
> 
> > Maybe the best solution would be a filter to change the message body
> > encoding using quoted-printable instead of putting X everywhere, and
> > change headers to also use the appropriate encoding. What do you think
> > about this ?
> 
> The characters can't be encoded because we don't know what the character 
> set is. The only patch that would be acceptable to us would have a 
> configurable character set ("assume_8bit: iso-8859-1") and would sanity 
> check and encode the untagged 8-bit according to that.
> 
> Larry
> 

Are the messages containing 8 bit characters in MIME format?

Do message body 8 bit characters get converted to 'X' even when a charset
containing 8 bit characters (like iso-8859-1) has been specified for the
body part?

Cyrus' changing unspecified 8 bit data to 'X' is, while infuriating,
perfectly reasonable behavior given the requirement that it not
emit such data.

MTAs are supposed to be permitted to change the content-transfer-encoding,
particularly if it is 8BIT or BINARY, to something they consider suitable
(or safe), since after decoding you still have the original data.  But
that requires that the message already be MIME.
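
To illustrate the "you still have the original data" point, here is a
small sketch in Python using the standard quopri module.  The sample bytes
are made up for the example, and 0xE9 being "e acute" is only true if you
assume iso-8859-1.

```python
import quopri

# Made-up sample: 0xE9 is an 8-bit byte ("e acute" if one assumes iso-8859-1).
original = b"caf\xe9 au lait\n"

# Quoted-printable turns every 8-bit byte into a 7-bit "=XX" escape ...
encoded = quopri.encodestring(original)

# ... and decoding restores the exact original bytes.
decoded = quopri.decodestring(encoded)

print(encoded)
print(decoded == original)
```

Note that the round trip works on bytes only; it says nothing about which
charset those bytes were written in.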

The messages containing unlabeled 8 bit data most likely aren't MIME
because a MIME compliant client is not supposed to send unlabeled 8 bit
data.  A message containing unlabeled 8 bit data is unspecified, non-standard
(i.e. broken).  But that's life.  Anyway, to change the content-transfer-encoding
the message would have to be MIME-ified first; otherwise, a client would
not know that it needs to undo the quoted-printable encoding.  If the client
didn't undo the quoted-printable, you would have the same situation as if the
characters were converted to 'X', only quoted-printable is much uglier.  It should
be possible to MIME-ify such a message by doing something like:
----------
<original headers with 8bit bytes converted to 'X'>
mime-version: 1.0
content-type: text/plain
content-transfer-encoding: quoted-printable

<quoted-printable encoded (body or original message)>
----or----
<original headers with 8bit bytes converted to 'X'>
mime-version: 1.0
content-type: multipart/mixed; boundary="mimeboundary"

--mimeboundary
content-type: text/plain; charset="iso-8859-1"
content-transfer-encoding: quoted-printable

<quoted-printable encoded body>
--mimeboundary
content-type: application/octet-stream
content-transfer-encoding: quoted-printable
content-description: The original message text before MIME-ification.

<original message data>
--mimeboundary--
----------
However, fixing broken messages isn't really the calling of cyrus...
This is something that should probably happen before it gets to cyrus.
But would it be acceptable for lmtpd to check the message for 8 bit and
when it finds characters that it would normally convert to 'X', pass that
message through an "unlabeled 8 bit filter" before trying to process it
further?  The "unlabeled 8 bit filter" could do nothing, fix-up the message,
reject it or whatever.
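
A minimal sketch of what such a filter might do for the "fix-up" case,
following the first MIME-ify layout above.  This is Python, not anything
that exists in cyrus: the function names are hypothetical, the message is
assumed to use bare LF line endings with a blank line between headers and
body, and the charset is a configured guess (iso-8859-1 here), since the
real one is unknown.

```python
import quopri
import re


def has_8bit(data: bytes) -> bool:
    """True if any byte has the high bit set."""
    return any(b >= 0x80 for b in data)


def mimeify_unlabeled_8bit(raw: bytes, assumed_charset: str = "iso-8859-1") -> bytes:
    """Hypothetical fix-up filter for non-MIME messages with 8 bit data.

    Assumes LF line endings and a blank line separating headers from body.
    """
    if not has_8bit(raw):
        return raw  # nothing to fix

    head, _, body = raw.partition(b"\n\n")

    # A message that already declares itself MIME is left alone here.
    if re.search(rb"(?im)^mime-version:", head):
        return raw

    # Headers: replace 8 bit bytes with 'X', as cyrus does today.
    safe_head = bytes(b if b < 0x80 else ord("X") for b in head)

    # Body: quoted-printable, labeled with the assumed charset.
    mime_headers = b"\n".join([
        b"MIME-Version: 1.0",
        b'Content-Type: text/plain; charset="' + assumed_charset.encode("ascii") + b'"',
        b"Content-Transfer-Encoding: quoted-printable",
    ])
    return safe_head + b"\n" + mime_headers + b"\n\n" + quopri.encodestring(body)


if __name__ == "__main__":
    sample = b"Subject: caf\xe9\nFrom: a@example.com\n\nbody \xe9\n"
    print(mimeify_unlabeled_8bit(sample).decode("ascii"))
```

The "do nothing" and "reject" options would simply return the message
unchanged or raise an error at the first check instead.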

Regards,
Mark Keasling <mark at air.co.jp>




