8-bit characters in headers
Henrique de Moraes Holschuh
hmh at debian.org
Thu Apr 28 21:09:22 EDT 2005
On Fri, 29 Apr 2005, Adrian Buciuman wrote:
> Why not encode the header using unknown-8bit as the charset?? This is
> simpler, and has many advantages:
It still would cause problems for IMAP search. Well, mostly, as one could
just not index (and thus not search) any unknown-8bit headers at all...
> different valid word/name. (Silently converting to "X" is regarded as
> unethical by many network administrators. It can cause pain to
I could claim that tolerating non-RFC compliant headers is unethical as
well. This is a matter of local policy.
> If you can not do good, at least do not cause harm.
That would mean rejecting the message, which you can tell Cyrus to do AFAIK.
Letting 8-bit chars in unchecked silently hoses Cyrus functionality.
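To make the two local policies concrete, here is a minimal Python sketch. It
is not Cyrus code: `Policy`, `NonCompliantHeader` and `check_header` are
hypothetical names made up for illustration, and the "X" substitution simply
mirrors the behaviour discussed in this thread.

```python
from enum import Enum


class Policy(Enum):
    REJECT = "reject"  # refuse the message outright
    MUNGE = "munge"    # replace every 8-bit byte with "X"


class NonCompliantHeader(Exception):
    """Raised when a header carries raw 8-bit bytes and policy is REJECT."""


def check_header(raw: bytes, policy: Policy) -> bytes:
    # Pure 7-bit headers pass through untouched under either policy.
    if all(b < 0x80 for b in raw):
        return raw
    if policy is Policy.REJECT:
        raise NonCompliantHeader("raw 8-bit byte in header")
    # MUNGE: information is lost here, which is the objection raised above.
    return bytes(b if b < 0x80 else ord("X") for b in raw)
```

Either branch is a site decision; the sketch only shows that neither lets an
unchecked 8-bit byte through.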
> Rejecting mail can also be unethical (important messages will not be
> delivered because of an 8-bit character) and can also cause problems.
That is not unethical. Those messages are not RFC 2822-compliant (in fact, I
believe they are not even RFC 822-compliant, but I did not check), so no
RFC 2822 system has to deal with them.
Whether you want to, or can afford to, reject such messages has nothing to
do with ethics.
> 3. unknown-8bit is registered by IANA
> It is used by some mail programs. See
Hmm... well, if Cyrus can be taught to skip unknown-8bit headers, that would
be a fine option to use instead of 'X', it looks like.
I'd have to read the RFCs again to know for sure whether they have anything
against unknown-8bit in headers, but I don't think they do.
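Skipping such headers at index time could look roughly like the Python
sketch below. It assumes the header has already been rewritten into RFC 2047
encoded-words; `ENCODED_WORD` and `is_searchable` are hypothetical names, and
nothing here reflects how Cyrus actually builds its search index.

```python
import re

# =?charset[*language]?encoding?encoded-text?=  (RFC 2047, with the optional
# RFC 2231 language suffix after the charset)
ENCODED_WORD = re.compile(r"=\?([^?*\s]+)(?:\*[^?\s]*)?\?[bBqQ]\?[^?\s]*\?=")


def is_searchable(header_value: str) -> bool:
    """Return False if any encoded-word declares charset unknown-8bit."""
    for match in ENCODED_WORD.finditer(header_value):
        if match.group(1).lower() == "unknown-8bit":
            return False
    return True
```

A header for which this returns False would simply be left out of the index,
so a search never has to guess at its charset.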
> text/plain; charset=unknown-8bit" ( I believe sendmail may generate
> this, following RFC 1428. How will Cyrus search in such a message,
> BTW?). They seem happy to accept it in RFC-2047 header, possible as an
> 6. This will fix this Cyrus problem for ever. (By contrast, no
> heuristic can achieve this: it will need to be adapted, patched,
> improved and still it will not be perfect, it may find the wrong
> charset and cause confusions for users. The closer the heuristic is to
> the user, the better it will work. So MUAs should guess the charset)
> 7. The same code can be used to implement a site-default. (Replace
> unknown-8bit with what you want and __do__ some plausibility checks)
> 8. It will work at any point in the mail path. Cyrus will do this, but
> MTAs and news servers can also convert headers to unknown-8bit. Or
> reverse it, if necessary.
Hmm... in fact, it should be easy to teach postfix to do such conversion, I
might try my hand at it just for the kick of finally letting no non-RFC2822
crap out of my MTAs.
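For the record, the conversion itself is small. The Python sketch below is
not Postfix code and is only meant for unstructured header bodies such as
Subject. It base64-encodes a raw 8-bit header body into RFC 2047
encoded-words with charset unknown-8bit, keeping each word within the
75-character limit. `encode_unknown_8bit` and the constants are names I made
up for the example.

```python
import base64

PREFIX = "=?unknown-8bit?B?"
SUFFIX = "?="
MAX_WORD = 75  # RFC 2047 limit on the length of one encoded-word
# Raw bytes per chunk, chosen so the base64 text fits in one encoded-word.
CHUNK = (MAX_WORD - len(PREFIX) - len(SUFFIX)) // 4 * 3


def encode_unknown_8bit(raw: bytes) -> str:
    # Leave pure 7-bit bodies alone; only offending headers get rewritten.
    if all(b < 0x80 for b in raw):
        return raw.decode("ascii")
    words = []
    for start in range(0, len(raw), CHUNK):
        payload = base64.b64encode(raw[start:start + CHUNK]).decode("ascii")
        words.append(PREFIX + payload + SUFFIX)
    # Adjacent encoded-words separated by whitespace are joined on decoding.
    return " ".join(words)
```

Reversing it is just a base64 decode of each word, so the original bytes
survive the trip, unlike with the "X" replacement.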
> 4. Care should be taken not to rfc2047-encode text which must be
> ASCII. Even when properly encoded, non-ASCII is not valid anywhere in
Huh? I don't understand. ASCII is codepoints 0x00-0x7f. Nothing with an
8th bit set could be ASCII. Also, header names certainly cannot be
rfc2047-encoded, but the whole point of encoding header *content* with
rfc2047 is to allow *ANY* valid codepoint [subject to the usual no control
characters, etc] in the specified charset for content...
BTW: unknown-8bit with only codepoints 0x00-0x7f need NOT be ASCII. It
might have come from an EBCDIC (yuck!) system, for all we know...
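A quick Python illustration of that point, using the standard `cp037` codec
as a stand-in for "some EBCDIC system" (the choice of code page and the
sample string are my assumptions, picked only because these characters map to
bytes below 0x80 in EBCDIC):

```python
def is_7bit(raw: bytes) -> bool:
    """True if no byte has the 8th bit set."""
    return all(b <= 0x7F for b in raw)


text = "-/ ."                      # punctuation and a space
raw = text.encode("cp037")         # EBCDIC bytes for that text

seven_bit = is_7bit(raw)           # the bytes pass a naive 7-bit check
as_ascii = raw.decode("ascii")     # decodes without error, but to other text
as_ebcdic = raw.decode("cp037")    # the text the sender actually meant
```

The bytes pass the 7-bit test and decode as ASCII without complaint, yet the
ASCII reading is not what was written.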
Now, if you mean that anything with illegal header names needs to be
rejected, and that nothing should be encoded as anything other than
non-rfc2047-escaped ASCII when it is ASCII in the first place, then I agree
with you. BUT doing such encoding should not break anything. It would
just waste resources.
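A small check of that claim with Python's standard `email.header` module.
The sample header text is made up; the sketch only shows that an encoded-word
wrapping plain ASCII decodes back to the same text and costs some extra bytes.

```python
from email.header import decode_header, make_header

plain = "hello"
encoded = "=?us-ascii?Q?hello?="   # same text, needlessly rfc2047-encoded

# Decoding the encoded-word gives back the original ASCII text.
decoded = str(make_header(decode_header(encoded)))

# The only cost is the extra bytes on the wire and the decoding work.
overhead = len(encoded) - len(plain)
```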
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot