How to filter based on "garbage" subjects ... ?
Cyrus Daboo
daboo at cyrusoft.com
Tue Sep 30 12:06:52 EDT 2003
Hi Marc,
--On Tuesday, September 30, 2003 11:32 -0300 "Marc G. Fournier"
<scrappy at hub.org> wrote:
|
| I've yet to be able to come up with a sieve rule that will allow me to
| filter all "garbage" subjects to a separate folder ... you know the ones
| that look like:
|
| Subject: =?euc-kr?q?(=B1=A4=B0=ED)=B5=F0=C1=F6=
|
| I've even tried to use Pine filtering to filter based on 8bit subjects,
| but it doesn't pick them up either ...
|
| For instance, under Pine, if I try to select all subjects with =B1= in
| them, which the above contains, it selects nothing, so I'm figuring there
| has to be some control characters in there somewhere ... ?
|
| Thoughts?
|
From the SIEVE RFC:
| Implementations decode header charsets to UTF-8. Two strings are
| considered equal if their UTF-8 representations are identical.
| Implementations should decode charsets represented in the forms
| specified by [MIME] for both message headers and bodies.
| Implementations must be capable of decoding US-ASCII, ISO-8859-1,
| the ASCII subset of ISO-8859-* character sets, and UTF-8.
i.e. SIEVE should be decoding the =?euc-kr?.... header into its UTF-8 form
BEFORE doing the comparison with the text you provide. So the =B1
quoted-printable encoded byte will have been decoded into the UTF-8
representation of the corresponding euc-kr character, and thus won't match
the text you provide. Actually, euc-kr is a multibyte character set, so
in fact the Unicode character is made up of both =B1 and =A4.
By my reckoning that is the Unicode character 0xad11 - I'll leave you to
work out the UTF-8 encoding of that!
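
For anyone who wants to check the arithmetic, here is a minimal Python
sketch using only the standard library. It is an illustration, not what
any SIEVE implementation actually runs. The sample header is a shortened,
hypothetical encoded-word built from the first four encoded bytes of the
subject quoted above, closed with "?=" so that it parses; the variable
names are mine.

```python
from email.header import decode_header

# Hypothetical, shortened sample: the leading bytes of the quoted subject,
# closed with "?=" so that it forms a complete MIME encoded-word.
raw_subject = "=?euc-kr?q?(=B1=A4=B0=ED)?="

# decode_header() undoes the quoted-printable layer and reports the charset.
payload, charset = decode_header(raw_subject)[0]

# Decoding the raw bytes with that charset yields the Unicode text.
decoded = payload.decode(charset)

# The byte pair =B1 =A4 becomes one character, right after the "(".
first_char = decoded[1]
code_point = ord(first_char)
utf8_bytes = first_char.encode("utf-8")

print(hex(code_point))   # code point of the first Korean character
print(utf8_bytes.hex())  # its UTF-8 encoding, as hex

# The literal text "=B1=" exists only in the raw header, not after decoding.
print("=B1=" in raw_subject, "=B1=" in decoded)
```

The last line shows why a match on "=B1=" finds nothing: once the header
has been decoded, that text is simply no longer there to compare against.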
Basically you are going to have a hard time trying to filter on arbitrary
Unicode characters in some random character set, given that SIEVE expects
UTF-8 in its scripts.
--
Cyrus Daboo