Searching on RFC2047 headers
Ross Boylan
ross at biostat.ucsf.edu
Sun Nov 6 01:42:29 EST 2011
On Sun, 2011-11-06 at 10:19 +0900, OBATA Akio wrote:
> Hi,
>
> On Sun, 06 Nov 2011 05:26:15 +0900, Ross Boylan <ross at biostat.ucsf.edu> wrote:
>
> > I'm running Cyrus IMAP 2.2.13 on Debian, and am looking for a way to
> > search on headers that use RFC2047. I am not able to retrieve anything.
> > Here's what happens (using imtest)
> > a06 uid fetch 152840 (BODY[HEADER.FIELDS (subject)])
> > * 60894 EXISTS
> > * 1 RECENT
> > * 59873 FETCH (UID 152840 BODY[HEADER.FIELDS (subject)] {72}
> > Subject: =?Windows-1251?B?z+7r8/fo8vwgMTEg6vPw8e7iIOHl8e/r4PLt7i4=?=
> >
> > )
> > a07 uid search (HEADER SUBJECT "=?Windows-1251?B?z+7r8/fo8vwgMTEg6vPw8e7iIOHl8e/r4PLt7i4=?=")
> > * SEARCH
> > a07 OK Completed (0 msgs in 0.140 secs)
> > a07 uid search charset us-ascii (HEADER SUBJECT "=?Windows-1251?B?z+7r8/fo8vwgMTEg6vPw8e7iIOHl8e/r4PLt7i4=?=")
> > * SEARCH
> > a07 OK Completed (0 msgs in 0.140 secs)
> > a07 uid search charset windows-1251 (HEADER SUBJECT "=?Windows-1251?B?z+7r8/fo8vwgMTEg6vPw8e7iIOHl8e/r4PLt7i4=?=")
> > a07 NO Unrecognized character set
> > a07 uid search charset us-ascii (HEADER SUBJECT "z+7r8/fo8vwgMTEg6vPw8e7iIOHl8e/r4PLt7i4")
> > * SEARCH
> > a07 OK Completed (0 msgs in 0.150 secs)
> > a07 uid search charset utf-8 (HEADER SUBJECT "\xcf\xee\xeb\xf3\xf7\xe8\xf2\xfc 11 \xea\xf3\xf0\xf1\xee\xe2 \xe1\xe5\xf1\xef\xeb\xe0\xf2\xed\xee.")
> > * SEARCH
> > a07 OK Completed (0 msgs in 0.160 secs)
> > a07 uid search (HEADER SUBJECT "\xcf\xee\xeb\xf3\xf7\xe8\xf2\xfc 11 \xea\xf3\xf0\xf1\xee\xe2 \xe1\xe5\xf1\xef\xeb\xe0\xf2\xed\xee.")
> > * SEARCH
> > a07 OK Completed (0 msgs in 0.130 secs)
> > a08 uid search (HEADER SUBJECT {63}
> > + go ahead
> > =?Windows-1251?B?z+7r8/fo8vwgMTEg6vPw8e7iIOHl8e/r4PLt7i4=?=
> >
> > )
> > * SEARCH
> > a08 OK Completed (0 msgs in 0.150 secs)
> >
> > The docs for Cyrus 2.2 do not refer to rfc2047, and so I would expect
> > the string to be treated as plain text and for any of my initial
> > searches to work. Obviously they don't.
Although the docs don't refer to RFC 2047, the source code does. However,
it appears that windows-1251 is unknown to this version of Cyrus, and
that the canonical string is then empty. If so, it is simply not
possible to match this field.
Specifically, charset.c has:
/*
* Decode MIME strings (per RFC 2047) in 's'. It writes the decoded
* string to 'retval', calling realloc() as needed. (Thus retval may
* be NULL.) Returns retval, contining 's' in canonical searching form.
*/
char *charset_decode_mimeheader(const char *s, char *retval, int alloced)
The changelog mentions RFC 2047 in the changes since 1.5.11.
But windows-1251 does not appear to be a defined character set
(windows-1252 is!). I'm not sure what happens in this case, but it
looks as if the canonical string may be empty. This would explain why
the searches fail. At any rate, I can't get a match even when I search
for " 11 " alone. I am also having trouble with the syntax when I
specify a charset:
a03 search charset utf-8 uid 152830:152850 header subject "11"
a03 BAD Invalid Search criteria
a03 search charset utf-8 uid 152830:152850 header subject {4}
a03 BAD Invalid Search criteria
a03 search uid 152830:152850 header subject {4}
+ go ahead
11
* SEARCH 59872 59874 59875
# dropping charset eliminates error
# below we get the UIDs
a03 OK Completed (3 msgs in 0.010 secs)
a03 search charset utf-8 (uid 152830:152850 header subject {4}
a03 BAD Invalid Search criteria
a04 uid search uid 152830:152850 header subject {4}
+ go ahead
11
* SEARCH 152836 152843 152847
# unfortunately, that does not include 152840, the message I started with
a04 OK Completed (3 msgs in 0.010 secs)
a05 uid fetch 152840 (BODY[HEADER] (subject))
a05 BAD Invalid UID Fetch attribute
a05 uid fetch 152840 (BODY[HEADER.FIELDS] (subject))
a05 BAD Invalid body section
a05 uid fetch 152840 (BODY[HEADER.FIELDS (subject)])
* 59873 FETCH (UID 152840 BODY[HEADER.FIELDS (subject)] {72}
Subject: =?Windows-1251?B?z+7r8/fo8vwgMTEg6vPw8e7iIOHl8e/r4PLt7i4=?=
)
a05 OK Completed (0.000 sec)
Given that windows-1251 does not exist in Cyrus 2.2, attempting the
utf-8 search is probably pointless anyway. Without knowledge of
windows-1251 it seems unlikely Cyrus could convert the header to Unicode
or anything else.
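For reference, here is a minimal sketch of what a decoder that does know
windows-1251 would make of this header. It uses Python's standard
email.header module, not anything from Cyrus; the helper name
is_valid_utf8 is made up for the illustration. It also checks whether the
raw bytes I pasted into the "charset utf-8" search were valid UTF-8 at all.

```python
from email.header import decode_header

# Subject value exactly as returned by the FETCH above.
raw_subject = "=?Windows-1251?B?z+7r8/fo8vwgMTEg6vPw8e7iIOHl8e/r4PLt7i4=?="

# decode_header() yields (payload, charset) pairs, one per encoded word.
# This header holds a single encoded word, so there is exactly one pair.
(payload, charset), = decode_header(raw_subject)

# Python ships a windows-1251 codec, so the bytes can be turned into text.
text = payload.decode(charset)


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(charset)                  # charset named in the encoded word
print(text)                     # decoded subject; includes the " 11 " I searched for
print(is_valid_utf8(payload))   # were the raw bytes usable as a UTF-8 search key?
```

The raw Windows-1251 bytes are not valid UTF-8, so the "charset utf-8"
search with the \x.. bytes could not have matched in any case, whatever the
server knows about windows-1251.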
>
> You should refer to RFC 3501 section 6.4.4 for the search syntax.
> The search text must be given as bare text (not MIME-encoded), with its CHARSET specified.
> The CHARSET is not required to match the charset used in the MIME encoding.
>
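As I read that advice, the search key should be the decoded text, sent in a
charset the server accepts. A rough sketch of how a client might assemble
such a command follows. The function name build_header_search and the tag
a09 are made up for the illustration, and I have not confirmed that Cyrus
2.2.13 accepts this exact form.

```python
from typing import List


def build_header_search(tag: str, field: str, text: str) -> List[bytes]:
    """Return the two chunks a client would send for a UID SEARCH.

    The first chunk ends by announcing a {n} literal. The second chunk
    (the literal plus the closing CRLF) is sent only after the server
    replies with a "+" continuation line.
    """
    literal = text.encode("utf-8")  # bare text as UTF-8, not RFC 2047 form
    first = (
        f"{tag} UID SEARCH CHARSET UTF-8 HEADER {field} "
        f"{{{len(literal)}}}\r\n"
    )
    return [first.encode("ascii"), literal + b"\r\n"]


# Decoded form of the Subject header from the FETCH above.
subject = "Получить 11 курсов бесплатно."

for chunk in build_header_search("a09", "SUBJECT", subject):
    print(chunk)
```

The byte count in the literal is the length of the UTF-8 encoding, not the
number of characters. Even with a well-formed command, a server that cannot
decode windows-1251 presumably has nothing to compare the key against.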
> > I've seen references to cyrus canonicalizing search strings in later
> > versions
> > (http://www.cyrusimap.org/docs/cyrus-imapd/2.4.9/internal/internationalization.php).
> > I'm not sure what the implications of that for this problem are.
>
> I feel that 2.2.13 just does not support windows-1251 yet.
That's consistent with the source code I examined.
> (I can find it in both 2.3.18 and 2.4.12)
>