Use of isspace() and friends in Cyrus
Bron Gondwana
brong at fastmail.fm
Thu Mar 19 17:35:31 EDT 2009
On Thu, Mar 19, 2009 at 11:25:56AM +0000, David Carter wrote:
> I spent some time last night working out why the cyrus.cache entry for a
> specific message differed on a master and replica system. An eight bit
> character in a To: header (allowed by "reject8bit: no") had been
> transformed into a space character in the cache on the replica.
>
> The root cause turned out to be some code in parseaddr_phrase() which
> replaces arbitrary sequences of "space" characters with a single space:
>
> ./lib/parseaddr.c:
>
> static int parseaddr_phrase(inp, phrasep, specials)
> char **inp;
> char **phrasep;
> char *specials;
> {
> int c;
> char *src = *inp; /* XXX signed char on most platforms */
>
> for (;;) {
> c = *src++; /* XXX signed char promoted to signed int */
Yeah, ouch!
> isspace(-10) isn't defined.
>
> Worse isspace() is normally implemented as an array lookup and can
> segfault with a negative index if you are really unlucky.
>
> A quick search through the code reveals quite a lot of places where char
> variables are passed to isspace() without a cast to (unsigned char). The
> ctype macros often causes grief, and the normal workaround is to define
> replacement macros of the form:
>
> #define Uisspace(c) isspace((int)((unsigned char)(c)))
>
> I'm happy to knock up a sample patch if people agree.
>
> Curiously reconstruct on the replica fixed the cache entry generated by
> sync_server, despite the fact both are just using message_parse_file().
> I infer that isspace(-10) is normally false but occasionally true, at
> least on Solaris x86. It appears to always be false on Linux x86.
So anyway, "isspace" in this case is actually a unicode function
presumably. From the charset fiddling I've been doing it would be
pretty easy to implement "charset_isspace" in a couple of lines of
code :) But of course, that's not the issue.
Yeah - there are two choices there, Uisspace, or actually fix all the
code to pass unsigned explicitly to isspace. I don't see any problem
with Uisspace. There's already cyrus_isdigit that I wrote a while
back because isdigit() was taking an awful amount of processor for
something that really was /[0-9]/ in meaning.
So - where would you put the macro?
Ok - next for _my_ little proposal... ;) It will make this look
like nothing!
Bron.
More information about the Cyrus-devel
mailing list