Use of isspace() and friends in Cyrus

Bron Gondwana brong at fastmail.fm
Thu Mar 19 17:35:31 EDT 2009


On Thu, Mar 19, 2009 at 11:25:56AM +0000, David Carter wrote:
> I spent some time last night working out why the cyrus.cache entry for a  
> specific message differed on a master and replica system. An eight bit  
> character in a To: header (allowed by "reject8bit: no") had been  
> transformed into a space character in the cache on the replica.
>
> The root cause turned out to be some code in parseaddr_phrase() which
> replaces arbitrary sequences of "space" characters with a single space:
>
>   ./lib/parseaddr.c:
>
>   static int parseaddr_phrase(inp, phrasep, specials)
>   char **inp;
>   char **phrasep;
>   char *specials;
>   {
>     int c;
>     char *src = *inp;    /* XXX signed char on most platforms */
>
>     for (;;) {
>         c = *src++;      /* XXX signed char promoted to signed int */

Yeah, ouch!

> isspace(-10) isn't defined.
>
> Worse isspace() is normally implemented as an array lookup and can  
> segfault with a negative index if you are really unlucky.
>
> A quick search through the code reveals quite a lot of places where char  
> variables are passed to isspace() without a cast to (unsigned char). The  
> ctype macros often causes grief, and the normal workaround is to define  
> replacement macros of the form:
>
>   #define Uisspace(c) isspace((int)((unsigned char)(c)))
>
> I'm happy to knock up a sample patch if people agree.
>
> Curiously reconstruct on the replica fixed the cache entry generated by  
> sync_server, despite the fact both are just using message_parse_file().
> I infer that isspace(-10) is normally false but occasionally true, at  
> least on Solaris x86. It appears to always be false on Linux x86.

So anyway, "isspace" in this case is actually a unicode function
presumably.  From the charset fiddling I've been doing it would be
pretty easy to implement "charset_isspace" in a couple of lines of
code :)  But of course, that's not the issue.

Yeah - there are two choices there, Uisspace, or actually fix all the
code to pass unsigned explicitly to isspace.  I don't see any problem
with Uisspace.  There's already cyrus_isdigit that I wrote a while
back because isdigit() was taking an awful amount of processor for
something that really was /[0-9]/ in meaning.

So - where would you put the macro?

Ok - next for _my_ little proposal... ;)  It will make this look
like nothing!

Bron.


More information about the Cyrus-devel mailing list