Man, you've got to love edge cases

Bron Gondwana brong at
Sat Jan 31 09:47:37 EST 2009

(it's 1:38 am, yikes)

So I've just been trying for hours to recreate this really rare crash we
get.  It only occurs with messages that are delivered to users with a
sieve script that has a regex match on the body, so we get one every few

We got one today.

It's been causing a segfault every hour or so as postfix tries to
deliver it again, and I've been trying to recreate it.  Got my test
machine almost exactly identical, no luck.

Finally I realised I had a brand new slot on the same machine, which
had no users in it yet.  I could test in there.  Tried, still nothing.

Diffed the file on disk after delivery against the failed files in the
staging directory of the other slot - the _ONLY_ difference was that the
connections came from a different machine.  I copied my LMTP session
code that ran the delivery over there and tried.  Still delivered fine.
Diff.  The only difference was that the "LHLO" name hadn't been changed.

I changed that.  Segfault!  Wow.

So I got suspicious.  Did some numbers.

[brong at imap11 ~]$ wc 5138-1233372374-0 
  2033  28211 221184 5138-1233372374-0
[brong at imap11 ~]$ bc

Well, what do you know.  A multiple of 4096.

static int octet_regex(const char *text, size_t tlen, const char *pat,
                       void *rock __attribute__((unused)))
    if (!text[tlen]) {
        /* NUL-terminated string */
        return (!regexec((regex_t *) pat, text, 0, NULL, 0));
    else {
        /* regexec() requires a NUL-terminated string */
        char *buf = (char *) xstrndup(text, tlen);
        int r = !regexec((regex_t *) pat, buf, 0, NULL, 0);
        return r;

I'm annoyed I didn't notice this when I searched the code the first
time... grr.

A cute idea, checking if the string just happens to be null terminated,
but it turns out that you have no guarantee that "const char *text" can
be indexed past [tlen-1] without a segfault.  In particular, if you're
looking at the body of an mmaped file that happens to be an exact
multiple of blocks in size, and hence has no slop!

This is why I've failed to find this bloody bug before - because all my
test cases just succeeded, I couldn't recreate it.  One in 4096 emails,
and only to a person with a body regex match.  Actually, it's probably
fewer than that many, since I bet that statistically most emails are in
a smallish set of sizes that don't cluster near a multiple of 4096.


Anyway, I see no way to guarantee that we can check if the string is
NULL terminated, so I'm just going to force the else case.


More information about the Cyrus-devel mailing list