Repeat recovers on databases

Michael Bacon baconm at email.unc.edu
Fri Jun 19 16:12:40 EDT 2009


(Dropping info-cyrus on the followup)

--On June 19, 2009 3:43:43 PM -0400 Michael Bacon <baconm at email.unc.edu> 
wrote:

> --On June 19, 2009 9:57:03 AM +1000 Bron Gondwana <brong at fastmail.fm>
> wrote:
>
>> On Thu, Jun 18, 2009 at 05:44:19PM -0400, Michael Bacon wrote:
>>> Another one stomped here.  This time, it's a 32/64 bit issue.  myinit in
>>>  cyrusdb_skiplist.c assumes that type_t is 4 bytes long, and writes out
>>> that many from the current timestamp when creating
>>> $confdir/db/skipstamp.
>>
>> Actually, reading the code, that's not strictly true:
>>
>>>        a = htonl(global_recovery);
>>> -       if (r != -1) r = write(fd, &a, 4);
>>> +       if (r != -1) r = write(fd, &a, sizeof(time_t));
>>
>> It writes "a", which is the result of calling htonl on global_recovery.
>>
>> If htonl isn't returning a 32 bit value of the lower order bytes of the
>> value that it's given, then this bug is going to be causing a LOT more
>> problems than just this.  We assume this works in quite a few other
>> places in the code, including the timestamp value in the skiplist header
>> itself, and in places throughout the mailbox code too.
>>
>> "htonl" => "host to net long" by my reading.  There's also htonll for 64
>> bit values.  Is your platform creating net longlongs?
>
> Good question -- this may be a Solaris bug after all.  Solaris clearly
> defines in the man page that htonl is supposed to return a uint32_t from
> htonl, but looking at sys/byteorder.h, that's um, not being enforced...
>
># if defined(_BIG_ENDIAN) && !defined(ntohl) && !defined(__lint)
> /* big-endian */
># define ntohl(x)        (x)
># define ntohs(x)        (x)
># define htonl(x)        (x)
># define htons(x)        (x)
>
># elif !defined(ntohl) /* little-endian */
>
> I think I may give our friends out in CA a call here...

I've put in a ticket with Sun on this, but in thinking about this, I'm 
pretty sure this kind of definition is widespread (on our Linux 2.6.9 login 
cluster it's the same story in netinet/in.h), so while I can point it out 
to Sun, expecting strong typing to come out of the byteorder functions is 
probably a general mistake.  Since the functions explicitly want a uint32_t 
or a uint16_t as the argument, the 100% proper thing to do would seem to me 
to do an explicit typecast in the argument to these functions.  If it's 
just a null macro, that solves the problem, and if it's a real function, 
it's good form anyway.

-Michael


More information about the Cyrus-devel mailing list