Repeat recovers on databases

Michael Bacon baconm at email.unc.edu
Fri Jun 19 18:10:12 EDT 2009


--On June 19, 2009 4:12:40 PM -0400 Michael Bacon <baconm at email.unc.edu> 
wrote:

> (Dropping info-cyrus on the followup)
>
> --On June 19, 2009 3:43:43 PM -0400 Michael Bacon <baconm at email.unc.edu>
> wrote:
>
>> --On June 19, 2009 9:57:03 AM +1000 Bron Gondwana <brong at fastmail.fm>
>> wrote:
>>
>>> On Thu, Jun 18, 2009 at 05:44:19PM -0400, Michael Bacon wrote:
>>>> Another one stomped here.  This time, it's a 32/64 bit issue.  myinit
>>>> in cyrusdb_skiplist.c assumes that type_t is 4 bytes long, and writes
>>>>  out that many from the current timestamp when creating
>>>> $confdir/db/skipstamp.
>>>
>>> Actually, reading the code, that's not strictly true:
>>>
>>>>        a = htonl(global_recovery);
>>>> -       if (r != -1) r = write(fd, &a, 4);
>>>> +       if (r != -1) r = write(fd, &a, sizeof(time_t));
>>>
>>> It writes "a", which is the result of calling htonl on global_recovery.
>>>
>>> If htonl isn't returning a 32 bit value of the lower order bytes of the
>>> value that it's given, then this bug is going to be causing a LOT more
>>> problems than just this.  We assume this works in quite a few other
>>> places in the code, including the timestamp value in the skiplist header
>>> itself, and in places throughout the mailbox code too.
>>>
>>> "htonl" => "host to net long" by my reading.  There's also htonll for 64
>>> bit values.  Is your platform creating net longlongs?
>>
>> Good question -- this may be a Solaris bug after all.  Solaris clearly
>> defines in the man page that htonl is supposed to return a uint32_t from
>> htonl, but looking at sys/byteorder.h, that's um, not being enforced...
>>
>># if defined(_BIG_ENDIAN) && !defined(ntohl) && !defined(__lint)
>> /* big-endian */
>># define ntohl(x)        (x)
>># define ntohs(x)        (x)
>># define htonl(x)        (x)
>># define htons(x)        (x)
>>
>># elif !defined(ntohl) /* little-endian */
>>
>> I think I may give our friends out in CA a call here...
>
> I've put in a ticket with Sun on this, but in thinking about this, I'm
> pretty sure this kind of definition is widespread (on our Linux 2.6.9
> login cluster it's the same story in netinet/in.h), so while I can point
> it out to Sun, expecting strong typing to come out of the byteorder
> functions is probably a general mistake.  Since the functions explicitly
> want a uint32_t or a uint16_t as the argument, the 100% proper thing to
> do would seem to me to do an explicit typecast in the argument to these
> functions.  If it's just a null macro, that solves the problem, and if
> it's a real function, it's good form anyway.

Okay, so here's a patch to go against current CVS+Bron's last patch which 
converts everything over to uint32_t and does explicit typecasting on the 
arguments to all byteorder calls.  This passes my basic, non-production 
tests, but it may not be the way folks want to proceed, so I'll float it 
out there for feedback.

I realize this requires C99 spec compliance, but is that still problematic 
in 2009?

-Michael



-------------- next part --------------
A non-text attachment was scrubbed...
Name: skiplist_uint32.patch
Type: application/octet-stream
Size: 14006 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20090619/f6a5a6c6/attachment-0001.obj 


More information about the Cyrus-devel mailing list