Repeat recovers on databases
Michael Bacon
baconm at email.unc.edu
Fri Jun 19 18:10:12 EDT 2009
--On June 19, 2009 4:12:40 PM -0400 Michael Bacon <baconm at email.unc.edu>
wrote:
> (Dropping info-cyrus on the followup)
>
> --On June 19, 2009 3:43:43 PM -0400 Michael Bacon <baconm at email.unc.edu>
> wrote:
>
>> --On June 19, 2009 9:57:03 AM +1000 Bron Gondwana <brong at fastmail.fm>
>> wrote:
>>
>>> On Thu, Jun 18, 2009 at 05:44:19PM -0400, Michael Bacon wrote:
>>>> Another one stomped here. This time, it's a 32/64 bit issue. myinit
>>>> in cyrusdb_skiplist.c assumes that type_t is 4 bytes long, and writes
>>>> out that many from the current timestamp when creating
>>>> $confdir/db/skipstamp.
>>>
>>> Actually, reading the code, that's not strictly true:
>>>
>>>> a = htonl(global_recovery);
>>>> - if (r != -1) r = write(fd, &a, 4);
>>>> + if (r != -1) r = write(fd, &a, sizeof(time_t));
>>>
>>> It writes "a", which is the result of calling htonl on global_recovery.
>>>
>>> If htonl isn't returning a 32 bit value of the lower order bytes of the
>>> value that it's given, then this bug is going to be causing a LOT more
>>> problems than just this. We assume this works in quite a few other
>>> places in the code, including the timestamp value in the skiplist header
>>> itself, and in places throughout the mailbox code too.
>>>
>>> "htonl" => "host to net long" by my reading. There's also htonll for 64
>>> bit values. Is your platform creating net longlongs?
>>
>> Good question -- this may be a Solaris bug after all. Solaris clearly
>> defines in the man page that htonl is supposed to return a uint32_t from
>> htonl, but looking at sys/byteorder.h, that's um, not being enforced...
>>
>># if defined(_BIG_ENDIAN) && !defined(ntohl) && !defined(__lint)
>> /* big-endian */
>># define ntohl(x) (x)
>># define ntohs(x) (x)
>># define htonl(x) (x)
>># define htons(x) (x)
>>
>># elif !defined(ntohl) /* little-endian */
>>
>> I think I may give our friends out in CA a call here...
>
> I've put in a ticket with Sun on this, but in thinking about this, I'm
> pretty sure this kind of definition is widespread (on our Linux 2.6.9
> login cluster it's the same story in netinet/in.h), so while I can point
> it out to Sun, expecting strong typing to come out of the byteorder
> functions is probably a general mistake. Since the functions explicitly
> want a uint32_t or a uint16_t as the argument, the 100% proper thing to
> do would seem to me to do an explicit typecast in the argument to these
> functions. If it's just a null macro, that solves the problem, and if
> it's a real function, it's good form anyway.
Okay, so here's a patch to go against current CVS+Bron's last patch which
converts everything over to uint32_t and does explicit typecasting on the
arguments to all byteorder calls. This passes my basic, non-production
tests, but it may not be the way folks want to proceed, so I'll float it
out there for feedback.
I realize this requires C99 spec compliance, but is that still problematic
in 2009?
-Michael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: skiplist_uint32.patch
Type: application/octet-stream
Size: 14006 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20090619/f6a5a6c6/attachment-0001.obj
More information about the Cyrus-devel
mailing list