Discussion:
On not initializing additional members of standard structs.
Kaz Kylheku
2016-10-12 00:19:02 UTC
Some structures defined by the Library may have additional
members defined by the implementation. An example of this
is struct tm.

Suppose we instantiate a struct tm in automatic storage
and initialize, by assignment, only the ISO-documented members.
We then use it as input to a function like asctime.

Is the behavior defined? If not, from where can we infer
the requirement for initializing the extra members?

(It's not a given that just because a member exists, neglecting to
initialize it is a problem. Such members can be "scratch areas" for
internal use, initialized by the implementation prior to any access. Or
they can be outright outputs. In that same struct tm, some Unix
libraries have members tm_gmtoff and tm_zone. These are written by the
implementation when it calculates a broken down time, providing
additional information that programs may access.)
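The scenario in the question can be sketched as follows. This is only an illustration: the helper name render_iso_only and the chosen date are assumptions, not anything from the thread, and whether an implementation's asctime ever looks at extra members is exactly what is being asked.

```c
#include <string.h>
#include <time.h>

/* Builds a struct tm in automatic storage, assigns only the nine members
   that ISO C documents, and copies asctime()'s rendering into out.
   Returns the length of the text, or 0 if out is too small.
   (render_iso_only is a hypothetical name used for illustration.) */
static size_t render_iso_only(char *out, size_t outsz)
{
    struct tm t;                 /* no initializer: extra members untouched */

    t.tm_sec   = 2;
    t.tm_min   = 19;
    t.tm_hour  = 0;
    t.tm_mday  = 12;
    t.tm_mon   = 9;              /* October (months count from 0) */
    t.tm_year  = 116;            /* years since 1900 */
    t.tm_wday  = 3;              /* Wednesday */
    t.tm_yday  = 285;            /* days since January 1 */
    t.tm_isdst = 0;

    const char *s = asctime(&t);
    size_t len = strlen(s);
    if (len >= outsz)
        return 0;
    memcpy(out, s, len + 1);
    return len;
}
```

A more conservative variant would declare the object as `struct tm t = {0};`, which gives every member, documented or not, a definite value before the assignments.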
James Kuyper
2016-10-12 02:59:10 UTC
Post by Kaz Kylheku
Some structures defined by the Library may have additional
members defined by the implementation. An example of this
is struct tm.
Suppose we instantiate a struct tm in automatic storage
and initialize, by assignment, only the ISO-documented members.
We then use it as input to a function like asctime.
Is the behavior defined? If not, from where can we infer
the requirement for initializing the extra members?
Such a statement could only reasonably be inferred from an explicit
statement to that effect in the description of the function your code
passes the struct to. The explicit statement would have to refer to
zero-initializing the struct. There are, for instance, no such
statements in the C standard's descriptions of the functions that take
struct tm or struct tm* arguments. The same is true of functions taking
struct timespec arguments. struct lconv is never used as an input to a
function, so the issue never comes up for that type.

However, the jmp_buf, va_list, FILE, fpos_t, and mbstate_t types are all
specified sufficiently vaguely to allow them to be struct types as well,
but none of them have any fields whose names are specified by the
standard. What the standard says about the functions using those types
is an example of what it would have to say if struct tm or struct
timespec had additional members with unspecified names that needed
special initialization:

jmp_buf: longjmp() requires that jmp_buf's contents were filled in by a
previous call to setjmp().

va_list: va_arg and va_end require that the va_list object be
initialized by either va_start or va_copy.

FILE: All functions taking FILE* arguments require that the
corresponding arguments be either one of the three standard streams, or
a value returned by a call to fopen(), tmpfile(), or freopen() that
successfully associates a file with that stream.

fpos_t: fsetpos() only accepts fpos_t objects that have been filled in
by a previous call to fgetpos().

mbstate_t: 7.29.6p3 identifies zero-initialization as one possible way
of creating an mbstate_t object describing the initial shift state. The
only other way described by the standard for giving such an object a
defined meaning is to call any of a variety of functions that set the
value of that object.
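As a small illustration of the mbstate_t case, the sketch below zero-fills an object and asks mbsinit() whether it describes the initial conversion state. The function name zeroed_state_is_initial is made up for this example.

```c
#include <string.h>
#include <wchar.h>

/* Returns nonzero if a zero-filled mbstate_t is reported by mbsinit()
   as describing the initial conversion state.
   (zeroed_state_is_initial is a hypothetical helper name.) */
static int zeroed_state_is_initial(void)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);   /* zero every byte of the object */
    return mbsinit(&st) != 0;
}
```

The other types in the list have no comparable "fill it in yourself" route: the only documented way to get a usable jmp_buf, va_list, or fpos_t is through setjmp(), va_start/va_copy, or fgetpos() respectively.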
Kaz Kylheku
2016-10-12 16:27:33 UTC
Post by James Kuyper
Post by Kaz Kylheku
Some structures defined by the Library may have additional
members defined by the implementation. An example of this
is struct tm.
Suppose we instantiate a struct tm in automatic storage
and initialize, by assignment, only the ISO-documented members.
We then use it as input to a function like asctime.
Is the behavior defined? If not, from where can we infer
the requirement for initializing the extra members?
Such a statement could only reasonably be inferred from an explicit
statement to that effect in the description of the function your code
passes the struct to.
Thanks; that is what I believe. (Thought I was crazy for a moment.)
s***@casperkitty.com
2016-10-12 17:05:07 UTC
Post by Kaz Kylheku
Post by James Kuyper
Such a statement could only reasonably be inferred from an explicit
statement to that effect in the description of the function your code
passes the struct to.
Thanks; that is what I believe. (Thought I was crazy for a moment.)
How do you interpret the rules when it comes to copying structures without
using their contents? Given the code:

#include <stdint.h>

struct foo { uint32_t x, y; };

uint32_t hey(int mode)
{
    struct foo a, b;
    if (mode & 1)
        a.x = 1;
    if (mode & 2)
        a.y = 2;
    b = a;
    if (mode & 4)
        return 0;
    return b.x;
}

for which values of "mode" should behavior be considered defined? Clearly
for 3 and 7 there should be no problem; what about other values? I think
the intention of the Standard would be that it should be defined in cases 1,
5, and 6, and I don't see any practical benefit from not having it be defined
in 4 as well (if code doesn't have anything useful to put into any values of
a structure, but code that receives a copy won't use any fields, I don't see
much point to saying that code must write some field--any field--before it
can copy the structure).

In cases 0 and 2, I could see a basis for UB in that code reads an
uninitialized uint32_t whose address is not taken. If a compiler uses a
64-bit register to hold the contents of a uint32_t, it might ensure that
the upper bits are never written as anything other than zero, but it
might return the register unaltered (holding a left-over value whose
upper bits aren't zero) if the uint32_t is returned without having been
written first.

Would that coincide with your interpretation?
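For comparison, here is a sketch of a variant whose behavior does not depend on how the question is answered, because both members hold a definite value before the copy. The name hey_defined is an assumption made for this example, and the extra stores are precisely the cost the post is asking about.

```c
#include <stdint.h>

struct foo { uint32_t x, y; };

/* Same control flow as hey(), but a starts out fully initialized, so the
   structure copy and the final read are defined for every mode.
   (hey_defined is a hypothetical name.) */
static uint32_t hey_defined(int mode)
{
    struct foo a = {0}, b;   /* x and y both start at zero */

    if (mode & 1)
        a.x = 1;
    if (mode & 2)
        a.y = 2;
    b = a;
    if (mode & 4)
        return 0;
    return b.x;
}
```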
Kaz Kylheku
2016-10-12 18:26:31 UTC
Post by s***@casperkitty.com
Post by Kaz Kylheku
Post by James Kuyper
Such a statement could only reasonably be inferred from an explicit
statement to that effect in the description of the function your code
passes the struct to.
Thanks; that is what I believe. (Thought I was crazy for a moment.)
How do you interpret the rules when it comes to copying structures without
struct foo { uint32_t x,y; }
uint32_t hey(int mode)
{
struct foo a,b;
if (mode & 1)
a.x = 1;
if (mode & 2)
a.y = 2;
b = a;
That's UB if only one of the previous assignments executed. This
assignment statement is equivalent to the member-wise assignment:

(b.x = a.x, b.y = a.y);

In my original question, I'm not concerned with by-value passing
of incompletely initialized structures.

If an implementation wants to do this kind of thing internally
inside asctime with my partially initialized struct tm, it has to
ensure that nothing bad actually happens; that some uninitialized
bits get copied and that's all.
s***@casperkitty.com
2016-10-12 21:57:32 UTC
Post by Kaz Kylheku
Post by s***@casperkitty.com
How do you interpret the rules when it comes to copying structures without
struct foo { uint32_t x,y; }
uint32_t hey(int mode)
{
struct foo a,b;
if (mode & 1)
a.x = 1;
if (mode & 2)
a.y = 2;
b = a;
That's UB if only one of the previous assignments executed. This
(b.x = a.x, b.y = a.y);
The C Standard specifies in 6.2.6.1p6 that the value of a struct or union,
as a whole, cannot be regarded as a trap representation even when some or
all of its members are. If that does not justify copying a partially-
populated structure, what is the purpose of that text in the Standard?

Saying that an uninitialized value of any type **which is allowed to hold
a trap representation** may behave as though it does hold a trap
representation (i.e. invoke UB) has the useful purpose of avoiding making
the compiler initialize variables that could hold trap representations in
cases where it can't tell that they will be written before they are read,
thus improving efficiency.

Saying that copying a structure will be defined behavior even if some
fields hold trap representations, however, makes it possible for programmers
to improve efficiency by refraining from populating fields whose values will
never be read except as part of a whole-structure copy. Is there any
evidence that the authors of the Standard did not intend to allow such
efficiency-improving techniques?
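One technique sometimes used to avoid the question altogether is to copy the object representation byte by byte. The sketch below assumes the common reading that character-type access (which is how memcpy is specified to behave) cannot hit a trap representation; whether that settles the matter for indeterminate members is part of what this thread disputes. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

struct foo { uint32_t x, y; };

/* Populates only x, copies the whole object representation with memcpy,
   and reads back only the member that was populated.
   (copy_then_read_x is a hypothetical name.) */
static uint32_t copy_then_read_x(void)
{
    struct foo a, b;

    a.x = 1;                     /* a.y is deliberately never written */
    memcpy(&b, &a, sizeof b);    /* byte-wise copy of the representation */
    return b.x;                  /* only the populated member is read */
}
```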
Kaz Kylheku
2016-10-12 23:15:10 UTC
Post by s***@casperkitty.com
Post by Kaz Kylheku
Post by s***@casperkitty.com
How do you interpret the rules when it comes to copying structures without
struct foo { uint32_t x,y; }
uint32_t hey(int mode)
{
struct foo a,b;
if (mode & 1)
a.x = 1;
if (mode & 2)
a.y = 2;
b = a;
That's UB if only one of the previous assignments executed. This
(b.x = a.x, b.y = a.y);
The C Standard specifies in 6.2.6.1p6 that the value of a struct or union,
as a whole, cannot be regarded as a trap representation even when some or
all of its members are. If that does not justify copying a partially-
populated structure, what is the purpose of that text in the Standard?
C had the concept of indeterminately-valued objects before the "trap
representation" concept and terminology was introduced in C99. It's not
trap representations that make uninitialized objects
"indeterminately-valued"; even types that have a total mapping from all
possible bitwise images to values can be indeterminately-valued.

On most mainstream platforms, types like int and unsigned int use
every bit pattern and so do not admit trap representations; you can still
have an uninitialized int whose use is UB.
s***@casperkitty.com
2016-10-12 23:39:19 UTC
Post by Kaz Kylheku
C had the concept of indeterminately-valued objects before the "trap
representation" concept and terminology was introduced in C99. It's not
trap representations that make uninitialized objects
"indeterminately-valued"; even types that have a total mapping from all
possible bitwise images to values can be indeterminately-valued.
That may be, but what would be the purpose of the language about structures
not having trap representations, if not to avoid requiring code to clear out
irrelevant parts of a structure?
Post by Kaz Kylheku
On most mainstream platforms, types like int and unsigned int use
every bit pattern and so do not admit trap representations; you can still
have an uninitialized int whose use is UB.
In many implementations, values of certain types may be represented in ways
which don't match their memory representation. For example, on a 64-bit
system it would be very common for functions returning "uint32_t" to return
their result in a 64-bit register, and for calling code to expect that the
upper 32 bits of that register are zero. Given something like:

uint32_t foo(uint64_t x)
{
    uint32_t result;
    if (a(x)) result = 23;
    if (b(x)) result = 47;
    if (c(x)) result = 69;
    return result;
}

the authors of the Standard could have either:

1. Required that a compiler ensure that any register being used to hold
"result" must have its upper 32 bits cleared regardless of whether
anything writes "result", or

2. Defined behavioral semantics where the return value of "foo" might
exceed the range of a uint32_t, or

3. Recognized that an object whose address is not taken need not be
stored using the bit-level representations defined for objects in
memory, and might have room for states which wouldn't "fit" in the
memory layout, and might be trap representations.

I think the intention of the Standard was to allow #3.
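From the programmer's side, the portable way to stay out of all three cases is to give "result" a value on every path. A minimal sketch, in which the bodies of a(), b(), and c() are invented stand-ins (the post does not say what they do) and foo_defined is a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical predicates standing in for a(), b(), and c(). */
static int a(uint64_t x) { return (x & 1) != 0; }
static int b(uint64_t x) { return (x & 2) != 0; }
static int c(uint64_t x) { return (x & 4) != 0; }

/* Like foo(), but result has a definite value even when no predicate
   fires, so the returned value always fits in a uint32_t. */
static uint32_t foo_defined(uint64_t x)
{
    uint32_t result = 0;         /* default when nothing matches */

    if (a(x)) result = 23;
    if (b(x)) result = 47;
    if (c(x)) result = 69;
    return result;
}
```

The cost is one extra store on paths where a later assignment would have overwritten it anyway, which a compiler can usually eliminate.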
Kaz Kylheku
2016-10-13 00:01:30 UTC
Post by s***@casperkitty.com
Post by Kaz Kylheku
C had the concept of indeterminately-valued objects before the "trap
representation" concept and terminology was introduced in C99. It's not
trap representations that make uninitialized objects
"indeterminately-valued"; even types that have a total mapping from all
possible bitwise images to values can be indeterminately-valued.
That may be, but what would be the purpose of the language about structures
not having trap representations, if not to avoid requiring code to clear out
irrelevant parts of a structure?
I don't see the purpose of "trap representation" in the first place.
It's semantic fluff without which the C language got along just fine.

The purpose of that rule seems to be to allow code to be correct
which stuffs trap representations into a structure and then
copies the structure. To write such code, you need to have some
way to create trap representations, which is not portable.
This could be useful for debugging.

For instance, in that example function where you conditionally
initialize (by assignment) member x or y, if the platform has
a trap representation for that member type (like a special constant), it
can be explicitly stored into the other, conceptually uninitialized (though
now in actual fact initialized to a trap value) member.

That trap representation is then copied when the structure is
assigned; and it doesn't trap until something accesses the trap member
in the copy (or original).

Yet, a plain old uninitialized member can still be diagnosed at struct
copy time (even if it happens to have a trap representation by chance).
The uninitialized status of memory can be tracked using metadata stored
elsewhere. For instance the Valgrind debugging tool knows that a struct
member of type int is uninitialized, even if the garbage happens to be
all zeros, looking like a valid value. It knows that because for each
valid memory location in the program's space, it holds information about
whether it has been initialized.

The concept of "initialized" isn't something that is stored in
an object; it has to do with what has or hasn't happened to it since it
came into existence.
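The idea of tracking initialization in metadata kept outside the object can be shown with a toy model. This is not how Valgrind is implemented; it is a hand-rolled sketch in which every name (foo_shadow, set_x, copy_foo, read_x) is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

struct foo { uint32_t x, y; };

/* Shadow record kept outside the object it describes: one flag per member. */
struct foo_shadow { bool x_written, y_written; };

/* Stores to x and records the store in the shadow. */
static void set_x(struct foo *f, struct foo_shadow *s, uint32_t v)
{
    f->x = v;
    s->x_written = true;
}

/* Copies only the members that were written and carries the shadow along. */
static void copy_foo(struct foo *dst, struct foo_shadow *dst_s,
                     const struct foo *src, const struct foo_shadow *src_s)
{
    if (src_s->x_written) dst->x = src->x;
    if (src_s->y_written) dst->y = src->y;
    *dst_s = *src_s;
}

/* Reads x only if the shadow says it was written; otherwise reports failure. */
static bool read_x(const struct foo *f, const struct foo_shadow *s,
                   uint32_t *out)
{
    if (!s->x_written)
        return false;
    *out = f->x;
    return true;
}
```

Nothing in struct foo itself changes: the "initialized" status lives entirely in the shadow, which is the point being made about history versus stored bits.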
s***@casperkitty.com
2016-10-13 05:59:03 UTC
Post by Kaz Kylheku
I don't see the purpose of "trap representation" in the first place.
It's semantic fluff without which the C language got along just fine.
On some platforms, loading certain bit patterns as though they were values
of a certain type may cause weird behavior. Such behavior need not be
limited to invoking hardware traps. On sign-magnitude or ones'-complement
machines, storing certain bit patterns in unsigned variables may cause them
to hold negative values (which should of course be impossible). The way the
Standard declines to impose requirements about how such things should behave
is to label them as Undefined Behavior.
Post by Kaz Kylheku
The purpose of that rule seems to be to allow code to be correct
which stuffs trap representations into a structure and then
copies the structure. To write such code, you need to have some
way to create trap representations, which is not portable.
This could be useful for debugging.
The authors of the Standard generally don't bother specifying guarantees
which would only be meaningful for the vast majority of implementations
rather than all of them. I do not think it plausible that they would
have written a rule for the purpose you describe.

It seems far more plausible that the rule was written to avoid requiring
programmers to write useless code.
Post by Kaz Kylheku
Yet, a plain old uninitialized member can still be diagnosed at struct
copy time (even if it happens to have a trap representation by chance).
The uninitialized status of memory can be tracked using metadata stored
elsewhere. For instance the Valgrind debugging tool knows that a struct
member of type int is uninitialized, even if the garbage happens to be
all zeros, looking like a valid value. It knows that because for each
valid memory location in the program's space, it holds information about
whether it has been initialized.
There are many situations where it's useful to have debugger traps fire
for behavior which is 100% defined, or where it's useful to have an
implementation impose restrictions on code beyond what the Standard would
allow. I see no reason the Standard needs to accommodate things like
debugger traps when they're best handled by the use of non-conforming modes.
Post by Kaz Kylheku
The concept of "initialized" isn't something that is stored in
an object; it has to do with what has or hasn't happened to it since it
came into existence.
Is any useful purpose served by making the Standard care? Such things seem
highly specialized and are only meaningful on a small number of platforms.

By contrast, the ability to make code more efficient by eliminating stores
of meaningless values is meaningful on nearly every platform imaginable.

Which interpretation makes more sense?
Keith Thompson
2016-10-13 15:04:49 UTC
Post by s***@casperkitty.com
Post by Kaz Kylheku
I don't see the purpose of "trap representation" in the first place.
It's semantic fluff without which the C language got along just fine.
On some platforms, loading certain bit patterns as though they were values
of a certain type may cause weird behavior. Such behavior need not be
limited to invoking hardware traps.
Agreed.
Post by s***@casperkitty.com
On sign-magnitude or ones'-complement
machines, storing certain bit patterns in unsigned variables may cause them
to hold negative values (which should of course be impossible).
Are you thinking of a system where the sign bit is a padding bit for
unsigned types? Is there a particular reason that sign-and-magnitude or
one's-complement systems would do that -- or that two's-complement
systems wouldn't?
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2016-10-13 19:29:21 UTC
Post by Keith Thompson
Post by s***@casperkitty.com
On sign-magnitude or ones'-complement
machines, storing certain bit patterns in unsigned variables may cause them
to hold negative values (which should of course be impossible).
Are you thinking of a system where the sign bit is a padding bit for
unsigned types? Is there a particular reason that sign-and-magnitude or
one's-complement systems would do that -- or that two's-complement
systems wouldn't?
A two's-complement machine can use the same instructions for signed and
unsigned addition, and likewise subtraction, single-size multiplication,
and left-shift. A ones'-complement or sign magnitude machine would require
different instructions for all of those unless it restricts unsigned values
to a small enough range that each sub-computation will fit within the range
of a signed type. If unsigned values are limited to the range 0..INT_MAX,
then two values may be added by subtracting INT_MAX from the first, adding the
second, and then either adding INT_MAX or subtracting 1 depending upon
whether the result is greater than zero. Clunky, but still possibly better
than any other alternative the hardware might provide.
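The addition scheme just described can be written out in C using only signed arithmetic that never overflows. This is a sketch of the arithmetic, not code from any actual ones'-complement compiler; the function name is an assumption, and both operands are assumed to lie in 0..INT_MAX.

```c
#include <limits.h>

/* Adds two "unsigned" values restricted to 0..INT_MAX, wrapping modulo
   INT_MAX + 1, without any intermediate result leaving the range of int.
   (add_restricted_unsigned is a hypothetical name.) */
static int add_restricted_unsigned(int u, int v)
{
    int t = u - INT_MAX;     /* now in [-INT_MAX, 0] */
    t += v;                  /* now in [-INT_MAX, INT_MAX] */

    /* t > 0 means u + v exceeded INT_MAX, so wrap by subtracting 1;
       otherwise undo the bias by adding INT_MAX back. */
    return t > 0 ? t - 1 : t + INT_MAX;
}
```

For example, adding 1 to INT_MAX leaves t equal to 1, so the result wraps to 0, which is what arithmetic modulo INT_MAX + 1 requires.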
Keith Thompson
2016-10-13 21:00:09 UTC
Post by s***@casperkitty.com
Post by Keith Thompson
Post by s***@casperkitty.com
On sign-magnitude or
ones'-complement machines, storing certain bit patterns in unsigned
variables may cause them to hold negative values (which should of
course be impossible).
Are you thinking of a system where the sign bit is a padding bit for
unsigned types? Is there a particular reason that sign-and-magnitude or
one's-complement systems would do that -- or that two's-complement
systems wouldn't?
A two's-complement machine can use the same instructions for signed and
unsigned addition, and likewise subtraction, single-size multiplication,
and left-shift. A ones'-complement or sign magnitude machine would require
different instructions for all of those unless it restricts unsigned values
to a small enough range that each sub-computation will fit within the range
of a signed type. If unsigned values are limited to the range 0..INT_MAX,
then two values may be added by subtracting INT_MAX from the first, adding the
second, and then either adding INT_MAX or subtracting 1 depending upon
whether the result is greater than zero. Clunky, but still possibly better
than any other alternative the hardware might provide.
I'm not familiar with real-world non-two's-complement machines. Do they
actually restrict the range of unsigned types like that, or do they just
have different instructions for signed and unsigned arithmetic? I would
have expected the latter.

It probably doesn't matter much to your main point, which is that if,
for example, unsigned int is restricted to 0..INT_MAX, it's possible to
store an invalid value in an unsigned int object that would be negative
if it were interpreted as a signed int object.
Wojtek Lerch
2016-10-14 03:16:55 UTC
Post by Keith Thompson
It probably doesn't matter much to your main point, which is that if,
for example, unsigned int is restricted to 0..INT_MAX, it's possible to
store an invalid value in an unsigned int object that would be negative
if it were interpreted as a signed int object.
Not necessarily. The implementation could consider it a valid value.
s***@casperkitty.com
2016-10-14 14:22:31 UTC
Post by Wojtek Lerch
Post by Keith Thompson
It probably doesn't matter much to your main point, which is that if,
for example, unsigned int is restricted to 0..INT_MAX, it's possible to
store an invalid value in an unsigned int object that would be negative
if it were interpreted as a signed int object.
Not necessarily. The implementation could consider it a valid value.
Given something like:

float f(unsigned *p) { return (float)*p; }

The Standard does not define any means via which such a function could
return a negative value. On the other hand, a sign-magnitude machine
might very well return -1.0f in response to:

float test(void) { int i=-1; return f((unsigned*)&i); }

Behavior would be defined if a pointer to a non-negative "int" value were
dereferenced as unsigned, or if an unsigned value within the range of "int"
were dereferenced as "int", but dereferencing a negative "int" as unsigned
is left undefined to avoid requiring implementations to add extra code to
ensure that unsigned values don't behave like negative numbers. Note that
the authors of the Standard didn't intend that implementations shouldn't
treat UB in a predictable fashion when practical, since they expressly
noted that as a commonplace treatment (behave in a documented fashion
characteristic of the environment, and a ones'-complement implementation
such as described could very easily document the above behavior) but from
the point of view of the Standard it would not be required to do anything
in particular.
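The case described above as defined, reading a non-negative "int" through an "unsigned" lvalue, can be sketched as follows. It relies on the Standard's guarantee that non-negative values have the same representation in a signed type and its unsigned counterpart; the name test_nonnegative is hypothetical.

```c
/* Same f() as above: reads through an unsigned lvalue and converts. */
static float f(unsigned *p) { return (float)*p; }

/* Passes the address of a non-negative int, whose representation is
   shared with the same value of type unsigned.
   (test_nonnegative is a hypothetical name.) */
static float test_nonnegative(void)
{
    int i = 1;
    return f((unsigned *)&i);
}
```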
Keith Thompson
2016-10-14 15:31:55 UTC
Post by s***@casperkitty.com
Post by Wojtek Lerch
Post by Keith Thompson
It probably doesn't matter much to your main point, which is that if,
for example, unsigned int is restricted to 0..INT_MAX, it's possible to
store an invalid value in an unsigned int object that would be negative
if it were interpreted as a signed int object.
Not necessarily. The implementation could consider it a valid value.
float f(unsigned *p) { return (float)*p; }
The Standard does not define any means via which such a function could
return a negative value. On the other hand, a sign-magnitude machine
float test(void) { int i=-1; return f((unsigned*)&i); }
Yes -- and so could a two's-complement machine if it limited the range
of unsigned int to 0..INT_MAX.

[...]
s***@casperkitty.com
2016-10-14 16:39:28 UTC
Post by Keith Thompson
Post by s***@casperkitty.com
The Standard does not define any means via which such a function could
return a negative value. On the other hand, a sign-magnitude machine
float test(void) { int i=-1; return f((unsigned*)&i); }
Yes -- and so could a two's-complement machine if it limited the range
of unsigned int to 0..INT_MAX.
I am unaware of any two's-complement implementations that behave in such
fashion. The only reason I can imagine an implementation doing so would
be if hardware used saturating arithmetic or disruptively trapped on
overflow, and didn't have any means of overriding such behaviors. Unless
a machine has unavoidable overflow behaviors, most forms of signed and
unsigned arithmetic will behave identically at the bit level. By contrast,
in a ones'-complement machine, the behaviors of signed and unsigned numbers
will differ, at the bit level, in cases where the MSB is set; if hardware
is not set up to accommodate those differences, the most practical way to
take care of them in software is to not *have* any unsigned-arithmetic cases
where the MSB is set.
Keith Thompson
2016-10-14 19:08:11 UTC
Post by s***@casperkitty.com
Post by Keith Thompson
Post by s***@casperkitty.com
The Standard does not define any means via which such a function could
return a negative value. On the other hand, a sign-magnitude machine
float test(void) { int i=-1; return f((unsigned*)&i); }
Yes -- and so could a two's-complement machine if it limited the range
of unsigned int to 0..INT_MAX.
I am unaware of any two's-complement implementations that behave in such
fashion.
I am unaware of any one's-complement or sign-and-magnitude
implementations that behave in such fashion -- because I am unaware
of any one's-complement or sign-and-magnitude implementations.
(I've heard of some, but I lack any familiarity with them.)
Post by s***@casperkitty.com
The only reason I can imagine an implementation doing so would
be if hardware used saturating arithmetic or disruptively trapped on
overflow, and didn't have any means of overriding such behaviors. Unless
a machine has unavoidable overflow behaviors, most forms of signed and
unsigned arithmetic will behave identically at the bit level. By contrast,
in a ones'-complement machine, the behaviors of signed and unsigned numbers
will differ, at the bit level, in cases where the MSB is set; if hardware
is not set up to accommodate those differences, the most practical way to
take care of them in software is to not *have* any unsigned-arithmetic cases
where the MSB is set.
A non-two's-complement machine could simply have separate
instructions for signed and unsigned arithmetic, allowing it to
use all the bits for unsigned integers. In principle, there's no
necessary connection between (a) the representation used for signed
types and (b) whether unsigned types treat the sign bit as a value
bit or as a padding bit. There might be reasons for a correlation
in practice, but I suggest that's not relevant to your point.
The behavior you describe above applies to implementations on which
unsigned types treat the sign bit as a padding bit -- regardless
of the representation used for signed types.
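Whether a given implementation treats any bits of unsigned int as padding can be probed from <limits.h> alone. A rough sketch, with both function names invented for this example:

```c
#include <limits.h>

/* Counts the value bits of unsigned int by shifting UINT_MAX down to zero.
   (unsigned_value_bits is a hypothetical name.) */
static int unsigned_value_bits(void)
{
    int bits = 0;
    for (unsigned v = UINT_MAX; v != 0; v >>= 1)
        bits++;
    return bits;
}

/* Nonzero if unsigned int has fewer value bits than storage bits, which
   means at least one bit (possibly the sign-bit position) is padding. */
static int unsigned_has_padding(void)
{
    return unsigned_value_bits() < (int)(sizeof(unsigned) * CHAR_BIT);
}
```

On an implementation of the kind discussed here, with UINT_MAX equal to INT_MAX, unsigned_has_padding() would return nonzero regardless of which representation the signed types use.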
s***@casperkitty.com
2016-10-14 22:24:48 UTC
Post by Keith Thompson
A non-two's-complement machine could simply have separate
instructions for signed and unsigned arithmetic, allowing it to
use all the bits for unsigned integers.
One could, but I don't think all of them do.
Post by Keith Thompson
In principle, there's no
necessary connection between (a) the representation used for signed
types and (b) whether unsigned types treat the sign bit as a value
bit or as a padding bit. There might be reasons for a correlation
in practice, but I suggest that's not relevant to your point.
My point was that on some platforms, making UINT_MAX==INT_MAX may be a result
of practical necessity; if a one's-complement system doesn't have unsigned
arithmetic instructions, masking off the sign bit after unsigned operations
may not be terribly cheap, but it may be much cheaper than any other
allowable alternative.

On a two's-complement system, it would be far harder to find a compelling
reason why a system would work that way beyond "because the Standard allows
it".
Keith Thompson
2016-10-14 23:23:17 UTC
Post by s***@casperkitty.com
Post by Keith Thompson
A non-two's-complement machine could simply have separate
instructions for signed and unsigned arithmetic, allowing it to
use all the bits for unsigned integers.
One could, but I don't think all of them do.
Post by Keith Thompson
In principle, there's no
necessary connection between (a) the representation used for signed
types and (b) whether unsigned types treat the sign bit as a value
bit or as a padding bit. There might be reasons for a correlation
in practice, but I suggest that's not relevant to your point.
My point was that on some platforms, making UINT_MAX==INT_MAX may be a result
of practical necessity; if a one's-complement system doesn't have unsigned
arithmetic instructions, masking off the sign bit after unsigned operations
may not be terribly cheap, but it may be much cheaper than any other
allowable alternative.
On a two's-complement system, it would be far harder to find a compelling
reason why a system would work that way beyond "because the Standard allows
it".
Yes, I understood your point several followups ago. Did you
understand mine?
Wojtek Lerch
2016-10-15 03:12:19 UTC
Post by s***@casperkitty.com
Post by Wojtek Lerch
Post by Keith Thompson
It probably doesn't matter much to your main point, which is that if,
for example, unsigned int is restricted to 0..INT_MAX, it's possible to
store an invalid value in an unsigned int object that would be negative
if it were interpreted as a signed int object.
Not necessarily. The implementation could consider it a valid value.
float f(unsigned *p) { return (float)*p; }
The Standard does not define any means via which such a function could
return a negative value.
Does the Standard forbid FLT_MAX to be less than UINT_MAX? Because if
not, then the conversion might be undefined, and could return anything
(or nothing at all), including a negative value, couldn't it?
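Whether the conversion in f() can go out of range on a particular implementation is something a program can check from <float.h> and <limits.h>. A minimal sketch, with a made-up function name:

```c
#include <float.h>
#include <limits.h>

/* Nonzero if every unsigned int value lies within the finite range of
   float, so that converting it cannot be out of range. The comparison is
   done in long double, whose required minimum range covers both operands
   on ordinary implementations.
   (every_unsigned_is_within_float_range is a hypothetical name.) */
static int every_unsigned_is_within_float_range(void)
{
    return (long double)UINT_MAX <= (long double)FLT_MAX;
}
```

Since FLT_MAX must be at least 1E+37, this can only return zero on an implementation whose unsigned int is well over a hundred bits wide.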

But what does that have to do with Keith Thompson's example and my
comment about it? We were talking about integer types.
Post by s***@casperkitty.com
On the other hand, a sign-magnitude machine
float test(void) { int i=-1; return f((unsigned*)&i); }
Behavior would be defined if a pointer to a non-negative "int" value were
dereferenced as unsigned,
Agreed.
Post by s***@casperkitty.com
or if an unsigned value within the range of "int" were dereferenced as "int",
No, the Standard does not guarantee that.
Post by s***@casperkitty.com
but dereferencing a negative "int" as unsigned
is left undefined to avoid requiring implementations to add extra code to
ensure that unsigned values don't behave like negative numbers. Note that
the authors of the Standard didn't intend that implementations shouldn't
treat UB in a predictable fashion when practical, since they expressly
noted that as a commonplace treatment
How does that follow? If they just noted that some behaviour was
commonplace, that's nothing more than a piece of historical trivia. It
implies that there were few exceptions at the time, not that the intent
was to forbid exceptions. Or even to discourage them.
Post by s***@casperkitty.com
(behave in a documented fashion
characteristic of the environment, and a ones'-complement implementation
such as described could very easily document the above behavior) but from
the point of view of the Standard it would not be required to do anything
in particular.
Right.
s***@casperkitty.com
2016-10-15 08:14:05 UTC
Post by Wojtek Lerch
Post by s***@casperkitty.com
The Standard does not define any means via which such a function could
return a negative value.
Does the Standard forbid FLT_MAX to be less than UINT_MAX? Because if
not, then the conversion might be undefined, and could return anything
(or nothing at all), including a negative value, couldn't it?
The Standard establishes a minimum value for FLT_MAX. It would in theory
be possible for an implementation to have UINT_MAX be larger than that,
but my point was to pick a type which is likely to have a larger range than
"int".
Post by Wojtek Lerch
But what does that have to do with Keith Thompson's example and my
comment about it? We were talking about integer types.
There are systems where "int" is the same size as "long long", so the fact
that converting a particular "unsigned int" value to "long long" yielded
a negative number wouldn't necessarily be impossible.
Post by Wojtek Lerch
Post by s***@casperkitty.com
On the other hand, a sign-magnitude machine
float test(void) { int i=-1; return f((unsigned*)&i); }
Behavior would be defined if a pointer to a non-negative "int" value were
dereferenced as unsigned,
Agreed.
Post by s***@casperkitty.com
or if an unsigned value within the range of "int" were dereferenced as "int",
No, the Standard does not guarantee that.
Why not? It's one of the cases that is expressly permitted by the aliasing
rules.
Post by Wojtek Lerch
Post by s***@casperkitty.com
but dereferencing a negative "int" as unsigned
is left undefined to avoid requiring implementations to add extra code to
ensure that unsigned values don't behave like negative numbers. Note that
the authors of the Standard didn't intend that implementations shouldn't
treat UB in a predictable fashion when practical, since they expressly
noted that as a commonplace treatment
How does that follow? If they just noted that some behaviour was
commonplace, that's nothing more than a piece of historical trivia. It
implies that there were few exceptions at the time, not that the intent
was to forbid exceptions. Or even to discourage them.
One of the things that made C successful was that in cases where a program's
requirements could be met without needing executable code to handle boundary
conditions, such code didn't have to be written by the compiler or the
programmer. This was a change from some previous languages which tended to
add such boundary-checking code whether the program really needed it or not.

Which boundary conditions would need to be handled in machine code would
vary from platform to platform, and code which omits boundary checks which
aren't needed on its intended target platform would likely fail if migrated
to a platform that did need such checks. On the other hand, most micro-
computer platforms were pretty consistent about how boundary conditions
behaved. Especially in cases where code didn't need precise behavior but
simply needed to stay on the rails [e.g. code needs to compute x<<y in cases
where y is less than the word size, but may arbitrarily compute either
x<<(y % wordsize) or (x<<(y-1))<<1 in other cases] having compilers adhere
to loose behavioral guarantees allowed more efficient code than would in
many cases otherwise be possible.
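The boundary check described above can be sketched as follows. This is only an illustration of what a strictly portable program has to write when it cannot rely on a platform's shift behavior; the helper name `shl_checked` and the macro `UINT_WIDTH_BITS` are hypothetical, and the width computation assumes `unsigned` has no padding bits.

```c
#include <limits.h>

/* Width of unsigned int in bits, assuming no padding bits. That holds on
   mainstream platforms but is not guaranteed by the Standard. */
#define UINT_WIDTH_BITS ((unsigned)(sizeof(unsigned) * CHAR_BIT))

/* Hypothetical helper: a left shift that is defined for every count.
   Counts at or above the width yield 0, which matches the
   (x<<(y-1))<<1 reading for y equal to the word size. */
unsigned shl_checked(unsigned x, unsigned y)
{
    return (y < UINT_WIDTH_BITS) ? (x << y) : 0u;
}
```

On a platform whose shift instruction already behaves acceptably for large counts, the comparison is exactly the extra code the paragraph above says a program should not have to pay for.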

Are you saying the authors of the Standard didn't intend that programmers
should be able to write efficient code? How much of a speed penalty do
you think they would have intended that programmers should be willing to
accept?
Wojtek Lerch
2016-10-15 15:04:25 UTC
...
Post by s***@casperkitty.com
Post by Wojtek Lerch
But what does that have to do with Keith Thompson's example and my
comment about it? We were talking about integer types.
There are systems where "int" is the same size as "long long", so the fact
that converting a particular "unsigned int" value to "long long" yielded
a negative number wouldn't necessarily be impossible.
Sure, converting an "unsigned int" to "long long" could sometimes have
undefined behaviour, but again, what does that have to do with Keith
Thompson's example and my comment about it?
Post by s***@casperkitty.com
Post by Wojtek Lerch
Post by Wojtek Lerch
Post by Keith Thompson
It probably doesn't matter much to your main point, which is that if,
for example, unsigned int is restricted to 0..INT_MAX, it's possible to
store an invalid value in an unsigned int object that would be negative
if it were interpreted as a signed int object.
Not necessarily. The implementation could consider it a valid value.
...
Post by s***@casperkitty.com
Post by Wojtek Lerch
Post by Wojtek Lerch
Behavior would be defined if a pointer to a non-negative "int" value were
dereferenced as unsigned,
Agreed.
Post by Wojtek Lerch
or if an unsigned value within the range of "int" were dereferenced as "int",
No, the Standard does not guarantee that.
Why not? It's one of the cases that is expressly permitted by the aliasing
rules.
It's one of the cases that is expressly listed as an exception to the
rules that forbid accessing an object using a different type, yes; but
that does not mean that all such accesses automatically have defined
behaviour. They're still subject to other rules, such as the one that
forbids accessing a trap representation in an object. And the
Standard does not promise that *all* unsigned representations of an
unsigned value less than or equal to INT_MAX are also valid signed
representations.
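For concreteness, the access under discussion looks roughly like the sketch below. The function names are made up for illustration. The aliasing rules (N1570 6.5p7) permit the lvalue access itself; what is being debated is whether every unsigned representation of a small value must also be a valid int representation. On conventional implementations without padding bits the read simply yields the stored value.

```c
/* Read an unsigned object through an int lvalue. The access is one of
   the listed aliasing exceptions; validity of the stored representation
   as an int is a separate question. */
int read_unsigned_as_int(const unsigned *p)
{
    return *(const int *)p;
}

/* Store a small value by ordinary assignment, then read it back as int. */
int demo_read(void)
{
    unsigned u = 5u;
    return read_unsigned_as_int(&u);
}
```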
Post by s***@casperkitty.com
Post by Wojtek Lerch
Post by Wojtek Lerch
but dereferencing a negative "int" as unsigned
is left undefined to avoid requiring implementations to add extra code to
ensure that unsigned values don't behave like negative numbers. Note that
the authors of the Standard didn't intend that implementations shouldn't
treat UB in a predictable fashion when practical, since they expressly
noted that as a commonplace treatment
How does that follow? If they just noted that some behaviour was
commonplace, that's nothing more than a piece of historical trivia. It
implies that there were few exceptions at the time, not that the intent
was to forbid exceptions. Or even to discourage them.
[...]
Post by s***@casperkitty.com
Are you saying the authors of the Standard didn't intend that programmers
should be able to write efficient code? [...]
Where on Earth did you get that idea from? You seem to have a habit of
interpreting other people's words in, um, creative ways. I was not
making any claims about the intents of the Standard's authors. I was
only pointing out that your conclusion did not follow from your premise
-- it was about the logic of your words, not about the facts of the world.
s***@casperkitty.com
2016-10-15 19:06:56 UTC
Post by Wojtek Lerch
Sure, converting an "unsigned int" to "long long" could sometimes have
undefined behaviour, but again, what does that have to do with Keith
Thompson's example and my comment about it?
My point was to illustrate that an implementation need not go out of its
way to regard a value with an invalid padding bit as "unusual" to make
such values cause weird behavior. An implementation may expect that a
storage for value of type "unsigned" holds a bit pattern which, if
interpreted as "int", would be positive. An implementation might not
notice anything weird if it holds the bit pattern for a negative number,
but the Standard could not characterize such action as anything other
than UB.
Post by Wojtek Lerch
Post by s***@casperkitty.com
Why not? It's one of the cases that is expressly permitted by the aliasing
rules.
It's one of the cases that is expressly listed as an exception to the
rules that forbid accessing an object using a different type, yes; but
that does not mean that all such accesses automatically have defined
behaviour. They're still subject to other rules, such as the one that
forbids accessing a trap representation in an object. And the
Standard does not promise that *all* unsigned representations of an
unsigned value less than or equal to INT_MAX are also valid signed
representations.
N1570 6.2.5p9
9 The range of nonnegative values of a signed integer type is a
subrange of the corresponding unsigned integer type, and the
representation of the same value in each type is the same.

What unsigned values which are not greater than INT_MAX could have
representations that differ from their signed counterparts? I don't
see that the above offers any latitude whatsoever.
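One way to observe the "same representation" property on a given implementation is to compare the bytes directly, as in this sketch. The function name is hypothetical, and the check only shows what one implementation does for values produced by ordinary conversion; it cannot settle what the Standard requires.

```c
#include <string.h>

/* Returns 1 if the nonnegative int value v and the same value converted
   to unsigned have identical object representations, 0 otherwise.
   int and unsigned int occupy the same amount of storage (6.2.5p6). */
int same_bytes_signed_unsigned(int v)
{
    unsigned u = (unsigned)v;
    return memcmp(&v, &u, sizeof v) == 0;
}
```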
Post by Wojtek Lerch
Post by s***@casperkitty.com
Are you saying the authors of the Standard didn't intend that programmers
should be able to write efficient code? [...]
Where on Earth did you get that idea from? You seem to have a habit of
interpreting other people's words in, um, creative ways. I was not
making any claims about the intents of the Standard's authors. I was
only pointing out that your conclusion did not follow from your premise
-- it was about the logic of your words, not about the facts of the world.
Returning to the main subject of this thread, if the intention of the
Standard is that programs not wishing to launch nuclear missiles must store
meaningless values into structure fields that no code will actually care
about, and writing such structure fields will in most cases make code less
efficient than it otherwise would be, that would seem to imply that the
intention of the Standard is to require programmers to make their code less
efficient in most cases than it otherwise could be. Is that clearer?
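For reference, the defensive idiom at issue in the main subject of the thread is to zero the whole struct before assigning the documented members. A minimal sketch, with a hypothetical function name and a date chosen only for illustration:

```c
#include <time.h>

/* Build a broken-down time and format it with asctime. The "= {0}"
   initializer zeroes every member, including any implementation-defined
   extras such as tm_gmtoff or tm_zone, at the cost of a few stores that
   may never be read. */
const char *format_fixed_time(void)
{
    struct tm t = {0};
    t.tm_year = 2016 - 1900;  /* years since 1900 */
    t.tm_mon  = 9;            /* October (months count from 0) */
    t.tm_mday = 15;
    t.tm_hour = 19;
    t.tm_min  = 6;
    t.tm_sec  = 56;
    t.tm_wday = 6;            /* Saturday */
    t.tm_yday = 288;          /* days since January 1 */
    return asctime(&t);       /* points to static storage */
}
```

Whether the initializer is required, or merely prudent, is exactly what the thread is arguing about.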
Keith Thompson
2016-10-15 19:41:37 UTC
***@casperkitty.com writes:
[...]
Post by s***@casperkitty.com
N1570 6.2.5p9
9 The range of nonnegative values of a signed integer type is a
subrange of the corresponding unsigned integer type, and the
representation of the same value in each type is the same.
What unsigned values which are not greater than INT_MAX could have
representations that differ from their signed counterparts? I don't
see that the above offers any latitude whatsoever.
Interesting point. That paragraph refers to *the representation*,
but 6.2.6.1p4 says that a value may have more than one representation:

Two values (other than NaNs) with the same object representation
compare equal, but values that compare equal may have different
object representations.

(I'm assuming that values that compare equal are the "same" value,
but the phrase "Two values" in that sentence calls that assumption
into some question. 6.5.9, describing the "==" and "!=" operators,
is not helpful; it says only that they yield 1 if "the specified
relation is true" and 0 otherwise.)

A footnote on 6.2.5p9 says:

The same representation and alignment requirements are meant to
imply interchangeability as arguments to functions, return values
from functions, and members of unions.

I'm fairly sure that the *intent* is that a value within the range of
values of a signed type and of its corresponding unsigned type has all
the same representations in both types. The interchangeability
discussed in the footnote would be difficult or impossible otherwise.
That intent could have been expressed more clearly.

That creates an interesting wrinkle for negative zero, which I'll
start a new thread to discuss.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Wojtek Lerch
2016-10-16 04:03:07 UTC
...
Post by s***@casperkitty.com
N1570 6.2.5p9
9 The range of nonnegative values of a signed integer type is a
subrange of the corresponding unsigned integer type, and the
representation of the same value in each type is the same.
What unsigned values which are not greater than INT_MAX could have
representations that differ from their signed counterparts? I don't
see that the above offers any latitude whatsoever.
A value could have more than one representation in the signed type, and
even more different representations in the unsigned type. See 6.2.6.2#5
and my response to Tim Rentsch's post.
Post by s***@casperkitty.com
Post by Wojtek Lerch
Post by s***@casperkitty.com
Are you saying the authors of the Standard didn't intend that programmers
should be able to write efficient code? [...]
Where on Earth did you get that idea from? You seem to have a habit of
interpreting other people's words in, um, creative ways. I was not
making any claims about the intents of the Standard's authors. I was
only pointing out that your conclusion did not follow from your premise
-- it was about the logic of your words, not about the facts of the world.
Returning to the main subject of this thread, if the intention of the
Standard is that programs not wishing to launch nuclear missiles must store
meaningless values into structure fields that no code will actually care
about, and writing such structure fields will in most cases make code less
efficient than it otherwise would be, that would seem to imply that the
intention of the Standard is to require programmers to make their code less
efficient in most cases than it otherwise could be. Is that clearer?
That's clear and I agree with it, but I'm not really interested in
discussing the main subject of this thread.
Tim Rentsch
2016-10-15 20:07:03 UTC
Post by Wojtek Lerch
...
Post by s***@casperkitty.com
Post by Wojtek Lerch
Post by s***@casperkitty.com
or if an unsigned value within the range of "int" were
dereferenced as "int",
No, the Standard does not guarantee that.
Why not? It's one of the cases that is expressly permitted by the
aliasing rules.
It's one of the cases that is expressly listed as an exception to
the rules that forbid accessing an object using a different type,
yes; but that does not mean that all such accesses automatically
have defined behaviour. They're still subject to other rules,
such as the one that forbids accessing a trap representation in an
object. And the Standard does not promise that *all* unsigned
representations of an unsigned value less than or equal to INT_MAX
are also valid signed representations.
I agree that what you say matches the letter of the Standard's
rules here. But surely what was intended is for the relationship
to be symmetric, wouldn't you agree? Considering other passages
in the Standard where signed and unsigned types are guaranteed to
be interchangeable if the value is representable in both types,
it seems clear that the phrasing used in 6.2.6.2 is just an
oversight and was meant to imply guarantees in both directions.
Wojtek Lerch
2016-10-16 03:58:33 UTC
Post by Tim Rentsch
[...] And the Standard does not promise that *all* unsigned
representations of an unsigned value less than or equal to INT_MAX
are also valid signed representations.
I agree that what you say matches the letter of the Standard's
rules here. But surely what was intended is for the relationship
to be symmetric, wouldn't you agree?
No, my understanding has been that the asymmetry was intentional. I
don't remember if it was because of some explanation I read here or
purely a guess; but the example that I've had in my head is a
hypothetical processor whose unsigned arithmetic completely ignores the
sign bit of operands and produces results with the sign bit unset, but
bitwise operations work on all the bits. An implementation for such
hardware might prefer to avoid having to generate an extra opcode to
clear the sign bit whenever it stores the result of an unsigned bitwise
operation, so it would treat all possible bit patterns as valid unsigned
representations regardless of the state of the sign bit. (Of course,
the sign bit would be considered a padding bit by C.) In other words,
any unsigned value would have two representations in this implementation
-- one identical to the signed representation, and the other one with
the sign bit set to 1.
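The hypothetical implementation can be modeled in a few lines. This is a toy model and not real hardware: the 16-bit word size, the names, and the choice of sign-magnitude for the signed type are all assumptions made for the sketch.

```c
#include <stdint.h>

/* Toy model: a 16-bit word whose top bit is the sign bit for the signed
   type and a padding bit for the unsigned type. */
#define MODEL_SIGN_BIT 0x8000u
#define MODEL_VALUE_MASK 0x7FFFu

/* Value of a word read as the model's unsigned type: the top bit is
   ignored, so each value has two representations. */
unsigned model_unsigned_value(uint16_t word)
{
    return word & MODEL_VALUE_MASK;
}

/* Value of a word read as the model's signed type (sign-magnitude). */
int model_signed_value(uint16_t word)
{
    int magnitude = (int)(word & MODEL_VALUE_MASK);
    return (word & MODEL_SIGN_BIT) ? -magnitude : magnitude;
}
```

In this model the words 0x0005 and 0x8005 both read as the unsigned value 5, but only the first reads as 5 when interpreted as signed, which is the asymmetry described above.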
Post by Tim Rentsch
Considering other passages
in the Standard where signed and unsigned types are guaranteed to
be interchangeable if the value is representable in both types,
Yes, that's an interesting inconsistency, but off the top of my head I
don't recall any such passages in the normative text -- aren't they all
footnotes?
Post by Tim Rentsch
it seems clear that the phrasing used in 6.2.6.2 is just an
oversight and was meant to imply guarantees in both directions.
I agree that there seems to be an oversight but it's not clear to me
where the mistake is. It doesn't seem totally implausible to me that
6.2.6.2 was written carefully, to allow implementations like the one I
described above, but then the footnotes about interchangeability were
written without enough thought being given to those strange implementations.
Tim Rentsch
2016-10-16 06:58:02 UTC
Post by Wojtek Lerch
Post by Tim Rentsch
[...] And the Standard does not promise that *all* unsigned
representations of an unsigned value less than or equal to INT_MAX
are also valid signed representations.
I agree that what you say matches the letter of the Standard's
rules here. But surely what was intended is for the relationship
to be symmetric, wouldn't you agree?
No, my understanding has been that the asymmetry was intentional. I
don't remember if it was because of some explanation I read here or
purely a guess; but the example that I've had in my head is a
hypothetical processor whose unsigned arithmetic completely ignores
the sign bit of operands and produces results with the sign bit unset,
but bitwise operations work on all the bits. An implementation for
such hardware might prefer to avoid having to generate an extra opcode
to clear the sign bit whenever it stores the result of an unsigned
bitwise operation, so it would treat all possible bit patterns as
valid unsigned representations regardless of the state of the sign
bit. (Of course, the sign bit would be considered a padding bit by
C.) In other words, any unsigned value would have two representations
in this implementation -- one identical to the signed representation,
and the other one with the sign bit set to 1.
Your example looks like it is on point. I'm not sure though how
likely it is, and I don't remember ever seeing or hearing it
before.

Looking at the history reinforces my view that a symmetric
relationship is what was intended. In C90 there was no text
corresponding to 6.2.6.2 p2, and the only other pertinent passage
(6.2.5 p9, which was present in C90 but the numbering scheme was
a bit different) expresses a symmetric relationship. If that
were meant to be changed, from symmetric to non-symmetric, in
C99, that would constitute a significant change between the two
versions. (I say it is significant because behavior that
previously was defined in C90 could become undefined in C99.)
Any such change I would normally expect would be mentioned in
the Rationale document, but AFAICT no such mention occurs.
Post by Wojtek Lerch
Post by Tim Rentsch
Considering other passages
in the Standard where signed and unsigned types are guaranteed to
be interchangeable if the value is representable in both types,
Yes, that's an interesting inconsistency, but off the top of my head I
don't recall any such passages in the normative text -- aren't they
all footnotes?
Actually I had forgotten about the footnotes. I was thinking
of the requirements for what types are allowed for certain
function calls - 6.5.2.2 p6, and what is more or less the
same thing for va_arg() in <stdarg.h>. Both of these are in
normative text. To keep a certain interested reader happy,
let me quote a portion of 6.5.2.2 p6:

If the function is defined with a type that does not include
a prototype, and the types of the arguments after promotion
are not compatible with those of the parameters after
promotion, the behavior is undefined, except for the
following cases:

- one promoted type is a signed integer type, the other
promoted type is the corresponding unsigned integer
type, and the value is representable in both types;

- [..the other case is void * and character pointers..]
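The va_arg() counterpart mentioned above (N1570 7.16.1.1p2 has the same signed/unsigned exception) can be illustrated with a short sketch. The function name is hypothetical. The caller passes an int, the callee fetches an unsigned int, and the behavior is defined because the value is representable in both types.

```c
#include <stdarg.h>

/* Fetches the first variable argument as unsigned int, regardless of
   whether the caller passed int or unsigned int. Defined only when the
   passed value is representable in both types. */
unsigned first_vararg_as_unsigned(int count, ...)
{
    va_list ap;
    unsigned result;

    (void)count;              /* present only to anchor va_start */
    va_start(ap, count);
    result = va_arg(ap, unsigned);
    va_end(ap);
    return result;
}
```

A call such as first_vararg_as_unsigned(1, 42) passes the int 42 and reads it back as unsigned; passing -1 instead would fall outside the exception.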
Post by Wojtek Lerch
Post by Tim Rentsch
it seems clear that the phrasing used in 6.2.6.2 is just an
oversight and was meant to imply guarantees in both directions.
I agree that there seems to be an oversight but it's not clear to me
where the mistake is. It doesn't seem totally implausible to me that
6.2.6.2 was written carefully, to allow implementations like the one I
described above, but then the footnotes about interchangeability were
written without enough thought being given to those strange
implementations.
I agree with you: it isn't totally implausible. Moreover the
footnotes were written before (I believe) in C90, but the
text in question in 6.2.6.2 was added in C99.

On the other hand, the normative passages I mentioned above
were also added in C99. Draw your own conclusions.
Wojtek Lerch
2016-10-17 14:21:24 UTC
Post by Wojtek Lerch
Post by Tim Rentsch
[...] And the Standard does not promise that *all* unsigned
representations of an unsigned value less than or equal to INT_MAX
are also valid signed representations.
...
Post by Wojtek Lerch
Post by Tim Rentsch
Considering other passages
in the Standard where signed and unsigned types are guaranteed to
be interchangeable if the value is representable in both types,
[...] I was thinking
of the requirements for what types are allowed for certain
function calls - 6.5.2.2 p6, and what is more or less the
same thing for va_arg() in <stdarg.h>. Both of these are in
normative text. To keep a certain interested reader happy,
If the function is defined with a type that does not include
a prototype, and the types of the arguments after promotion
are not compatible with those of the parameters after
promotion, the behavior is undefined, except for the
- one promoted type is a signed integer type, the other
promoted type is the corresponding unsigned integer
type, and the value is representable in both types;
...

That just says that function arguments in those contexts need to be
converted to their promoted type and, if that type is unsigned and the
value fits into the corresponding signed type, a representation that is
valid for the signed type must be used. Arguing that it also forbids
other representations to exist feels like a big stretch to me.
Post by Wojtek Lerch
Post by Tim Rentsch
it seems clear that the phrasing used in 6.2.6.2 is just an
oversight and was meant to imply guarantees in both directions.
I agree that there seems to be an oversight but it's not clear to me
where the mistake is. It doesn't seem totally implausible to me that
6.2.6.2 was written carefully, to allow implementations like the one I
described above, but then the footnotes about interchangeability were
written without enough thought being given to those strange
implementations.
I agree with you: it isn't totally implausible. Moreover the
footnotes were written before (I believe) in C90, but the
text in question in 6.2.6.2 was added in C99.
On the other hand, the normative passages I mentioned above
were also added in C99. Draw your own conclusions.
The conclusion I drew a long time ago is that I'll have to live without
ever learning the answers to some of the questions I have. The
Standard, like any other non-trivial document authored by humans without
access to unlimited resources, has to be expected to contain some
mistakes and oversights. In real life, the goal is to try to keep
their number and level of seriousness as low as practically possible,
not to make the document absolutely perfect at all costs. As far as my
practical needs are concerned, I feel that that goal has been reached;
diving into deep dark corners like this is just a hobby to me.
Tim Rentsch
2016-10-17 17:17:56 UTC
Post by Wojtek Lerch
Post by Wojtek Lerch
Post by Tim Rentsch
[...] And the Standard does not promise that *all* unsigned
representations of an unsigned value less than or equal to INT_MAX
are also valid signed representations.
...
Post by Wojtek Lerch
Post by Tim Rentsch
Considering other passages
in the Standard where signed and unsigned types are guaranteed to
be interchangeable if the value is representable in both types,
[...] I was thinking
of the requirements for what types are allowed for certain
function calls - 6.5.2.2 p6, and what is more or less the
same thing for va_arg() in <stdarg.h>. Both of these are in
normative text. To keep a certain interested reader happy,
If the function is defined with a type that does not include
a prototype, and the types of the arguments after promotion
are not compatible with those of the parameters after
promotion, the behavior is undefined, except for the
- one promoted type is a signed integer type, the other
promoted type is the corresponding unsigned integer
type, and the value is representable in both types;
...
That just says that function arguments in those contexts need to be
converted to their promoted type and, if that type is unsigned and the
value fits into the corresponding signed type, a representation that
is valid for the signed type must be used. Arguing that it also
forbids other representations to exist feels like a big stretch to me.
Oh, I'm not making that argument. My implied argument is
that the presence and phrasing of this part of 6.5.2.2 p6
supports the view that 6.2.6.2 p2 was intended to express
a symmetric relationship.
Post by Wojtek Lerch
Post by Wojtek Lerch
Post by Tim Rentsch
it seems clear that the phrasing used in 6.2.6.2 is just an
oversight and was meant to imply guarantees in both directions.
I agree that there seems to be an oversight but it's not clear to me
where the mistake is. It doesn't seem totally implausible to me that
6.2.6.2 was written carefully, to allow implementations like the one I
described above, but then the footnotes about interchangeability were
written without enough thought being given to those strange
implementations.
I agree with you: it isn't totally implausible. Moreover the
footnotes were written before (I believe) in C90, but the
text in question in 6.2.6.2 was added in C99.
On the other hand, the normative passages I mentioned above
were also added in C99. Draw your own conclusions.
The conclusion I drew a long time ago is that I'll have to live
without ever learning the answers to some of the questions I have.
The Standard, like any other non-trivial document authored by humans
without access to unlimited resources, has to be expected to contain
some mistakes and oversights. In real life, the goal is to try to
keep their number and level of seriousness as low as practically
possible, not to make the document absolutely perfect at all costs. As
far as my practical needs are concerned, I feel that that goal has
been reached; diving into deep dark corners like this is just a hobby
to me.
I think you may have misunderstood my motivation here. I'm not
trying to settle the question, or convince you of my point of
view. I don't really care about either of those things. I am
trying to understand your point of view, and help you understand
my point of view and (at least some of) the reasons I have it.
If you acknowledge (as you seem to be doing) that the question is
still potentially open, I'm happy to leave it there and go no
further. That okay with you?
Wojtek Lerch
2016-10-19 02:19:11 UTC
Post by Tim Rentsch
Post by Wojtek Lerch
Post by Tim Rentsch
[...] And the Standard does not promise that *all* unsigned
representations of an unsigned value less than or equal to INT_MAX
are also valid signed representations.
...
Post by Tim Rentsch
Considering other passages
in the Standard where signed and unsigned types are guaranteed to
be interchangeable if the value is representable in both types,
[...] I was thinking
of the requirements for what types are allowed for certain
function calls - 6.5.2.2 p6, and what is more or less the
same thing for va_arg() in <stdarg.h>.
...
That just says that function arguments in those contexts need to be
converted to their promoted type and, if that type is unsigned and the
value fits into the corresponding signed type, a representation that
is valid for the signed type must be used. Arguing that it also
forbids other representations to exist feels like a big stretch to me.
Oh, I'm not making that argument. My implied argument is
that the presence and phrasing of this part of 6.5.2.2 p6
supports the view that 6.2.6.2 p2 was intended to express
a symmetric relationship.
I'd say that it's consistent with either view, and leave it at that.
Post by Tim Rentsch
[...] I'm not
trying to settle the question, or convince you of my point of
view. I don't really care about either of those things. I am
trying to understand your point of view, and help you understand
my point of view and (at least some of) the reasons I have it.
Same here, apologies for not making that clearer.
Post by Tim Rentsch
If you acknowledge (as you seem to be doing) that the question is
still potentially open, I'm happy to leave it there and go no
further. That okay with you?
Absolutely.
Tim Rentsch
2016-10-19 21:47:41 UTC
Post by Wojtek Lerch
Post by Tim Rentsch
[...] And the Standard does not promise that *all* unsigned
representations of an unsigned value less than or equal to
INT_MAX are also valid signed representations.
[...]
I've thought about this a little more now, and have arrived at a
slightly different destination. Let me stress that the comments
that follow are meant only to explain the new point of view, and
are not intended to constitute any sort of argument or persuasive
rhetoric.

The short summary is this: the descriptions in 6.2.6.2 p2 and p5
are not intended to supersede the requirements in 6.2.5 p9, but
rather to augment and clarify them. The statements in 6.2.5 p9
are vague with respect to alternate representations and negative
zeros. The purpose (or a purpose) of the aforementioned passages
in 6.2.6.2 is to clarify that vagueness.

However, adding these clarifying remarks doesn't remove, negate,
eliminate or contradict the statements in 6.2.5 p9; it only adds
to them. Because of that I conclude that any operation on
unsigned types -- and whose resulting value is within the range
of the corresponding signed type -- must yield a value that, if
stored in an object of the unsigned type, produces an object
representation that would have the same value if read as the
signed type.

It still may be the case that there are object representations
that, if read as the unsigned type, yield a value that is in the
common set of values, but will not be that value if read as the
signed type. But, operations on an unsigned type will never
produce such object representations, and that is the point of
the "same representation" clause in 6.2.5 p9.
Post by Wojtek Lerch
Post by Tim Rentsch
If you acknowledge (as you seem to be doing) that the question is
still potentially open, I'm happy to leave it there and go no
further. That okay with you?
Absolutely.
I will just repeat my view that the question is still potentially
open (and it is one that deserves clarification). Thank you for
your attentions.
Tim Rentsch
2016-10-15 19:57:18 UTC
Post by Kaz Kylheku
Post by s***@casperkitty.com
Post by Kaz Kylheku
C had the concept of indeterminately-valued objects before the "trap
representation" concept and terminology was introduced in C99. It's not
trap representations that make uninitialized objects
"indeterminately-valued"; even types that have a total mapping from all
possible bitwise images to values can be indeterminately-valued.
That may be, but what would be the purpose of the language about structures
not having trap representations, if not to avoid requiring code to clear out
irrelevant parts of a structure?
I don't see the purpose of "trap representation" in the first place.
It's semantic fluff without which the C language got along just fine.
I will offer a different opinion on this question.

The approach C90 takes in this area is both unclear and hard to
give an accurate semantic model for. Any defining document for a
formal language, such as the ISO C standard, should be written
well enough so that a formal semantics can be defined without too
much difficulty, and there is no uncertainty about whether the
formal semantics matches what the language document specified.
C90 fails both of these tests in the area of indeterminate
values.

By contrast, the approach C99 takes, with valid values and trap
representations, is clear-cut, easy to understand, and allows a
formal semantics to be simply and easily defined.

After C99 was published, WG14 backtracked a bit and tried to make
"indeterminate values" mean something different than what they
are defined as in C99 (e.g., the infamous DR260, if I remember that
number correctly). This backtracking was (and is) horribly
misguided, making the semantics both unclear and hard to give a
formal definition for. The original C99 description was much
cleaner.

In any case, whether you like or dislike trap representations,
and defining indeterminate values as functions of the object
representation of a stored value, it is how ISO C is currently
defined, so that is the operative guideline.

As far as structs are concerned, it seems obvious that the
intention is that structs can be assigned or passed as arguments
regardless of whether they have been given any initial value,
either wholly or partially, and also without being affected by
the trap-representation-ness of any member. This aspect of
struct behavior mirrors how unsigned char behaves.
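As a rough illustration of that reading (a sketch, not a ruling on what the Standard guarantees), here is a struct that is assigned as a whole while one of its members has never been given a value. The type and function names are invented for the example, and the address of the source object is taken on purpose so that the sketch stays clear of the C11 rule about automatic objects whose address is never taken.

```c
#include <assert.h>

struct pair {
    int a;
    int b;
};

/* Assigns through a pointer, so the caller's object has its address taken
   (an assumption this sketch relies on). */
void set_a(struct pair *p, int value)
{
    p->a = value;
}

int copy_partially_initialized(void)
{
    struct pair src;   /* b is deliberately never given a value */
    struct pair dst;

    set_a(&src, 7);
    dst = src;         /* whole-struct assignment; b is still indeterminate */

    /* Only the member that was given a value is ever inspected. */
    assert(dst.a == src.a);
    return dst.a;
}
```

Some compilers warn about the copy, which is a diagnostic about style rather than evidence either way about conformance.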

Later, in C11, an uninitialized object of automatic storage
duration (I think I have this right) produces undefined
behavior under certain circumstances if it is read before
being given an initial value. The fact that this rule was
added in C11 makes it clearly evident that under pre-C11
rules accessing an uninitialized object is not necessarily
undefined behavior (nor is it post-C11 for those cases that
fall outside the specific circumstances given in C11).

Given the above statements, it may be worth asking the
question of whether your semantic model for C matches
how the language is currently defined.
s***@casperkitty.com
2016-10-17 15:39:46 UTC
Post by Tim Rentsch
The approach C90 takes in this area is both unclear and hard to
give an accurate semantic model for. Any defining document for a
formal language, such as the ISO C standard, should be written
well enough so that a formal semantics can be defined without too
much difficulty, and there is no uncertainty about whether the
formal semantics matches what the language document specified.
C90 fails both of these tests in the area of indeterminate
values.
Given the variations among C implementations, the tasks they were called
upon to perform, and the existing programs that were being used to perform
those tasks, there is no way the authors of C89 (or any subsequent Standard)
could have specified a single language which could efficiently perform all
of the tasks that were being done with C, while being supportable on all
platforms that were being used to run C.

Being able to have a standard way of performing those tasks which can be
accomplished on all implementations is useful, but I don't think that the
authors of C89 can reasonably have intended that it be regarded as a
complete spec for how practical implementations should behave.

It may be reasonable to say that signed and unsigned values may have
different but overlapping requirements for padding bits provided that
certain operations force padding bits to a mutually-acceptable form,
but it is patently unreasonable to suggest that any quality
implementation should do such a thing without an extremely compelling reason
beyond "because the Standard allows it". Being able to treat signed
and unsigned values in signedness-agnostic fashion is useful, and while
just about *any* imaginable rule could allow optimization in some cases
that would otherwise not be possible, that does not mean that compilers
for platforms that can cheaply support useful behavior should go out of
their way to make such behaviors useless 100% of the time in the interest
of pursuing optimizations that might be useful 1% of the time, if that.
Tim Rentsch
2016-10-17 17:26:14 UTC
Post by s***@casperkitty.com
Post by Tim Rentsch
The approach C90 takes in this area is both unclear and hard to
give an accurate semantic model for. Any defining document for a
formal language, such as the ISO C standard, should be written
well enough so that a formal semantics can be defined without too
much difficulty, and there is no uncertainty about whether the
formal semantics matches what the language document specified.
C90 fails both of these tests in the area of indeterminate
values.
Given the variations among C implementations, the tasks they were
called upon to perform, and the existing programs that were being
used to perform those tasks, there is no way the authors of C89
(or any subsequent Standard) could have specified a single
language which could efficiently perform all of the tasks that
were being done with C, while being supportable on all platforms
that were being used to run C.
That is irrelevant to my point.
Post by s***@casperkitty.com
Being able to have a standard way of performing those tasks which
can be accomplished on all implementations is useful, but I don't
think that the authors of C89 can reasonably have intended that it
be regarded as a complete spec for how practical implementations
should behave.
They didn't, because it wasn't what they were trying to do.
I'm surprised you still don't understand that.
s***@casperkitty.com
2016-10-17 18:12:17 UTC
Post by Tim Rentsch
Post by s***@casperkitty.com
Being able to have a standard way of performing those tasks which
can be accomplished on all implementations is useful, but I don't
think that the authors of C89 can reasonably have intended that it
be regarded as a complete spec for how practical implementations
should behave.
They didn't, because it wasn't what they were trying to do.
I'm surprised you still don't understand that.
I say I don't believe the authors of the Standard were trying to do X.
You say they weren't trying to do X. That sounds like you're saying I
do understand the intention of the authors of the Standard.

What I don't understand is why you would refer to the C Standard as a "defining
document for a formal language" if you recognize that it was not intended
as a complete description of all the features a quality implementation for
a typical platform should make available to programmers.
Tim Rentsch
2016-10-18 14:23:18 UTC
Post by s***@casperkitty.com
Post by Tim Rentsch
Post by s***@casperkitty.com
Being able to have a standard way of performing those tasks which
can be accomplished on all implementations is useful, but I don't
think that the authors of C89 can reasonably have intended that it
be regarded as a complete spec for how practical implementations
should behave.
They didn't, because it wasn't what they were trying to do.
I'm surprised you still don't understand that.
I say I don't believe the authors of the Standard were trying to
do X. You say they weren't trying to do X. That sounds like
you're saying I do understand the intention of the authors of the
Standard.
I'm not. What you said is about what they do _not_ intend. That
doesn't tell me anything about what you think they _do_ intend.
If anything it makes me think you're confused for bringing up
something so off the mark.
Post by s***@casperkitty.com
What I don't understand is why you would refer to the C Standard as a
"defining document for a formal language" if you recognize that it
was not intended as a complete description of all the features a
quality implementation for a typical platform should make
available to programmers.
I believe you. The point of giving a formal semantics is to
divorce the definition of program behavior from any and all
implementations.
s***@casperkitty.com
2016-10-18 15:18:35 UTC
Post by Tim Rentsch
I believe you. The point of giving a formal semantics is to
divorce the definition of program behavior from any and all
implementations.
Which would have been a more useful thing for a Committee to do in 1989:

1. Write a spec which allows a small subset of the tasks people were doing
with C to be done with programs that will run on any imaginable
implementation whose limitations don't cause it to invoke UB (bearing
in mind that bombing the stack when given anything other than a contrived
and useless program would be an acceptable limitation).

2. Write a spec that seeks to augment the existing C dialects that had
emerged with a guaranteed core subset, while respecting the judgment of
implementers with regard to what additional features and guarantees
should be supported on platforms where they are practical.

I see no indication that the authors of the Standard intended that features
and guarantees that had been supported by commonplace platforms should not
continue to be made available by quality implementations for those platforms.
Richard Bos
2016-10-18 16:56:49 UTC
Post by Tim Rentsch
I believe you. The point of giving a formal semantics is to
divorce the definition of program behavior from any and all
implementations.
Straw man, imagination run wild, and a lack of actual knowledge.

If you want to know such things, don't speculate, _ask a member_.

Richard
s***@casperkitty.com
2016-10-18 19:01:42 UTC
Post by Richard Bos
Post by Tim Rentsch
I believe you. The point of giving a formal semantics is to
divorce the definition of program behavior from any and all
implementations.
Straw man, imagination run wild, and a lack of actual knowledge.
Why do you assume that low-level languages should be divorced from their
target *platforms*, or that the authors of the Standard intended that they
should? One of the things that had made C useful for systems programming
was that implementations' behaviors were generally coupled usefully and
predictably to low-level platform behaviors. A language which is divorced
from its target platform *isn't* really a "low-level" language--at least
not a good one.

Divorcing program behavior from the *compiler* is a good thing, and for
years compiler writers read the Standard in such a way as to help achieve
that. The less tightly compilers couple their behaviors to the underlying
platform, however, the greater the extent to which programs that need to
exploit platform-specific behaviors will end up being attached to particular
compilers.
Richard Bos
2016-10-23 09:05:26 UTC
Post by s***@casperkitty.com
Post by Richard Bos
Post by Tim Rentsch
I believe you. The point of giving a formal semantics is to
divorce the definition of program behavior from any and all
implementations.
Straw man, imagination run wild, and a lack of actual knowledge.
Why do you assume that low-level languages should be divorced from their
target *platforms*, or that the authors of the Standard intended that they
should?
Why do you assume that I, or the Standard committee, agree with your
view on what C is or should be?

Richard
s***@casperkitty.com
2016-10-24 18:26:04 UTC
Post by Richard Bos
Post by s***@casperkitty.com
Why do you assume that low-level languages should be divorced from their
target *platforms*, or that the authors of the Standard intended that they
should?
Why do you assume that I, or the Standard committee, agree with your
view on what C is or should be?
A lot of existing code, including nearly all code that targets
freestanding implementations, was and is very much reliant upon
features which are supported by the target platform. The authors
of the Standard said they wanted to avoid breaking existing
programs. If many programs for platforms with a particular feature
make use of that feature, the way to avoid breaking such programs
is to have implementations for the platforms that have historically
supported that feature continue to do so. For the authors of the Standard
to have intended that the lack of a mandated behavior should be
taken as encouraging implementations to revoke support would imply
that they intended that the lack of mandate should cause existing
code to be broken--directly contradicting their claimed intent.

Tim Rentsch
2016-10-19 08:10:07 UTC
Post by Tim Rentsch
I believe you. The point of giving a formal semantics is to
divorce the definition of program behavior from any and all
implementations.
[...]
You have a bad habit of jumping into the middle of a thread and
changing the subject from what is being talked about to something
else that /you/ want to talk about. Stop it.
s***@casperkitty.com
2016-10-19 18:19:54 UTC
Post by Tim Rentsch
Post by Tim Rentsch
I believe you. The point of giving a formal semantics is to
divorce the definition of program behavior from any and all
implementations.
[...]
You have a bad habit of jumping into the middle of a thread and
changing the subject from what is being talked about to something
else that /you/ want to talk about. Stop it.
You say the point of a formal standard is to divorce the definition of
program behavior from implementations. I respectfully but emphatically
disagree. The set of tasks that could be efficiently and usefully
performed on every platform where C is used is a very small subset of
the tasks which can be and are performed usefully using code which
makes use of at least some features that may be unsupportable on some
C target platforms. To divorce programs from implementations would
require abandoning most of the tasks for which C has proven itself useful.
Are you saying that's the purpose of the Standard?
Kaz Kylheku
2016-10-19 18:47:43 UTC
Post by s***@casperkitty.com
Post by Tim Rentsch
Post by Tim Rentsch
I believe you. The point of giving a formal semantics is to
divorce the definition of program behavior from any and all
implementations.
[...]
You have a bad habit of jumping into the middle of a thread and
changing the subject from what is being talked about to something
else that /you/ want to talk about. Stop it.
You say the point of a formal standard is to divorce the definition of
program behavior from implementations. I respectfully but emphatically
disagree. The set of tasks that could be efficiently and usefully
performed on every platform where C is used is a very small subset of
the tasks which can be and are performed usefully using code which
makes use of at least some features that may be unsupportable on some
C target platforms. To divorce programs from implementations would
require abandoning most of the tasks for which C has proven itself useful.
Yes; ISO C abandons the tasks for which only non-standard C relying
on extensions is useful.
Post by s***@casperkitty.com
Are you saying that's the purpose of the Standard?
Basically yes. The number one purpose for the standard is portability:
to pin down a set of requirements which add up to a portable dialect
that allows "polyglot programs": programs which are written such that
they "say" the same thing in each C dialect accepted by a conforming
implementation.

ISO C gives requirements that allow the development of code that
runs on every platform where ISO C is used.

Not all code is "a program". Code can usefully be written which is just
a program module. Even if an entire program cannot be portable, some of
its translation units can be.

Some programs are "portable" not in the sense that they run unmodified
on many platforms, but that only a small percentage of their code base
(sometimes very well delineated) needs to be written separately for each
platform. The set of such useful programs is considerable, compared
to the set of programs that are portable in a narrower, stronger sense.
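A small sketch of that kind of delineation, with invented function names: one function carries the per-platform difference, and everything else is written against it. The _WIN32 test is just one common way to tell platforms apart and is an assumption of the example, not something ISO C defines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Platform-specific part: the only function that needs a per-platform body. */
char platform_path_separator(void)
{
#ifdef _WIN32
    return '\\';
#else
    return '/';
#endif
}

/* Portable part: joins dir and name using whatever separator applies.
   Returns 0 on success, -1 if the result does not fit in out. */
int join_path(char *out, size_t out_size, const char *dir, const char *name)
{
    size_t dir_len, name_len;

    assert(out != NULL && dir != NULL && name != NULL);
    dir_len = strlen(dir);
    name_len = strlen(name);

    /* Room is needed for dir, one separator, name and the terminator. */
    if (dir_len + 1 + name_len + 1 > out_size)
        return -1;

    memcpy(out, dir, dir_len);
    out[dir_len] = platform_path_separator();
    memcpy(out + dir_len + 1, name, name_len + 1);  /* copies the '\0' too */
    return 0;
}
```

Only platform_path_separator would need attention when moving to a new platform; join_path uses nothing beyond the standard library.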

Nonportable C itself still may use many ISO C features that are
portable, in and of themselves. This is useful; the developer, and
all those who subsequently maintain or read the code, don't have to learn a
whole new language just to use a few platform features.

There is the concept of a "conforming extension": a well-behaved
language extension which doesn't break any portable constructs, so that
portable code continues to work.

Even though ISO C doesn't define the behavior of extensions, of course,
it does define what these extensions must not do when they are not
invoked by the program, which is useful.
--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1
s***@casperkitty.com
2016-10-19 20:00:18 UTC
Post by Kaz Kylheku
Post by s***@casperkitty.com
Are you saying that's the purpose of the Standard?
to pin down a set of requirements which add up to a portable dialect
that allows "polyglot programs": programs which are written such that
they "say" the same thing in each C dialect accepted by a conforming
implementation.
A few tasks can be done efficiently on all platforms. A much larger
set of tasks can be done efficiently using features that 99% of
platforms could support at essentially zero cost (and which were
historically supported almost universally by implementations for those
platforms). Should a program which would work on almost any
implementation which does not go out of its way to *not* support it
be considered no more portable than one which relies upon an extension
which is only supported by one compiler in the entire universe? The
text of the Standard makes no distinction, but does that mean that
compiler writers and programmers shouldn't do so?
Wojtek Lerch
2016-10-19 20:17:34 UTC
Post by s***@casperkitty.com
Should a program which would work on almost any
implementation which does not go out of its way to *not* support it
be considered no more portable than one which relies upon an extension
which is only supported by one compiler in the entire universe? The
text of the Standard makes no distinction, but does that mean that
compiler writers and programmers shouldn't do so?
No, it just means they shouldn't do it here.
Keith Thompson
2016-10-19 20:39:04 UTC
Post by s***@casperkitty.com
Post by Kaz Kylheku
Post by s***@casperkitty.com
Are you saying that's the purpose of the Standard?
to pin down a set of requirements which add up to a portable dialect
that allows "polyglot programs": programs which are written such that
they "say" the same thing in each C dialect accepted by a conforming
implementation.
A few tasks can be done efficiently on all platforms. A much larger
set of tasks can be done efficiently using features that 99% of
platforms could support at essentially zero cost (and which were
historically supported almost universally by implementations for those
platforms). Should a program which would work on almost any
implementation which does not go out of its way to *not* support it
be considered no more portable than one which relies upon an extension
which is only supported by one compiler in the entire universe? The
text of the Standard makes no distinction, but does that mean that
compiler writers and programmers shouldn't do so?
The standard specifies what compiler writers and programmers *must* do
to conform to the standard. It doesn't, in most cases, specify what
they *should* do.

As you say, the text of the standard makes no distinction. The text of
the standard is precisely what we're talking about here.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2016-10-19 21:48:36 UTC
Post by Keith Thompson
As you say, the text of the standard makes no distinction. The text of
the standard is precisely what we're talking about here.
In that case, let me ask you this: if the Standard fails to specify
something (e.g. that the sequence of byte values in the destination
storage after a memcpy will match the sequence of byte values in the
source) but anyone who isn't being deliberately obtuse will know what
the authors of the Standard meant, should anyone need to care about
whether the text of the Standard would allow an implementation to do
something else?

If everyone who read the standard recognized that no sane implementation
would do anything contrary to the authors' intent, and if all insane
implementations were recognized as such, the fact that the Standard would
allow for insane-but-conforming implementations to behave in contrary
fashion wouldn't matter, because the Standard makes no effort to brand all
insane implementations as non-conforming.

I suppose the real problem is that some people think that the authors of
the Standard were trying to list all the requirements for sane implementations.
Keith Thompson
2016-10-19 22:34:11 UTC
Post by s***@casperkitty.com
Post by Keith Thompson
As you say, the text of the standard makes no distinction. The text of
the standard is precisely what we're talking about here.
In that case, let me ask you this: if the Standard fails to specify
something (e.g. that the sequence of byte values in the destination
storage after a memcpy will match the sequence of byte values in the
source) but anyone who isn't being deliberately obtuse will know what
the authors of the Standard meant, should anyone need to care about
whether the text of the Standard would allow an implementation to do
something else?
Yes.

I care because I want the standard to be as clear and unambiguous as it
reasonably can be, and because no two people will necessarily agree on
what's obvious.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2016-10-19 23:28:11 UTC
Post by Keith Thompson
Post by s***@casperkitty.com
In that case, let me ask you this: if the Standard fails to specify
something (e.g. that the sequence of byte values in the destination
storage after a memcpy will match the sequence of byte values in the
source) but anyone who isn't being deliberately obtuse will know what
the authors of the Standard meant, should anyone need to care about
whether the text of the Standard would allow an implementation to do
something else?
Yes.
I care because I want the standard to be as clear and unambiguous as it
reasonably can be, and because no two people will necessarily agree on
what's obvious.
Would you consequently consider the silence of the specification with regard
to the arrangement of byte values in memcpy's destination to be a defect?
I do think it would have been helpful to have written the spec in such a way
as to specify that, and also specify that implementations are required to,
at their option, either ignore the pointers entirely when the length is zero
or define a macro indicating that they might not do so. That would allow
any existing code which relies upon such behavior to be adjusted so as to
ensure correct operation on any implementation, without any loss of
efficiency on the present implementation or others that offer the same
guarantee.
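Absent any such macro, code that wants zero-length copies to be harmless can be adjusted along these lines. This is only a sketch: copy_bytes is a made-up wrapper name, and it checks the length unconditionally because no existing implementation or Standard provides the macro proposed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: never hands the pointers to memcpy when n is zero,
   so null or otherwise invalid pointers are fine in that case. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst;            /* the pointers are not examined at all */

    assert(dst != NULL && src != NULL);
    return memcpy(dst, src, n);
}
```

The cost is one comparison per call, which is the run-time price the macro approach was meant to avoid on implementations that already ignore the pointers.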
Keith Thompson
2016-10-20 00:18:26 UTC
Post by s***@casperkitty.com
Post by Keith Thompson
Post by s***@casperkitty.com
In that case, let me ask you this: if the Standard fails to specify
something (e.g. that the sequence of byte values in the destination
storage after a memcpy will match the sequence of byte values in the
source) but anyone who isn't being deliberately obtuse will know what
the authors of the Standard meant, should anyone need to care about
whether the text of the Standard would allow an implementation to do
something else?
Yes.
I care because I want the standard to be as clear and unambiguous as it
reasonably can be, and because no two people will necessarily agree on
what's obvious.
Would you consequently consider the silence of the specification with regard
to the arrangement of byte values in memcpy's destination to be a defect?
Perhaps a minor one. The standard is not, and is not meant to be, a
completely rigorous mathematical specification of the language. *Some*
common sense is required. If a future edition specifies the order in
which the bytes are copied, great. If not, I doubt that any implementer
would seriously consider changing the order, or that any programmer
would expect the order to be any other than the obvious one.
Post by s***@casperkitty.com
I do think it would have been helpful to have written the spec in such a way
as to specify that, and also specify that implementations are required to,
at their option, either ignore the pointers entirely when the length is zero
or define a macro indicating that they might not do so. That would allow
any existing code which relies upon such behavior to be adjusted so as to
ensure correct operation on any implementation, without any loss of
efficiency on the present implementation or others that offer the same
guarantee.
Adding a standard macro would IMHO be overkill for such a minor issue.
I'm happy either leaving it as it is, or requiring memcpy(s1, s2, 0) to
be a no-op regardless of the values of s1 and s2. I don't see the value
of partially defining the behavior.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2016-10-20 03:46:00 UTC
Post by Keith Thompson
Post by s***@casperkitty.com
Would you consequently consider the silence of the specification with regard
to the arrangement of byte values in memcpy's destination to be a defect?
Perhaps a minor one. The standard is not, and is not meant to be, a
completely rigorous mathematical specification of the language. *Some*
common sense is required. If a future edition specifies the order in
which the bytes are copied, great. If not, I doubt that any implementer
would seriously consider changing the order, or that any programmer
would expect the order to be any other than the obvious one.
For what reason would one give common sense more weight there, than in
some cases where the best-case performance boost from treating something as
UB would be far smaller?
Post by Keith Thompson
Post by s***@casperkitty.com
I do think it would have been helpful to have written the spec in such a way
as to specify that, and also specify that implementations are required to,
at their option, either ignore the pointers entirely when the length is zero
or define a macro indicating that they might not do so. That would allow
any existing code which relies upon such behavior to be adjusted so as to
ensure correct operation on any implementation, without any loss of
efficiency on the present implementation or others that offer the same
guarantee.
Adding a standard macro would IMHO be overkill for such a minor issue.
I'm happy either leaving it as it is, or requiring memcpy(s1, s2, 0) to
be a no-op regardless of the values of s1 and s2. I don't see the value
of partially defining the behavior.
I'd rather see the spec simply require the sane behavior, but the authors
of the Standard seem loath to mandate anything which would have a non-zero
run-time cost. By contrast, the only "cost" of the macros would be that
they would likely cause compilers whose behavior would require that they
set the macros to be regarded by the marketplace as inferior to those whose
behavior does not. On the other hand, the ability to test for such macros
would assist those who want to judge what behaviors programmers really want.
If a header file which tests for a bunch of abnormal-behavior macros and
issues a #error if any are defined becomes popular, that should lay to rest
any claims that programmers want such behavior.
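Such a header might look roughly like the following. Both feature-test macro names are invented for illustration; nothing in the Standard or in any implementation I know of defines them, and STRICT_PLATFORM_OK is likewise just a marker for this sketch.

```c
/* strict_platform.h (hypothetical): refuse to build on implementations
   that advertise behaviors this code base does not want to cope with. */
#ifndef STRICT_PLATFORM_H
#define STRICT_PLATFORM_H

/* Invented macro: set by an implementation whose memcpy may examine its
   pointer arguments even when the length is zero. */
#ifdef HYPOTHETICAL_MEMCPY_ZERO_LENGTH_MAY_USE_POINTERS
#error "memcpy with length zero may examine its pointer arguments"
#endif

/* Invented macro: set by an implementation where signed and unsigned
   types do not agree on padding bits. */
#ifdef HYPOTHETICAL_SIGNED_UNSIGNED_PADDING_DIFFERS
#error "signed and unsigned types disagree on padding bits"
#endif

/* Reached only when none of the checks above fired. */
#define STRICT_PLATFORM_OK 1

#endif /* STRICT_PLATFORM_H */
```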

On the other hand, Annex L "analyzability" is so ill-defined as to be pretty
much meaningless.
Richard Bos
2016-10-23 09:04:40 UTC
Post by s***@casperkitty.com
Post by Keith Thompson
As you say, the text of the standard makes no distinction. The text of
the standard is precisely what we're talking about here.
In that case, let me ask you this: if the Standard fails to specify
something (e.g. that the sequence of byte values in the destination
storage after a memcpy will match the sequence of byte values in the
source) but anyone who isn't being deliberately obtuse will know what
the authors of the Standard meant,
Given that "being deliberately obtuse" is supercat-language for
"disagreeing with supercat's distorted view of programming in the 1970s
and 1980s", I don't think the question is relevant.

Richard
Keith Thompson
2016-10-23 19:02:42 UTC
Post by Richard Bos
Post by s***@casperkitty.com
Post by Keith Thompson
As you say, the text of the standard makes no distinction. The text of
the standard is precisely what we're talking about here.
In that case, let me ask you this: if the Standard fails to specify
something (e.g. that the sequence of byte values in the destination
storage after a memcpy will match the sequence of byte values in the
source) but anyone who isn't being deliberately obtuse will know what
the authors of the Standard meant,
Given that "being deliberately obtuse" is supercat-language for
"disagreeing with supercat's distorted view of programming in the 1970s
and 1980s", I don't think the question is relevant.
Richard, I suggest you chose a poor example to make this point.

The standard says that memcpy copies characters from one object
to another. It doesn't specify the order in which they're copied.
An implementation of memcpy that makes the target object a reversed
or otherwise scrambled copy of the source object would, unless I'm
missing something, conform to the letter of the standard. In this
particular case, I agree with supercat that one would have to be
deliberately obtuse to believe that such an implementation would
be conforming in any meaningful sense.
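For concreteness, the expectation everyone actually has can be written down as a check like this one (a sketch; the function name is made up). It passes on any implementation a programmer would recognize as sane, whether or not the text of the standard strictly requires it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if memcpy left dst[i] == src[i] for every i, 0 otherwise. */
int memcpy_preserves_order(void)
{
    const unsigned char src[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    unsigned char dst[8] = { 0 };
    size_t i;

    assert(sizeof dst == sizeof src);
    memcpy(dst, src, sizeof dst);

    /* A reversed or scrambled copy would fail at some index here. */
    for (i = 0; i < sizeof dst; i++) {
        if (dst[i] != src[i])
            return 0;
    }
    return 1;
}
```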
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Tim Rentsch
2016-10-19 21:49:24 UTC
Post by s***@casperkitty.com
Post by Tim Rentsch
Post by Tim Rentsch
I believe you. The point of giving a formal semantics is to
divorce the definition of program behavior from any and all
implementations.
[...]
You have a bad habit of jumping into the middle of a thread and
changing the subject from what is being talked about to something
else that /you/ want to talk about. Stop it.
You say the point of a formal standard is to divorce the definition of
program behavior from implementations. I respectfully but emphatically
disagree. The set of tasks that could be efficiently and usefully
performed on every platform where C is used is a very small subset of
the tasks which can be and are performed usefully using code which
makes use of at least some features that may be unsupportable on some
C target platforms. To divorce programs from implementations would
require abandoning most of the tasks for which C has proven itself useful.
You are welcome to hold that opinion. But it still isn't
what was being talked about when you jumped in and started
changing the subject.
m***@gmail.com
2016-10-12 10:33:49 UTC
Post by Kaz Kylheku
Some structures defined by the Library may have additional
members defined by the implementation. An example of this
is struct tm.
Suppose we instantiate a struct tm in automatic storage
and initialize, by assignment, only the ISO-documented members.
Then we use it as input to a function like asctime.
Is the behavior defined? If not, from where can we infer
the requirement for initializing the extra members?
(It's not a given that just because member exists, neglecting to
initialize it is a problem. Such members can be "scratch areas" for
internal use, initialized by the implementation prior to any access. Or
they can be outright outputs. In that same struct tm, some Unix
libraries have members tm_gmtoff and tm_zone. These are written by the
implementation when it calculates a broken down time, providing
additional information that programs may access.)
ISO/IEC 9899:201x has added a clause that covers this - 6.7.9.10
--
mailto:***@gmail.com
James Kuyper
2016-10-12 11:06:59 UTC
Post by m***@gmail.com
Post by Kaz Kylheku
Some structures defined by the Library may have additional
members defined by the implementation. An example of this
is struct tm.
Suppose we instantiate a struct tm in automatic storage
and initialize, by assignment, only the ISO-documented members.
Then we use it as input to a function like asctime.
Is the behavior defined? If not, from where can we infer
the requirement for initializing the extra members?
(It's not a given that just because member exists, neglecting to
initialize it is a problem. Such members can be "scratch areas" for
internal use, initialized by the implementation prior to any access. Or
they can be outright outputs. In that same struct tm, some Unix
libraries have members tm_gmtoff and tm_zone. These are written by the
implementation when it calculates a broken down time, providing
additional information that programs may access.)
ISO/IEC 9899:201x has added a clause that covers this - 6.7.9.10
That clause specifies that "If an object that has static or thread
storage duration is not initialized explicitly, then:" it is implicitly
zero-initialized.

That feature was not added in C2011; it dates back to the earliest
versions of C, even before the first edition of the standard. It is one
way you can arrange for members of a struct whose names are not known to
you to be zero-initialized. The other way is to rely upon 6.7.9p20: when
initializers are explicitly provided for some but not all of the members
of an aggregate (struct or array), then "the remainder of the aggregate
shall be initialized implicitly the same as objects that have static
storage duration".
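
The two mechanisms can be sketched as follows. This is only an illustration: struct record is a hypothetical stand-in for a library struct, and its "hidden" members play the role of implementation-defined members whose names the caller does not know.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a library struct with extra members. */
struct record {
    int known_a;
    int known_b;
    int hidden;        /* pretend the caller cannot name this member */
    char *hidden_ptr;  /* likewise */
};

/* Mechanism 1: static storage duration with no explicit initializer.
   Every member, named or not, starts out as zero or a null pointer. */
static struct record static_rec;

const struct record *get_static(void)
{
    return &static_rec;
}

/* Mechanism 2: automatic storage with an initializer for only some
   members. The remaining members are initialized as if the object
   had static storage duration. */
struct record make_partial(void)
{
    struct record r = { .known_a = 42 };
    return r;
}
```

In both cases the caller never has to spell the names of the extra members for them to end up zero-initialized.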

That does not address Kaz's question, which is whether or not those
unknown members need to be initialized.

Kaz's message doesn't mention relying upon default zero-initialization
as an option - but if he was unaware of that possibility, there would be
no reason for his question. Without implicit zero-initialization, there
would be no way to initialize members whose names are not known, so
there would be no way that it would be necessary to do so in order to
pass such structs to the appropriate C standard library functions.
m***@gmail.com
2016-10-12 12:58:00 UTC
Post by James Kuyper
Post by m***@gmail.com
Post by Kaz Kylheku
Some structures defined by the Library may have additional
members defined by the implementation. An example of this
is struct tm.
Suppose we instantiate a struct tm in automatic storage
and initialize, by assignment, only the ISO-documented members.
Then we use it as input to a function like asctime.
Is the behavior defined? If not, from where can we infer
the requirement for initializing the extra members?
(It's not a given that just because member exists, neglecting to
initialize it is a problem. Such members can be "scratch areas" for
internal use, initialized by the implementation prior to any access. Or
they can be outright outputs. In that same struct tm, some Unix
libraries have members tm_gmtoff and tm_zone. These are written by the
implementation when it calculates a broken down time, providing
additional information that programs may access.)
ISO/IEC 9899:201x has added a clause that covers this - 6.7.9.10
That clause specifies that "If an object that has static or thread
storage duration is not initialized explicitly, then:" it is implicitly
zero-initialized.
That feature was not added in C2011; it dates back to the earliest
versions of C, even before the first edition of the standard. It is one
way you can arrange for members of a struct whose names are not known to
you to be zero-initialized. The other way is to rely upon 6.7.9p20: when
initializers are explicitly provided for some but not all of the members
of an aggregate (struct or array), then "the remainder of the aggregate
shall be initialized implicitly the same as objects that have static
storage duration".
I was not talking about "this feature", IIUC you are citing second part
of the relevant clause.

What's new in 2011 is the provision to zero-initialize the padding bits;
that wording was not there prior to that revision of the standard.

cf. https://www.spinics.net/lists/kernel/msg2327455.html for instance.
Post by James Kuyper
That does not address Kaz's question, which is whether or not those
unknown members need to be initialized.
Kaz's message doesn't mention relying upon default zero-initialization
as an option - but if he was unaware of that possibility, there would be
no reason for his question. Without implicit zero-initialization, there
would be no way to initialize members whose names are not known, so
there would be no way that it would be necessary to do so in order to
pass such structs to the appropriate C standard library functions.
Thing is - without initialization of the padding bits, parts of an aggregate
might contain unspecified values
--
mailto:***@gmail.com
James Kuyper
2016-10-12 14:50:10 UTC
Post by m***@gmail.com
Post by James Kuyper
Post by m***@gmail.com
Post by Kaz Kylheku
Some structures defined by the Library may have additional
members defined by the implementation. An example of this
is struct tm.
Suppose we instantiate a struct tm in automatic storage
and initialize, by assignment, only the ISO-documented members.
Then we use it as input to a function like asctime.
Is the behavior defined? If not, from where can we infer
the requirement for initializing the extra members?
(It's not a given that just because member exists, neglecting to
initialize it is a problem. Such members can be "scratch areas" for
internal use, initialized by the implementation prior to any access. Or
they can be outright outputs. In that same struct tm, some Unix
libraries have members tm_gmtoff and tm_zone. These are written by the
implementation when it calculates a broken down time, providing
additional information that programs may access.)
ISO/IEC 9899:201x has added a clause that covers this - 6.7.9.10
That clause specifies that "If an object that has static or thread
storage duration is not initialized explicitly, then:" it is implicitly
zero-initialized.
That feature was not added in C2011; it dates back to the earliest
versions of C, even before the first edition of the standard. It is one
way you can arrange for members of a struct whose names are not known to
you to be zero-initialized. The other way is to rely upon 6.7.9p20: when
initializers are explicitly provided for some but not all of the members
of an aggregate (struct or array), then "the remainder of the aggregate
shall be initialized implicitly the same as objects that have static
storage duration".
I was not talking about "this feature", IIUC you are citing second part
of the relevant clause.
What's new in 2011 is the provision to zero-initialize the padding bits;
that wording was not there prior to that revision of the standard.
By definition, padding (whether padding bits or padding bytes) has no
effect on the values of any actual members of the struct. Guaranteeing
that those bits are all 0 is useful because it allows you to use
memcmp() calls that look only at the entire structure, including the
padding bits. But
Post by m***@gmail.com
cf. https://www.spinics.net/lists/kernel/msg2327455.html for instance.
Post by James Kuyper
That does not address Kaz's question, which is whether or not those
unknown members need to be initialized.
Kaz's message doesn't mention relying upon default zero-initialization
as an option - but if he was unaware of that possibility, there would be
no reason for his question. Without implicit zero-initialization, there
would be no way to initialize members whose names are not known, so
there would be no way that it would be necessary to do so in order to
pass such structs to the appropriate C standard library functions.
Thing is - without initialization of the padding bits, parts of an aggregate
might contain unspecified values
Padding bits are not part of any member of a struct, and therefore
cannot affect the values of those members. Writing a member of a struct
can cause changes to the padding bits, because it may, for instance, be
convenient to write 32 bits at a time, even though a particular member
is only 16 bits long. However, there's nothing that allows padding bits
to affect the value represented by the non-padding bits.
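
A small sketch of the distinction, under the assumption (true on most ABIs, but not guaranteed) that struct sample has padding after its first member. All names here are hypothetical. Two objects are built over storage pre-filled with different byte patterns and then given identical member values.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct sample {
    char tag;   /* assumption: the implementation pads after this member */
    int value;
};

/* Member-wise comparison: padding can never influence the result. */
bool members_equal(const struct sample *a, const struct sample *b)
{
    return a->tag == b->tag && a->value == b->value;
}

/* Whole-object comparison: this also looks at any padding bytes. */
bool bytes_equal(const struct sample *a, const struct sample *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}

/* Fill the storage with a byte pattern, then assign the members. */
void fill_sample(struct sample *s, unsigned char fill, char tag, int value)
{
    memset(s, fill, sizeof *s);
    s->tag = tag;
    s->value = value;
}

/* Same members, different pre-fill: the member values must agree. */
bool demo_members_match(void)
{
    struct sample a, b;
    fill_sample(&a, 0x00, 'x', 7);
    fill_sample(&b, 0xFF, 'x', 7);
    return members_equal(&a, &b);
}
```

demo_members_match returns true regardless of what the padding holds. Calling bytes_equal on the same two objects may well return false where padding exists, which is the case that zero-initialized padding helps with.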
s***@casperkitty.com
2016-10-12 16:15:16 UTC
Post by James Kuyper
Padding bits are not part of any member of a struct, and therefore
cannot affect the values of those members. Writing a member of a struct
can cause changes to the padding bits, because it may, for instance, be
convenient to write 32 bits at a time, even though a particular member
is only 16 bits long. However, there's nothing that allows padding bits
to affect the value represented by the non-padding bits.
The possibility that writes to a structure may affect padding bits is the
reason why the Common Initial Sequence rule only talks about "inspecting"
parts of a structure which overlays another structure with the same Common
Initial Sequence. It would be helpful if the Standard specified a means
by which an implementation could promise to uphold a stronger version of
the rule which would guarantee that writing part of a structure won't
disturb anything outside it.

I find it ironic that while gcc's aliasing logic upholds only a version of
the CIS which is so anemic as to be useless, its bitfield logic will avoid
disturbing the padding in a structure (thus extending the "inspecting" part
of the CIS rule to include all access).
Tim Rentsch
2016-10-15 18:38:42 UTC
Post by Kaz Kylheku
Some structures defined by the Library may have additional
members defined by the implementation. An example of this
is struct tm.
Suppose we instantiate a struct tm in automatic storage
and initialize, by assignment, only the ISO-documented members.
Then we use it as input to a function like asctime.
Is the behavior defined? If not, from where can we infer
the requirement for initializing the extra members?
The behavior could be undefined, depending on how the function's
description is worded. For example, asctime()'s description
might be construed that way, up to the point of noticing the
footnote which clearly indicates a different intention.

Another consideration is 7.1.4 p1, which stipulates undefined
behavior if any argument "has an invalid value". This rule could
be construed as prescribing undefined behavior for any struct
that is only partially initialized. I think it's hard to make a
convincing argument that this rule /cannot/ be construed this
way, which could give implementors all the excuse they need to
foul things up in such cases.

If you don't mind my asking, why do you care? It seems like the
safe thing to do is initialize the entire struct, which is easy
to do, and by doing so eliminate any potential problem.
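
One way to do that, sketched here with a hypothetical helper named format_example_time, is to zero-initialize the whole struct tm with "= {0}" and then assign the members the standard documents. Any implementation-defined extras (such as the tm_gmtoff and tm_zone mentioned earlier) are then zero or null rather than indeterminate.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Builds a fully initialized struct tm for 2016-10-12 00:19:02
   and returns asctime()'s text for it. */
const char *format_example_time(void)
{
    /* "= {0}" initializes every member, including any extra
       members the implementation may have added. */
    struct tm t = {0};

    t.tm_year  = 2016 - 1900;  /* years since 1900 */
    t.tm_mon   = 9;            /* months since January: October */
    t.tm_mday  = 12;
    t.tm_hour  = 0;
    t.tm_min   = 19;
    t.tm_sec   = 2;
    t.tm_wday  = 3;            /* days since Sunday: Wednesday */
    t.tm_yday  = 285;          /* days since January 1 */
    t.tm_isdst = 0;

    return asctime(&t);
}
```

With these values asctime should produce "Wed Oct 12 00:19:02 2016" followed by a newline. The same effect can be had with memset(&t, 0, sizeof t) on platforms where all-bits-zero is a valid zero or null for every member type, but the initializer form does not depend on that assumption.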