Discussion: DR 260 issue
e***@gmail.com
2015-03-31 23:50:23 UTC
I don't understand the intended reconciliation of the following two rulings from DR 260 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm):

(1) the representation of a determinate pointer can change at any time, yet
(2) a determinate value can be "bitwise" (i.e., representationally) copied using memmove or memcpy to make an equal value.

How is (2) possible given that memmove and memcpy are presumably operating on the representation, which can change while that representation is being read? Is the intent that memmove or memcpy enjoy special atomicity guarantees that cannot be duplicated by ordinary code?
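
Concretely, here is a minimal sketch (my own illustration, not code from the DR) of the operation that ruling (2) blesses:

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      int i = 42;
      int *p = &i;               /* determinate pointer value */
      int *q;

      memcpy(&q, &p, sizeof p);  /* bitwise copy of p's bytes into q */

      printf("%d\n", p == q);    /* ruling (2): guaranteed to print 1 */
      return 0;
  }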
Francis Glassborow
2015-04-01 09:49:12 UTC
Post by e***@gmail.com
(1) the representation of a determinate pointer can change at any time, yet
(2) a determinate value can be "bitwise" (i.e., representationally) copied using memmove or memcpy to make an equal value.
How is (2) possible given that memmove and memcpy are presumably operating on the representation, which can change while that representation is being read? Is the intent that memmove or memcpy enjoy special atomicity guarantees that cannot be duplicated by ordinary code?
Yes, in the case of determinate values with multiple representations we
do have a problem (and it is real because of cases such as normalisation
of pointers). Making the change of representation atomic is not enough
because in the context of multiple threads we do not want one thread to
start reading/writing a pointer while another thread changes the
representation.

Clearly any change of representation is a 'write' operation, but hidden
from the user. Imagine two threads both reading an un-normalised
pointer. Before the first thread completes reading, the second thread
cuts in, reads and normalises the pointer, writing the normalised version
back before control returns to the first thread.

The problem is that the user considers the operations pure reads and
so takes no precautions, but in the context of multiple valid
representations of determinate values, read operations are also
potentially write operations.

This looks like a concurrency/multiple threads problem.

Francis
e***@gmail.com
2015-04-01 13:21:08 UTC
Presumably, the change of representation could only happen if the compiler could determine (e.g. through escape analysis) that no other thread is concurrently accessing it.

But there is still a problem even in the single-threaded case; if you try to copy pointer p to pointer q by

  for (size_t i = 0; i < sizeof(int *); ++i)
      ((char *)&q)[i] = ((char *)&p)[i];

(i.e., roll your own memcpy) you get screwed if the representation changes in the middle of your loop.

The only way I can make sense of DR 260 is if (1) the copying code above is incorrect, and (2) the compiler guarantees that this problem doesn't arise in library routines like memcpy. But is this really what the standards committee is trying to do?
Francis Glassborow
2015-04-01 13:36:38 UTC
Post by e***@gmail.com
Presumably, the change of representation could only happen if the compiler could determine (e.g. through escape analysis) that no other thread is concurrently accessing it.
But such a change is a runtime issue because which thread (or threads in
a truly concurrent system) is active is not under compiler control.
Post by e***@gmail.com
But there is still a problem even in the single-threaded case; if you try to copy pointer p to pointer q by
  for (size_t i = 0; i < sizeof(int *); ++i)
      ((char *)&q)[i] = ((char *)&p)[i];
(i.e., roll your own memcpy) you get screwed if the representation changes in the middle of your loop.
The only way I can make sense of DR 260 is if (1) the copying code above is incorrect, and (2) the compiler guarantees that this problem doesn't arise in library routines like memcpy. But is this really what the standards committee is trying to do?
Within a single thread the implementation 'knows' that a change of
representation is effectively a write and should be able to ensure that
it does not write to storage that is being read.
e***@gmail.com
2015-04-01 23:57:57 UTC
Post by Francis Glassborow
Post by e***@gmail.com
Presumably, the change of representation could only happen if the compiler could determine (e.g. through escape analysis) that no other thread is concurrently accessing it.
But such a change is a runtime issue because which thread (or threads in
a truly concurrent system) is active is not under compiler control.
Post by e***@gmail.com
But there is still a problem even in the single-threaded case; if you try to copy pointer p to pointer q by
  for (size_t i = 0; i < sizeof(int *); ++i)
      ((char *)&q)[i] = ((char *)&p)[i];
(i.e., roll your own memcpy) you get screwed if the representation changes in the middle of your loop.
The only way I can make sense of DR 260 is if (1) the copying code above is incorrect, and (2) the compiler guarantees that this problem doesn't arise in library routines like memcpy. But is this really what the standards committee is trying to do?
Within a single thread the implementation 'knows' that a change of
representation is effectively a write and should be able to ensure that
it does not write to storage that is being read.
The standard is not supposed to describe implementation; it's supposed to describe architecturally visible behavior. If the compiler were required to hide the change of representation, it would not be mentioned in standards discussions at all.
Kaz Kylheku
2015-04-01 14:26:35 UTC
Post by Francis Glassborow
Post by e***@gmail.com
(1) the representation of a determinate pointer can change at any time, yet
(2) a determinate value can be "bitwise" (i.e., representationally) copied using memmove or memcpy to make an equal value.
How is (2) possible given that memmove and memcpy are presumably operating
on the representation, which can change while that representation is being
read? Is the intent that memmove or memcpy enjoy special atomicity
guarantees that cannot be duplicated by ordinary code?
Yes, in the case of determinate values with multiple representations we
do have a problem (and it is real because of cases such as normalisation
of pointers). Making the change of representation atomic is not enough
because in the context of multiple threads we do not want one thread to
start reading/writing a pointer while another thread changes the
representation.
Clearly any change of representation is a 'write' operation but hidden
from the user.
No, that is not even remotely a reasonable interpretation of the text;
"the representation of a determinate pointer can change at any time".

"representation" is not a synonym for "value stored in an object".

The intent cannot possibly be that read operations can secretly write back, or
that write operations can appear in the translation of code that isn't
even accessing objects at all, so that stored values spontaneously change.

Even if that was the intent, the users of a compiler used for concurrent
programming would scream murder at the vendor and the problem would take care
of itself.

Just because some English sentence in some defect report has a silly
interpretation doesn't mean that it's relevant in the real world, where
there are people to whom the silliness is obvious.

"Say you love me!"

"Okay, 'you love me'."
Francis Glassborow
2015-04-01 16:30:46 UTC
Post by Kaz Kylheku
Post by Francis Glassborow
Post by e***@gmail.com
(1) the representation of a determinate pointer can change at any time, yet
(2) a determinate value can be "bitwise" (i.e., representationally) copied using memmove or memcpy to make an equal value.
How is (2) possible given that memmove and memcpy are presumably operating
on the representation, which can change while that representation is being
read? Is the intent that memmove or memcpy enjoy special atomicity
guarantees that cannot be duplicated by ordinary code?
Yes, in the case of determinate values with multiple representations we
do have a problem (and it is real because of cases such as normalisation
of pointers). Making the change of representation atomic is not enough
because in the context of multiple threads we do not want one thread to
start reading/writing a pointer while another thread changes the
representation.
Clearly any change of representation is a 'write' operation but hidden
from the user.
No, that is not even remotely a reasonable interpretation of the text;
"the representation of a determinate pointer can change at any time".
"representation" is not a synonym for "value stored in an object".
Then what is it? Value is an interpretation of a stored bit-pattern.
The point is that some values have multiple equivalent representations.
In the case of indeterminate values, all bit patterns for the type are
equally valid.
Post by Kaz Kylheku
The intent cannot possibly be that read operations can secretly write back, or
that write operations can appear in the translation of code that isn't
even accessing objects at all, so that stored values spontaneously change.
This whole issue arose a decade ago when it was pointed out (to WG14)
that changes were detectable via memcpy and memcmp. The issue at the
time (and I was in the room when it was being discussed) centred on
whether such changes should be permitted in view of the fact that a
program could detect that they had been made. At that time concurrency
was not an issue. However, stored values are moved between various forms
of memory, and WG14 (C) decided that as long as values were maintained,
representations did not have to be.

Now the issue has since been revisited by both the C and C++ committees,
and I have not kept track; however, the issue only affects a limited
number of implementations where multiple representations of determinate
values are possible. In the case of indeterminate values there is no
reason to require that specific bit-patterns be maintained even though
such changes can be detected via memcpy/memcmp.
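
A minimal sketch of the kind of detection in question (my own
illustration; whether the memcmp ever reports a difference depends
entirely on the implementation):

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      int i = 42;
      int *p = &i;                    /* determinate value */
      unsigned char snap1[sizeof p];
      unsigned char snap2[sizeof p];

      memcpy(snap1, &p, sizeof p);    /* first snapshot of p's bytes */
      /* ... the implementation may normalise p here ... */
      memcpy(snap2, &p, sizeof p);    /* second snapshot */

      /* Under the DR 260 ruling this may print "changed" even though
         p's value never changed and p == p throughout. */
      if (memcmp(snap1, snap2, sizeof p) != 0)
          puts("changed");
      return 0;
  }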
Post by Kaz Kylheku
Even if that was the intent, the users of a compiler used for concurrent
programming would scream murder at the vendor and the problem would take care
of itself.
Well perhaps the vendors of such implementations already ensure that the
problem does not manifest by not taking advantage of the freedom allowed
by the Standard.
Post by Kaz Kylheku
Just because some English sentence in some defect report has a silly
interpretation doesn't mean that it's relevant in the real world, where
there are people to whom the silliness is obvious.
Whether an interpretation is silly or not can depend on how well the
issue is understood. I am always reluctant to describe something as
silly but that is just me.

Francis
e***@gmail.com
2015-04-02 00:19:17 UTC
Post by Francis Glassborow
Post by Kaz Kylheku
Even if that was the intent, the users of a compiler used for concurrent
programming would scream murder at the vendor and the problem would take care
of itself.
Well perhaps the vendors of such implementations already ensure that the
problem does not manifest by not taking advantage of the freedom allowed
by the Standard.
The whole point of a standard is that programmers write to the standard, and compilers compile to the standard, and everybody is happy, without compiler writers having to check whether they broke some important code, and without programmers having to worry that their code will break when compilers get more clever.
Kaz Kylheku
2015-04-02 02:26:47 UTC
Post by e***@gmail.com
Post by Francis Glassborow
Post by Kaz Kylheku
Even if that was the intent, the users of a compiler used for concurrent
programming would scream murder at the vendor and the problem would take care
of itself.
Well perhaps the vendors of such implementations already ensure that the
problem does not manifest by not taking advantage of the freedom allowed
by the Standard.
The whole point of a standard is that programmers write to the standard, and
As far as programmers go, since most programmers do not read programming
language standards, the purpose served is that when programmers screw up
due to ignorance, we can use the standard to lay the blame on them and
absolve the implementation.
Post by e***@gmail.com
compilers compile to the standard, and everybody is happy, without compiler
writers having to check whether they broke some important code, and without
Compiler writers absolutely must care about breaking some important code.

Compilers also provide additional requirements of their own beyond the
standard. These have to be preserved between compiler releases (or carefully
managed into obsolescence and removal).

Do you think that GCC can break __volatile __asm (...) tomorrow just because
it's not in ISO C?

"Hey, too bad; didn't you know that two underscores make
the behavior undefined?"

(The attitude in GCC development has been headed in that direction, though.)
Post by e***@gmail.com
programmers having to worry that their code will break when compilers get
more clever.
You absolutely have to worry about code which stretches interpretations of the
standard. If you do weird things that nobody else does, even if they are
standard-conforming, and you get burned by a compiler, you might not be able
to get that issue resolved in any other way than by changing your
standard-conforming code.

It might get acknowledged, tagged with a low priority, and sit in a bug
database for the next five years; meanwhile, you have to ship something.

Programmers really should be quite conservative about what they rely on. I
mean, of course study standards, but not with the view of "what can I get
away with such that the implementation is still blamed if it goes wrong,
rather than me?"
Tim Rentsch
2015-04-04 13:11:57 UTC
Post by Francis Glassborow
Post by Kaz Kylheku
Post by Francis Glassborow
Post by e***@gmail.com
I don't understand the intended reconciliation of the following
two rulings from DR 260
(1) the representation of a determinate pointer can change at any time, yet
(2) a determinate value can be "bitwise" (i.e., representationally)
copied using memmove or memcpy to make an equal value.
How is (2) possible given that memmove and memcpy are presumably
operating on the representation, which can change while that
representation is being read? Is the intent that memmove or memcpy
enjoy special atomicity guarantees that cannot be duplicated by
ordinary code?
Yes, in the case of determinate values with multiple representations
we do have a problem (and it is real because of cases such as
normalisation of pointers). Making the change of representation atomic
is not enough because in the context of multiple threads we do not want
one thread to start reading/writing a pointer while another thread
changes the representation.
Clearly any change of representation is a 'write' operation but
hidden from the user.
No, that is not even remotely a reasonable interpretation of the
text; "the representation of a determinate pointer can change at
any time". [..snip..]
The intent cannot possibly be that read operations can secretly
write back, or that write operations can appear in the translation
of code that isn't even accessing objects at all, so that stored
values spontaneously change.
This whole issue arose a decade ago when it was pointed out (to
WG14) that changes were detectable via memcpy and memcmp. The
issue at the time (and I was in the room when it was being
discussed) centred on whether such changes should be permitted in
view of the fact that a program could detect that they had been
made. At that time concurrency was not an issue. However, stored
values are moved between various forms of memory, and WG14 (C)
decided that as long as values were maintained, representations
did not have to be.
I understand that this is what they decided. The problem with
it is that it isn't consistent with how the rest of C operates,
since C allows programs to operate on representations directly
(ie, by accessing via unsigned char *). It would be consistent
to allow either one or the other, but not both.
Post by Francis Glassborow
Now the issue has been revisited since by both C and C++
committees and I have not kept track however the issue only
affects a limited number of implementations where multiple
representations of determinate values are possible. In the case
of indeterminate values there is no reason to require that
specific bit-patterns be maintained even though such changes can
be detected via memcpy/memcmp.
Of course there is: the same reason to have an abstract semantics
in the first place: to make it feasible to reason about program
behavior without being forced to know what's going on in the
underlying hardware and/or OS. If we get to the point where, for
example, the only way to be sure what a program is doing is by
looking at the generated assembly language, then the abstract
semantics is a failure. I draw the opposite conclusion: there
is no reason to /allow/ objects (even if specified as having
indeterminate values) to spontaneously change their bit-state.
What arguments I have heard for the opposing viewpoint basically
all boil down to implementor laziness. Furthermore there is now
another concern, perhaps even more compelling in light of news
items over the last few years: the more often program behavior
becomes unpredictable the more likely it is that a program will
have a security hole. It's only a matter of time before overly
aggressive "optimization" rulings (of which the issue here is one
example) introduce a vulnerability into otherwise safe and
acceptable programs. This is far too high a price to pay for
some imagined benefit of cutting these kinds of implementational
corners.
e***@gmail.com
2015-04-02 00:06:20 UTC
Post by Kaz Kylheku
The intent cannot possibly be that read operations can secretly write back, or
that write operations can appear in the translation of code that isn't
even accessing objects at all, so that stored values spontaneously change.
Well, if you didn't read the object at all, the change to its representation wouldn't be architecturally visible. But as to the other thing, that is precisely what the standard is supposed to allow. For example, there are real GCC examples where printing the same valid (but indeterminate) int variable twice in a row prints two different values.

The whole point of the standard is to set a boundary for how far compilers can go, and to give developers a well-defined boundary where they can write code that will be safe from the even crazier compilers of tomorrow.
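
A minimal sketch of the kind of program being described (my own
construction; nothing guarantees the two lines print different values,
the point is that some compilers are permitted to make them differ):

  #include <stdio.h>

  int main(void)
  {
      int x;               /* never initialised: indeterminate */
      printf("%d\n", x);   /* may print one number ...          */
      printf("%d\n", x);   /* ... and this a different one if x
                              never had a stable home in a
                              register or in memory (strictly,
                              C11 6.3.2.1p2 makes both accesses
                              undefined, since x's address is
                              never taken)                      */
      return 0;
  }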
Kaz Kylheku
2015-04-02 02:15:10 UTC
Post by e***@gmail.com
Post by Kaz Kylheku
The intent cannot possibly be that read operations can secretly write back, or
that write operations can appear in the translation of code that isn't
even accessing objects at all, so that stored values spontaneously change.
Well, if you didn't read the object at all, the change to its representation
wouldn't be architecturally visible. But as to the other thing, that is
precisely what the standard is supposed to allow. For example, there are real
GCC examples where printing the same valid (but indeterminate) int variable
twice in a row prints two different values.
Indeterminate values can change (in the sense of the stored value mutating),
because the underlying storage of an indeterminate variable doesn't belong to
that variable.

A variable is live from the moment it is assigned for the first time (or
initialized) to the point(s) in the program graph past which it has no next use.

If uses occur prior to that first assignment, those uses are outside of the
live region in which the variable has a stable, stored value. The
behavior is not defined, so anything can happen! Quite a possible actual
behavior is that some storage is accessed, but the resulting value is not
stable between multiple such accesses. For instance, when the variable goes
live for the first time at the point of its first assignment in a given
control flow, it might be in register R1. The premature accesses prior
to that point could be compiled such that they also sample register R1.
Now, R1 isn't sitting there unused in that region of code; it is in use for
some other purposes, and changing!

This kind of thing cannot happen to a determinate variable.

The word "change" is loaded; for instance this is a change:

int x = 1;
double y = x; /* int value changes to double */

When we access a char object, it gets promoted, thereby changing to int
(or, rarely, unsigned int). This is a possible representation change; there
are more bits in the new representation.

Change can be nondestructive; changes occur in functional operations.
2 + 3 changes the operands 2 and 3 into a 5.

"change of representation" does not imply weird effects whereby reading
a location causes it to update. For such a weird idea to be the intent, it
had better be spelled out in the wording, like "representation change includes
the possibility that the most recently stored value in an accessed object is
modified to an alternative representation".

Here is a useful quote from C99, 6.2.4:

"The lifetime of an object is the portion of program execution during which
storage is guaranteed to be reserved for it. An object exists, has a constant
address,(footnote 25) and retains its last-stored value throughout its
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
lifetime. (footnote 26) If an object is referred to outside of its lifetime,
the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime.
----
(25: The term ‘‘constant address’’ means that two pointers to the object
constructed at possibly different times will compare equal. The address may
be different during two different executions of the same program.)

(26: In the case of a volatile object, the last store need not be explicit in
the program.)


"constant address" rules out all that stuff elsethread about a C
implementation with a copying or compacting garbage collector.

If we sample the address of an object, and then later sample it again,
the pointers have to compare equal. That's even if we "fwrite" the old
pointer to a file and "fread" it back again.
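
A minimal sketch of that round trip (my own illustration):

  #include <stdio.h>

  int main(void)
  {
      int obj = 0;
      int *before = &obj;
      int *after;

      FILE *f = tmpfile();
      if (f == NULL)
          return 1;
      fwrite(&before, sizeof before, 1, f);
      rewind(f);
      if (fread(&after, sizeof after, 1, f) != 1)
          return 1;
      fclose(f);

      /* obj is still live, so per 6.2.4 and footnote 25 the two
         pointers must compare equal -- even if the bytes read back
         are no longer the representation the implementation would
         use for that value today. */
      printf("%d\n", before == after);   /* expected: 1 */
      return 0;
  }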

A conforming C implementation cannot have a garbage collector that changes the
addresses of objects, period.
Post by e***@gmail.com
The whole point of the standard is to set a boundary for how far compilers
can go, and to give developers a well-defined boundary where they can write
code that will be safe from the even crazier compilers of tomorrow.
Thanks for the enlightening platitude.
Francis Glassborow
2015-04-02 12:01:32 UTC
Post by Kaz Kylheku
Indeterminate values can change (in the sense of the stored value mutating),
because the underlying storage of an indeterminate variable doesn't belong to
that variable.
How do you arrive at that conclusion?

Value (stored or otherwise) is NOT a synonym for representation. An
indeterminate value may be represented by any valid bit-pattern for the
type concerned. Reading an indeterminate value need not produce
undefined behaviour.

Consider:

#include <stdio.h>

void foo(void) {
    unsigned char c;          // not initialised, so indeterminate
    printf("%c", c);          // OK
    printf("%c", c);          // OK, but might differ from the first use
    int *ip;                  // not initialised
    printf("%p", (void *)ip); // UB: ip has not been initialised
    {
        int i;
        printf("%i", i);      // UB
        ip = &i;
        printf("%p", (void *)ip); // OK
    }
    printf("%p", (void *)ip); // ???
}

all bit patterns for unsigned char are valid, so failure to initialise
simply makes the value indeterminate, and so the bit-pattern is subject
to spontaneous mutation.

We cannot make the same guarantee for int because there is a possibility
of a trap value (otherwise there would not be a problem).
Now what about the last printf above? ip has become indeterminate, but
does accessing the pointer create UB (certainly dereferencing it does)?

The implementation would need to go through contortions to create a
problem with that last line.
Post by Kaz Kylheku
A variable is live from the moment it is assigned for the first time (or
initialized) to the point(s) in the program graph past which it has no next use.
The lifetime of a variable is from the point of definition to the end of
the scope in which it is declared. Note that definition does not require
initialisation. Technically the storage is allocated at the point of
definition, though some implementations choose to delay allocation of
very large objects until the point of initialisation (and that has
caused problems when storage has been exhausted between the point of
definition and the point of initialisation). Rare, but it can happen.
Post by Kaz Kylheku
If uses occur prior to that first assignment, those uses are outside of the
live region in which the variable has a stable, stored value. The
behavior is not defined, so anything can happen! Quite a possible actual
behavior is that some storage is accessed, but the resulting value is not
stable between multiple such accesses. For instance, when the variable goes
live for the first time at the point of its first assignment in a given
control flow, it might be in register R1. The premature accesses prior
to that point could be compiled such that they also sample register R1.
Now, R1 isn't sitting there unused in that region of code; it is in use for
some other purposes, and changing!
Just handwaving. Quote the standard please.
Post by Kaz Kylheku
This kind of thing cannot happen to a determinate variable.
int x = 1;
double y = x; /* int value changes to double */
When we access a char object, it gets promoted, thereby changing to int
(or, rarely, unsigned int). This is a possible representation change; there
are more bits in the new representation.
Change can be nondestructive; changes occur in functional operations.
2 + 3 changes the operands 2 and 3 into a 5.
"change of representation" does not imply weird effects whereby reading
a location causes it to update. For such a weird idea to be the intent, it
had better be spelled out in the wording, like "representation change includes
the possibility that the most recently stored value in an accessed object is
modified to an alternative representation".
"The lifetime of an object is the portion of program execution during which
storage is guaranteed to be reserved for it. An object exists, has a constant
address,(footnote 25) and retains its last-stored value throughout its
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NOTE: 'value', NOT 'representation'. The value must not change, but that
says exactly nothing about the representation. And the existence of
normalisation (particularly of pointers on some platforms) makes that
distinction important.
Post by Kaz Kylheku
lifetime. (footnote 26) If an object is referred to outside of its lifetime,
the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime.
But to understand that you need to understand 'lifetime' as specified by
the standard.
Post by Kaz Kylheku
----
(25: The term ‘‘constant address’’ means that two pointers to the object
constructed at possibly different times will compare equal. The address may
be different during two different executions of the same program.)
Yes, that is a requirement, but again it says nothing about representation.
On systems where there are multiple possible representations of an
address (e.g. many MSDOS-based platforms in the '80s) that simply draws
attention to the fact that different representations must compare equal
even if the actual representations are different (as could be determined
by using memcmp -- which compares representations, not values).
Post by Kaz Kylheku
(26: In the case of a volatile object, the last store need not be explicit in
the program.)
"constant address" rules out all that stuff elsethread about a C
implementation with a copying or compacting garbage collector.
Well it certainly makes it immensely more difficult.
Post by Kaz Kylheku
If we sample the address of an object, and then later sample it again,
the pointers have to compare equal. That's even if we "fwrite" the old
pointer to a file and "fread" it back again.
Yes, but that does not mean that the representations will be identical.
Post by Kaz Kylheku
A conforming C implementation cannot have a garbage collector that changes the
addresses of objects, period.
And that actually makes C an inappropriate choice of language for some
problem domains. C++ with its destructors has less of a problem because
much of its management of dynamic resources can be automated.
Post by Kaz Kylheku
Post by e***@gmail.com
The whole point of the standard is to set a boundary for how far compilers
can go, and to give developers a well-defined boundary where they can write
code that will be safe from the even crazier compilers of tomorrow.
Thanks for the enlightening platitude.
However, that means that the programmer has to actually understand
exactly what the standard means. Fortunately we can usually get away
with not actually being standards experts; otherwise we would have to go
through contortions every time we carry out integer arithmetic to ensure
no overflow or underflow (strictly undefined behaviour, but few care
and it rarely matters).

Francis
Keith Thompson
2015-04-02 15:51:27 UTC
Post by Francis Glassborow
Post by Kaz Kylheku
Indeterminate values can change (in the sense of the stored value
mutating), because the underlying storage of an indeterminate
variable doesn't belong to that variable.
How do you arrive at that conclusion?
Value (stored or otherwise) is NOT a synonym for representation. An
indeterminate value may be represented by any valid bit-pattern for
the type concerned. Reading an indeterminate value need not produce
undefined behaviour.
#include <stdio.h>

void foo(void) {
    unsigned char c;          // not initialised, so indeterminate
    printf("%c", c);          // OK
    printf("%c", c);          // OK, but might differ from the first use
    int *ip;                  // not initialised
    printf("%p", (void *)ip); // UB: ip has not been initialised
    {
        int i;
        printf("%i", i);      // UB
        ip = &i;
        printf("%p", (void *)ip); // OK
    }
    printf("%p", (void *)ip); // ???
}
all bit patterns for unsigned char are valid, so failure to initialise
simply makes the value indeterminate, and so the bit-pattern is subject
to spontaneous mutation.
We cannot make the same guarantee for int because there is a
possibility of a trap value (otherwise there would not be a problem).
Does that mean that reading an uninitialized int object is *not*
undefined behavior if the implementation has no trap representations?

{
int i;
if (sizeof (int) * CHAR_BIT == 32 &&
INT_MIN == -2147483648 &&
INT_MAX == +2147483647)
{
printf("%d\n", i); // ???
}
}
Post by Francis Glassborow
Now what about the last printf above? ip has become indeterminate, but
does accessing the pointer create UB (certainly dereferencing it does)?
Yes.
Post by Francis Glassborow
The implementation would need to go through contortions to create a
problem with that last line.
Sure, but such contortions are not forbidden by the standard. The
pointer's value is indeterminate at that point. And if the compiler is
able to detect the undefined behavior at compile time, it can generate
any code it likes.
Post by Francis Glassborow
Post by Kaz Kylheku
A variable is live from the moment it is assigned for the first time (or
initialized) to the point(s) in the program graph past which it has no next use.
The lifetime of a variable is from the point of definition to the end
of the scope in which it is declared. Note that definition does not
require initialisation. Technically the storage is allocated at the
point of definition, though some implementations choose to delay
allocation of very large objects until the point of initialisation
(and that has caused problems when storage has been exhausted between
the point of definition and the point of initialisation). Rare, but it
can happen.
I think he's using the word "live" in a sense that doesn't refer to the
object's lifetime as defined by the standard. Roughly, an object is
"live" when it has a determinate value.

Also, the lifetime of an automatic object begins on entry to the block
in which it's defined, not at the point of definition -- unless it's a
VLA (see N1570 6.2.4p6). Its *scope* begins at the point of definition.
There are contrived ways involving gotos to access an automatic object
outside its scope but within its lifetime.

[...]
Post by Francis Glassborow
Post by Kaz Kylheku
"The lifetime of an object is the portion of program execution during which
storage is guaranteed to be reserved for it. An object exists, has a constant
address,(footnote 25) and retains its last-stored value throughout its
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NOTE: 'value', NOT 'representation'. The value must not change, but that
says exactly nothing about the representation. And the existence of
normalisation (particularly of pointers on some platforms) makes that
distinction important.
Right, so if a type has multiple representations (bit patterns) for the
same value, a non-volatile object's representation can change
spontaneously but its value cannot.

I had tended to think that a (non-bitfield) object's representation
consists of the values of an array of unsigned char overlaid on the
object, and that the elements of that array are themselves objects whose
values cannot change spontaneously. But reading the standard more
carefully (N1570 6.2.6.1p4):

Values stored in non-bit-field objects of any other object type
consist of n x CHAR_BIT bits, where n is the size of an object
of that type, in bytes. The value may be copied into an object
of type unsigned char [n] (e.g., by memcpy); the resulting set
of bytes is called the object representation of the value.

To obtain the representation of an object, you have to *copy* its bytes
into an array of unsigned char; you can't (necessarily) just examine
them in place. If I copy a pointer object's representation into an
array of unsigned char, and later copy its representation into another
array of unsigned char, the standard doesn't require the two arrays to
contain the same values; they just both have to be representations of
the same pointer *value*.

{
  int i = 42;
  int *p = &i;
  unsigned char rep0[sizeof p];
  unsigned char rep1[sizeof p];
  memcpy(rep0, &p, sizeof p);   // copy the bytes of p itself
  memcpy(rep1, &p, sizeof p);
  // rep0 and rep1 needn't have the same contents
}

Quoting the committee response from DR 260:

Values may have any bit-pattern that validly represents them
and the implementation is free to move between alternate
representations (for example, it may normalize pointers,
floating-point representations etc.). In the case of an
indeterminate value all bit-patterns are valid representations
and the actual bit-pattern may change without direct action of
the program.

But I still see some problems. It doesn't say *when* an object's
representation may or may not change, and copying that representation
into an array is not an atomic operation.

Assuming a determinate value, I can obtain a pointer object's
representation by memcpy()ing it into an array of unsigned char.
Does that imply that the representation can't change in the middle
of the memcpy()? If it could, then I might get part of the old
representation combined with part of the new representation,
yielding something that may be invalid. What if I copy bytes one
at a time in arbitrary order?
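
To make the worry concrete, a sketch of such a byte-at-a-time copy
(my own fragment, copying in reverse order):

  {
      int i = 42;
      int *p = &i;
      unsigned char rep[sizeof p];

      /* Copy the bytes of p one at a time, in reverse order. If the
         implementation were to normalise p between iterations, rep
         could end up holding a mixture of the old and new
         representations -- which need not represent any value. */
      for (size_t n = sizeof p; n-- > 0; )
          rep[n] = ((unsigned char *)&p)[n];
  }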

Practically speaking, are there implementations that perform the
kind of normalization that this DR is meant to allow? Could the
requirements in the standard be tightened up a bit without
breaking such implementations?
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Kaz Kylheku
2015-04-02 18:51:50 UTC
Post by Keith Thompson
Post by Francis Glassborow
The lifetime of a variable is from the point of definition to the end
of the scope in which it is declared. Note that definition does not
require initialisation. Technically the storage is allocated at the
point of definition, though some implementations choose to delay
allocation of very large objects until the point of initialisation
(and that has caused problems when storage has been exhausted between
the point of definition and the point of initialisation). Rare, but it
can happen.
I think he's using the word "live" in a sense that doesn't refer to the
object's lifetime as defined by the standard. Roughly, an object is
"live" when it has a determinate value.
I was referring to the concept of variable liveness, not lifetime.

int a;
// a is not live yet

a = 3;             // a is live with value 3

int b = a + 1;     // a is used here

external_func(a);  // last use of a with value 3

// a is not live

a = 4;             // a is live again

The concept in C of a being accessed while indeterminate, due to
not being initialized, maps to the concept of a being used in
an execution path where it has not yet become live.

Liveness is useful internally in an implementation because where
the object is not live, its resources can be reclaimed.
For instance in the "a is not live" region, a does not have to
have any assigned register or storage.
Post by Keith Thompson
Also, the lifetime of an automatic object begins on entry to the block in which it's defined, not at the point of definition -- unless it's a
VLA (see N1570 6.2.4p6). Its *scope* begins at the point of definition.
There are contrived ways involving gotos to access an automatic object
outside its scope but within its lifetime.
This distinction is obviously necessary because lifetime is a dynamic (run-time)
concept, whereas "point of declaration" is static.
Post by Keith Thompson
I had tended to think that a (non-bitfield) object's representation
consists of the values of an array of unsigned char overlaid on the
object, and that the elements of that array are themselves objects whose
values cannot change spontaneously. But reading the standard more
Values stored in non-bit-field objects of any other object type
consist of n x CHAR_BIT bits, where n is the size of an object
of that type, in bytes. The value may be copied into an object
of type unsigned char [n] (e.g., by memcpy); the resulting set
of bytes is called the object representation of the value.
Regardless, it is undeniable that bytes are objects and have value.

If an object is accessed that way, it is a value access.

If the stored, in-memory representation changes, those bytes have
changed value.
Post by Keith Thompson
To obtain the representation of an object, you have to *copy* its bytes
into an array of unsigned char; you can't (necessarily) just examine
them in place. If I copy a pointer object's representation into an
array of unsigned char, and later copy its representation into another
array of unsigned char, the standard doesn't require the two arrays to
contain the same values; they just both have to be representations of
the same pointer *value*.
Hence, the bytes of the object failed to retain their last stored value.

Open and shut case of nonsensical interpretation.

Regions of memory magically flipping their bits in such a way that they
nevertheless preserve a value (such as the abstract value of a floating-point
representation) is just fairies-and-unicorns, bat-shit crazy.
Francis Glassborow
2015-04-02 19:28:06 UTC
Post by Kaz Kylheku
Post by Keith Thompson
To obtain the representation of an object, you have to *copy* its bytes
into an array of unsigned char; you can't (necessarily) just examine
them in place. If I copy a pointer object's representation into an
array of unsigned char, and later copy its representation into another
array of unsigned char, the standard doesn't require the two arrays to
contain the same values; they just both have to be representations of
the same pointer *value*.
Hence, the bytes of the object failed to retain their last stored value.
Open and shut case of nonsensical interpretation.
Regions of memory magically flipping their bits in such a way that they
nevertheless preserve a value (such as the abstract value of a floating-point
representation) is just fairies-and-unicorns, bat-shit crazy.
You are entitled to your opinion but that was not, IIRC, shared by WG14
at the time when I was an active member.
And on the issue you raised earlier about compacting garbage collectors
and C: consider an implementation that uses a 'fat' pointer where the
first part contains the last value assigned to the pointer, and the
second part contains the current address of the object being referenced.
The first part is used for pointer comparisons; the second part is used
for dereferencing.
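
A rough sketch of such a scheme (all of the names and the layout are
hypothetical, purely to make the idea concrete):

  /* Hypothetical 'fat' pointer for a compacting-collector implementation */
  struct fat_ptr {
      unsigned long id; /* bit-pattern last assigned by the program;
                           stable, used for == comparisons             */
      void *addr;       /* where the object currently lives; rewritten
                           by the collector, used for dereferencing    */
  };

  int fat_ptr_equal(struct fat_ptr a, struct fat_ptr b)
  {
      return a.id == b.id;  /* comparison ignores the current address */
  }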

Francis
Kaz Kylheku
2015-04-02 21:44:22 UTC
Post by Francis Glassborow
Post by Kaz Kylheku
Post by Keith Thompson
To obtain the representation of an object, you have to *copy* its bytes
into an array of unsigned char; you can't (necessarily) just examine
them in place. If I copy a pointer object's representation into an
array of unsigned char, and later copy its representation into another
array of unsigned char, the standard doesn't require the two arrays to
contain the same values; they just both have to be representations of
the same pointer *value*.
Hence, the bytes of the object failed to retain their last stored value.
Open and shut case of nonsensical interpretation.
Regions of memory magically flipping their bits in such a way that they
nevertheless preserve a value (such as the abstract value of a floating-point
representation) is just fairies-and-unicorns, bat-shit crazy.
You are entitled to your opinion but that was not, IIRC, shared by WG14
at the time when I was an active member.
Sure; I concede that someone who builds a machine in which storage bits flip in
that surprising way, and tries to convince the marketplace that their C
compiler is ISO C conforming, can present a plausible case by referring to
these opinions and discussions.

I don't think that the technology will fly even if the arguments are accepted
by everyone, so they are moot. "You are perfectly right, but I'm going with
someone else's hardware and compiler, thank you very much".
Keith Thompson
2015-04-02 22:06:19 UTC
Post by Kaz Kylheku
Post by Keith Thompson
I had tended to think that a (non-bitfield) object's representation
consists of the values of an array of unsigned char overlaid on the
object, and that the elements of that array are themselves objects whose
values cannot change spontaneously. But reading the standard more
Values stored in non-bit-field objects of any other object type
consist of n x CHAR_BIT bits, where n is the size of an object
of that type, in bytes. The value may be copied into an object
of type unsigned char [n] (e.g., by memcpy); the resulting set
of bytes is called the object representation of the value.
Regardless, it is undeniable that bytes are objects and have value.
If an object is accessed that way, it is a value access.
If the stored, in-memory representation changes, those bytes have
changed value.
I'm going back and forth in my mind on whether the bytes that make up
an object (say, a pointer object) are "objects" in their own right
and fall under the requirement that they retain their last stored
value. The standard does seem to go out of its way to say that you
need to *copy* those bytes to an array to obtain the containing
object's representation, rather than examining them in place.
On the other hand, the fact that you can copy them in the first
place does seem to imply that they're objects. And the definition of
"object" as a "region of data storage in the execution environment,
the contents of which can represent values" seems to reinforce that.
Post by Kaz Kylheku
Post by Keith Thompson
To obtain the representation of an object, you have to *copy* its bytes
into an array of unsigned char; you can't (necessarily) just examine
them in place. If I copy a pointer object's representation into an
array of unsigned char, and later copy its representation into another
array of unsigned char, the standard doesn't require the two arrays to
contain the same values; they just both have to be representations of
the same pointer *value*.
Hence, the bytes of the object failed to retain their last stored value.
Open and shut case of nonsensical interpretation.
[...]

But the committee certainly *wants* to permit normalization of a
floating-point or pointer object's representation, as long as the
normalized representation represents the same *value* it had before.
And that normalization (which clearly changes the bytes making up the
representation) can happen at arbitrary times, though that hasn't been
clearly addressed as far as I can tell. (You could think of the bytes
as being *volatile* unsigned chars, but that's a rationalization not
supported by the standard.)

I see two possibilities (and I'd be satisfied with either):

1. The bytes making up the representation of an object are objects in
their own right, and are subject to the requirement that an object
retains its last stored value. This would ban implicit normalization
except when the object is explicitly updated.

2. An object's representation may change at more or less arbitrary
times, even when the object has been initialized to a determinate value,
is not volatile, and has not been explicitly updated. The new
(normalized) representation must represent the same value as the old
(non-normalized) representation.

The committee, in its response to DR 260, has clearly chosen the second
possibility (and you clearly think they're wrong) -- but that still leaves
open the questions I raised elsethread:

If you obtain the object's representation by copying it into an array of
unsigned char, how do you avoid getting pieces of the unnormalized and
normalized representations, if the representation changes in the middle
of the copy? At what times is normalization permitted or not permitted
to occur?

If #2 is the accepted interpretation, I suggest that the standard needs
to be updated to clarify this.

As a practical matter, is this flexibility actually useful? Do any
current or anticipated real-world systems actually normalize objects
like this? If the committee went with interpretation #1 (which is
simpler and arguably more conforming to the wording of the standard),
what harm would it cause?
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Wojtek Lerch
2015-04-04 03:40:19 UTC
On 02/04/2015 6:06 PM, Keith Thompson wrote:
[...]
Post by Keith Thompson
I'm going back and forth in my mind on whether the bytes that make up
an object (say, a pointer object) are "objects" in their own right
and fall under the requirement that they retain their last stored
value. The standard does seem to go out of its way to say that you
need to *copy* those bytes to an array to obtain the containing
object's representation, rather than examining them in place.
On the other hand, the fact that you can copy them in the first
place does seem to imply that they're objects. And the definition of
"object" as a "region of data storage in the execution environment,
the contents of which can represent values" seems to reinforce that.
They're objects, but maybe storing a value in a big object does not
count as also storing a bunch of *values* in the little one-byte
objects, and that's why they don't need to preserve their, um,
last-overwritten values? Personally I don't think the standard does a
very good job of explaining what happens to the value of an object when
you store a new value in a different but overlapping object, but it's
pretty obvious that it is not preserved. (But does it unconditionally
become indeterminate? Does it depend on how the two objects overlap and
what their types are? Does it have something to do with effective types?)
Post by Keith Thompson
[...]
If you obtain the object's representation by copying it into an array of
unsigned char, how do you avoid getting pieces of the unnormalized and
normalized representations, if the representation changes in the middle
of the copy? At what times is normalization permitted or not permitted
to occur?
If an implementation allows the "normalization" to happen when my
program is halfway through copying the bytes to an array, the final
content of my array is still, by definition, a representation. It's the
implementation's problem how to make it satisfy all the guarantees that
the standard makes about a representation.

...
Post by Keith Thompson
As a practical matter, is this flexibility actually useful? Do any
current or anticipated real-world systems actually normalize objects
like this? If the committee went with interpretation #1 (which is
simpler and arguably more conforming to the wording of the standard),
what harm would it cause?
The original question in DR260 was only about indeterminate values. A
requirement that reading an indeterminate value (or its bytes) multiple
times must produce the same result could prevent certain useful
optimizations -- either by the compiler (which can use the indeterminate
object for temporary storage) or by an operating system (if a page of
memory has never been written to, its contents don't need to be
preserved by the virtual memory subsystem).
Tim Rentsch
2015-04-04 14:47:17 UTC
Post by Keith Thompson
Post by Kaz Kylheku
Post by Keith Thompson
I had tended to think that a (non-bitfield) object's representation
consists of the values of an array of unsigned char overlaid on the
object, and that the elements of that array are themselves objects
whose values cannot change spontaneously. But reading the standard
Values stored in non-bit-field objects of any other object type
consist of n x CHAR_BIT bits, where n is the size of an object
of that type, in bytes. The value may be copied into an object
of type unsigned char [n] (e.g., by memcpy); the resulting set
of bytes is called the object representation of the value.
Regardless, it is undeniable that bytes are objects and have value.
If an object is accessed that way, it is a value access.
If the stored, in-memory representation changes, those bytes have
changed value.
I'm going back and forth in my mind on whether the bytes that make
up an object (say, a pointer object) are "objects" in their own
right and fall under the requirement that they retain their last
stored value. The standard does seem to go out of its way to say
that you need to *copy* those bytes to an array to obtain the
containing object's representation, rather than examining them in
place. On the other hand, the fact that you can copy them in the
first place does seem to imply that they're objects. And the
definition of "object" as a "region of data storage in the execution
environment, the contents of which can represent values" seems to
reinforce that. [...snip...]
1. The bytes making up the representation of an object are objects
in their own right, and are subject to the requirement that an
object retains its last stored value. This would ban implicit
normalization except when the object is explicitly updated.
2. An object's representation may change at more or less arbitrary
times, even when the object has been initialized to a determinate
value, is not volatile, and has not been explicitly updated. The
new (normalized) representation must represent the same value as
the old (non-normalized) representation.
I am strongly of the opinion that only (1) is consistent with
what C allows by way of access through, eg, unsigned char *.
Post by Keith Thompson
The committee, in its response to DR 260, has clearly chosen the
second possibility (and you clearly think they're wrong) -- but
If you obtain the object's representation by copying it into an
array of unsigned char, how do you avoid getting pieces of the
unnormalized and normalized representations, if the representation
changes in the middle of the copy?
At the risk of stating the obvious, it's not possible.
Post by Keith Thompson
At what times is normalization permitted or not permitted to
occur?
My reading of the response in DR 260 is that such changes are
meant to be allowed at any time. Certainly if they are allowed
we should expect that they are allowed at any time unless there
is some sort of proviso saying when they are allowed or when not.
Post by Keith Thompson
If #2 is the accepted interpretation, I suggest that the standard
needs to be updated to clarify this.
The idea of objects spontaneously changing their bit-state is
fundamentally at odds with how the C language is viewed and
used, at a very primal level. What is more, IMO this property
of how C works (ie, without spontaneous bit-change) is one of
C's most important strengths. There is no way to incorporate
property (2) and have C still be C.
Post by Keith Thompson
As a practical matter, is this flexibility actually useful? Do
any current or anticipated real-world systems actually normalize
objects like this?
Based on other comments I believe there are cases where differences
have been (or could be?) observed, ie, comparing a scenario where
spontaneous bit-change is allowed to one where it isn't. But
saying there are cases where changes /can/ occur is much weaker
than saying there are cases where changes /need/ to occur; AFAIK
the cases reported fall into the former category, not the latter.
Post by Keith Thompson
If the committee went with interpretation #1 (which is simpler and
arguably more conforming to the wording of the standard), what
harm would it cause?
That is an important question, but I think the flip side is even
more important: under the assumption of (2), what is the cost to
programming effort, program reliability, and program security? The
arguments I have heard are all about "optimization" (keeping in
mind that the changes being considered here are not optimizations
in the normal sense but a change to program semantics), but I
haven't seen any attempts made to quantify the benefits of such
"optimizations", or to measure (or even consider) the cost of
accommodating such a change. In effect, any "optimization" gain
(even an imagined one) is given infinite weight, and the cost on
the other side is given a weight of zero. I submit that it
should be the other way around: the burden of proof should be
solely on those who propose such "optimizations" to show both
that (a) there is a significant performance gain that will (and
not just might) be achieved, and (b) that all the consequent
costs on the programming side be acceptably small. If the
proposers cannot demonstrate both of these conditions to the
satisfaction of program developers, then their suggestion should
be firmly and unequivocally rejected. In short: implementors
serve developers, not the other way around.
Tim Rentsch
2015-04-04 13:44:31 UTC
Post by Keith Thompson
Post by Francis Glassborow
Post by Kaz Kylheku
Indeterminate values can change (in the sense of the stored value
mutating), because the underlying storage of an indeterminate
variable doesn't belong to that variable.
How do you arrive at that conclusion?
Value (stored or otherwise) is NOT a synonym for representation.
An indeterminate value may be represented by any valid bit-pattern
for the type concerned. Reading an indeterminate value need not
produce undefined behaviour.
#include <stdio.h>

void foo(void) {
    unsigned char c;          // not initialised, so indeterminate
    printf("%c", c);          // OK
    printf("%c", c);          // OK, but might differ from the first use
    int *ip;                  // not initialised
    printf("%p", (void *)ip); // UB: ip has not been initialised
    {
        int i;
        printf("%i", i);      // UB
        ip = &i;
        printf("%p", (void *)ip); // OK
    }
    printf("%p", (void *)ip); // ???
}
all bit patterns for unsigned char are valid, so failure to
initialise simply makes the value indeterminate, and so the
bit-pattern is subject to spontaneous mutation.
We cannot make the same guarantee for int because there is a
possibility of a trap value (otherwise there would not be a
problem).
Does that mean that reading an uninitialized int object is
*not* undefined behavior if the implementation has no trap
representations?
{
int i;
if (sizeof (int) * CHAR_BIT == 32 &&
INT_MIN == -2147483648 &&
INT_MAX == +2147483647)
{
printf("%d\n", i); // ???
}
}
In C 2011 both the printf() of 'c' and the printf() of 'i' are
explicitly undefined behavior, regardless of whether trap
representations are involved. Quoting 6.3.2.1 p2, note the
last sentence:

Except [...], an lvalue that does not have array type is
converted to the value stored in the designated object (and
is no longer an lvalue); [...] If the lvalue designates
an object of automatic storage duration that could have been
declared with the register storage class (never had its
address taken), and that object is uninitialized (not
declared with an initializer and no assignment to it has
been performed prior to use), the behavior is undefined.

This proviso illustrates the right approach in this area.
Because the address of the object is not available, there is no
way to inspect its representation, and so it doesn't matter if
the representation changes. Moreover such an object can be
accessed only in a limited area of program text, making it
feasible to do static analysis to detect (conservatively) whether
it may be accessed before being initialized.
Jakob Bohm
2015-04-01 16:09:02 UTC
Permalink
Raw Message
Post by e***@gmail.com
(1) the representation of a determinate pointer can change at any time, yet
(2) a determinate value can be "bitwise" (i.e., representationally) copied using memmove or memcpy to make an equal value.
How is (2) possible given that memmove and memcopy are presumably operating on the representation, which can change while that representation is being read? Is the intent that memmove or memcpy enjoy special atomicity guarantees that cannot be duplicated by ordinary code?
How about this interpretation of the interpretation:

A garbage-collecting implementation (or any other implementation
with similar characteristics) may at any time "virtually freeze" the
program, locate all live pointers, free memory no longer referenced
and compact the memory layout of any remaining memory, thus changing
the values of many pointers. Such an implementation is required to
somehow identify "safe points" where the threads can be frozen, such
that no pointer value is half written or half read. This could, for
example, be done by freezing threads on entry/exit from standard
library functions.
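For instance (a rough sketch only; gc_safepoint() and the other
names are hypothetical, not part of the standard or any particular
runtime), code compiled for such an implementation might behave as
if it were:

#include <stddef.h>

struct node { int value; struct node *next; };

extern void gc_safepoint(void); /* hypothetical runtime hook */
extern void use(int value);     /* hypothetical consumer */

void walk(struct node *p) {
    while (p != NULL) {
        use(p->value);   /* between safe points, p's bits are stable */
        p = p->next;
        gc_safepoint();  /* here the collector may move objects and
                            rewrite p's representation, preserving
                            the object that p locates */
    }
}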

Another usage would be an MMU-less implementation of POSIX, where
fork(), signal() and other process-related operations may need to
move the logical process address space to a different actual address
by locating and rewriting all pointer values inside this address
space. Here, the operation would be done when all threads in the
process are at cancel points (entry/exit from most but not all
system calls).

However implementations are not limited to using library function
calls as safe points, as the "compiler" part of the implementation
may place hidden markers in the code to indicate other safe points,
as well as information about what registers hold pointers at such
points.

An infamous example of such an implementation was "real mode Windows
3.0" and older, where all machine code functions were required to
use very specific forms of the stack frame, and the OS was permitted
to walk these stack frames as it rearranged memory blocks within
the low 640K RAM on an 8088 based PC, sometimes even swapping out
memory blocks to disk.


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Francis Glassborow
2015-04-01 16:30:49 UTC
Permalink
Raw Message
Post by Jakob Bohm
[...]
A garbage-collecting implementation (or any other implementation
with similar characteristics) may at any time "virtually freeze" the
program, locate all live pointers, free memory no longer referenced
and compact the memory layout of any remaining memory, thus changing
the values of many pointers. Such an implementation is required to
somehow identify "safe points" where the threads can be frozen, such
that no pointer value is half written or half read. This could, for
example, be done by freezing threads on entry/exit from standard
library functions.
Another usage would be an MMU-less implementation of POSIX, where
fork(), signal() and other process-related operations may need to
move the logical process address space to a different actual address
by locating and rewriting all pointer values inside this address
space. Here, the operation would be done when all threads in the
process are at cancel points (entry/exit from most but not all
system calls).
However implementations are not limited to using library function
calls as safe points, as the "compiler" part of the implementation
may place hidden markers in the code to indicate other safe points,
as well as information about what registers hold pointers at such
points.
An infamous example of such an implementation was "real mode Windows
3.0" and older, where all machine code functions were required to
use very specific forms of the stack frame, and the OS was permitted
to walk these stack frames as it rearranged memory blocks within
the low 640K RAM on an 8088 based PC, sometimes even swapping out
memory blocks to disk.
Enjoy
Jakob
Well that is even more extreme because the value of the pointer (in the
limited sense of being an address in storage, rather than as a locator
for a specific object) is being changed as well as the bit pattern.

The problem is that C++ (along with C) has mechanisms to directly access
stored bit-patterns.

Francis
Jakob Bohm
2015-04-01 16:46:05 UTC
Permalink
Raw Message
Post by Francis Glassborow
[...]
Well that is even more extreme because the value of the pointer (in the
limited sense of being an address in storage, rather than as a locator
for a specific object) is being changed as well as the bit pattern.
The problem is that C++ (along with C) has mechanisms to directly access
stored bit-patterns.
Francis
Yes, but the standards also contain (new) provisions for garbage
collection of memory, including standard facilities and implementation
rules on when an implementation can assume that it can find and
understand all live pointer values in the running program.

Those specific rules explicitly limit the ways in which conforming
programs are allowed to deal with the stored bit patterns representing
pointer values. And similar to real-world garbage-collecting memory
managers, these rules also provide ways to distinguish between
garbage-collectable/implementation-modifiable pointer values and
bit-bangable pointer values.


Enjoy

Jakob
Tim Rentsch
2015-04-03 21:37:47 UTC
Permalink
Raw Message
Post by e***@gmail.com
I don't understand the intended reconciliation of the two
following rulings from DR 260
(1) the representation of a determinate pointer can change at any time, yet
(2) a determinate value can be "bitwise" (i.e., representationally)
copied using memmove or memcpy to make an equal value.
How is (2) possible given that memmove and memcopy are presumably
operating on the representation, which can change while that
representation is being read? Is the intent that memmove or memcpy
enjoy special atomicity guarantees that cannot be duplicated by
ordinary code?
FWIW here is my take on the situation.

The response to DR 260 is wrongheaded, and logically nonsensical.
It is IMO the single worst decision to come out of WG14. The
best interpretation I can put on it is that the people participating
didn't realize, or perhaps didn't appreciate, the ramifications
of what they were proposing.

Because C allows access to the bit representations of objects,
namely using unsigned char, the only scheme that holds water is
if all storage units (what the C standard calls "bytes") remain
in the same bit-state until there is some kind of storing
operation that happens (basically assignment, although ++/--
are also on that list). A storing operation affects the storage
units related to the operation, but must leave all others
untouched and in their previous state[*]. Breaking this rule
results in a system that is logically unsound, or at the
very least /radically different/ from C as it is commonly
understood.

I understand that there are arguments coming from those in favor
of allowing such, let us say, anomalies. What these boil down to
is essentially the same as saying it would be convenient if on
occasion we admit the proposition 2+2=5 as true. The problem is
that admitting such a proposition leads to an unsound logic, with
every predicate statement simultaneously being both true and
false. This situation cannot be allowed if we are to be able
to reason about C programs.


[*] At the hardware level there may be exceptions to this
invariant, but only under the "as if" rule. In the abstract
machine, any storage unit (not counting 'volatile', which is not
pertinent to the discussion) retains its previous bit-state until
there is a storing operation in the C program that changes it.
Actually I suspect that the poor judgment shown in the DR 260
response arose largely out of confusing, or failing to
distinguish between, behavior in an actual machine and behavior
in the abstract machine.
e***@gmail.com
2015-04-13 06:00:46 UTC
Permalink
Raw Message
Thank you all for your replies. Please allow me to get back to the
original purpose of the question, which was not to ask for opinions
on the wisdom of DR260 or the standard, or how they might be fixed,
but rather whether anyone might suggest a possible reconciliation of
the apparent contradiction that might have been intended by WG14. I
don't think anyone has suggested such an explanation that is
consistent with the documents, but please correct me if I'm wrong.

Regarding Keith's suggestion that we take seriously the standard's
choice of wording in its definition of object representation: if you
really take this seriously, then you have to decide what "(e.g., by
memcpy)" means. Does it apply if the bytes are copied by user code?
If they are copied one byte at a time? If they are copied in reverse
order? What if they are copied through some sort of indirect process,
such as temporarily storing them in ordinary byte variables? Etc. I
doubt that this can be turned into a sensible semantics in a way that
really treats the representation as existing only when it is stored.
Nevertheless, perhaps this should be turned into a defect report to
force the committee to rewrite the language or clarify the intent.
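Concretely, the variants in question might be sketched like this
(the function and names are purely illustrative, using a pointer
object as the copied type):

#include <string.h>
#include <stddef.h>

void variants(int *x) {
    int *y;
    unsigned char buf[sizeof x];
    size_t i;

    memcpy(&y, &x, sizeof x);          /* the blessed "e.g." case */

    for (i = 0; i < sizeof x; i++)     /* user code, byte by byte */
        ((unsigned char *)&y)[i] = ((unsigned char *)&x)[i];

    for (i = sizeof x; i-- > 0; )      /* the same, in reverse order */
        ((unsigned char *)&y)[i] = ((unsigned char *)&x)[i];

    memcpy(buf, &x, sizeof x);         /* indirectly, via byte variables */
    memcpy(&y, buf, sizeof x);
    (void)y;
}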

There was also the suggestion that representations should not appear
to change without an intervening relevant storage event. This might
be reasonable, but it certainly seems to disagree with existing
compilers, at least for indeterminate values.

The only reasonable, yet possibly intended, interpretation I can
currently think of is the following. The only way to get at the
representation of a pointer is through type conversion, such as to a
byte array or to a uintptr_t. The resulting value is a valid
representation of the pointer. (The properties of a valid pointer
representation involve other issues that we won't address right now.)
Library routines like memcpy really do provide a functionality not
achievable by user code, in that they act as if they were able to
access the entire memory range as a single (non-atomic) memory access
(thereby seeing a single representation of an arbitrary object).
Conversely, the usual byte-by-byte implementation of copying is not a
portable way to copy a pointer.

I confess I would feel rather more comfortable with this resolution
if the language allowed freestanding code to effectively implement
memcpy. But the standard has other gaps like that in need of fixing,
e.g. the inability of freestanding code to implement malloc.

Can anybody provide an informed opinion on whether this is what WG14
had in mind? Thanks in advance.
James Kuyper
2015-04-13 12:24:50 UTC
Permalink
Raw Message
Post by e***@gmail.com
Thank you all for your replies. Please allow me to get back to the
original purpose of the question, which was not to ask for opinions
on the wisdom of DR260 or the standard, or how they might be fixed,
but rather whether anyone might suggest a possible reconciliation to
the apparent contradiction that might have been intended by WG14. I
don't think anyone has suggested such an explanation that is
consistent with the documents, but please correct me if I'm wrong.
Regarding Keith's suggestion that we take seriously the following
choice of wording in the standard about the definition of object
representation: if you really take this seriously, then you have to
decide on what "(e.g., by memcpy)" means. Does it apply if the bytes
are copied by user code? If they are copied one byte at a time? If
they are copied in reverse order? What if they are copied through
some sort of indirect process, such as temporarily storing them in
ordinary byte variables? Etc. I doubt that this can be turned into a
sensible semantics in a way that really treats the representation as
existing only when it is stored. Nevertheless, perhaps this should be
turned into a defect report to force the committee to rewrite the
language or clarify the intent.
6.2.6.1p4 gives memcpy() as an example of how the bytes that make up
the representation of an object containing a value can be copied, but
does not mandate the use of memcpy() for that purpose. To me, it seems
obvious that this implies that a user-written function which meets the
semantic requirements imposed by the standard on memcpy() must also
enable the same guarantee.
The key point of DR 260 is that so long as an object contains an
indeterminate value, 6.2.6.1p4 no longer applies, neither to the
object itself nor to its individual bytes. I personally have argued
that it should apply to the individual bytes, but DR260 indicates that
the committee officially decided otherwise - that doesn't bother me, so
long as it's clear when 6.2.6.1p4 can and cannot be relied upon, and it
does seem clear to me. I did not read DR260 as suggesting that such
variability was permitted for the individual bytes making up an object
with a determinate value. I believe that such bytes are protected from
arbitrary modification - they may only change when a new value is stored
- either in the same object, or by type punning.
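For instance, on that reading a user-written copy like this sketch
(my_memcpy is an illustrative name) would enable the same guarantee
whenever the copied value is determinate:

#include <stddef.h>

/* A byte copy meeting the semantic requirements the standard
   imposes on memcpy() for non-overlapping objects. */
void *my_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}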
Post by e***@gmail.com
There was also the suggestion that representations should not appear
to change without an intervening relevant storage event. This might
be reasonable, but seems to certainly disagree with existing
compilers, at least on indeterminate values.
DR 260 is consistent with that suggestion, so long as it is restricted
to objects with determinate values.
Post by e***@gmail.com
The only reasonable, yet possibly intended, interpretation I can
currently think of is the following. The only way to get at the
representation of a pointer is through type conversion, such as to a
byte array or to an uintptr_t. ...
The target type for a type conversion must be a scalar type (6.5.4p2)
and therefore cannot be an array type. Conversion to uintptr_t is
likely to result in an integer with the same representation as the
pointer, but the standard imposes no such requirement.
Post by e***@gmail.com
... The resulting value is a valid
representation of the pointer. (The properties of a valid pointer
representation involve other issues that we won't address right now.)
Library routines like memcpy really do provide a functionality not
achievable by user code, in that they act as if they were able to
access the entire memory range as a single (non-atomic) memory access
(thereby seeing a single representation of an arbitrary object).
I don't see any support for that claim, either in the standard or in
DR 260. memcpy() is given only as an example, and therefore cannot be
the only permissible way of performing the copy described by 6.2.6.1p4.
Post by e***@gmail.com
Conversely, the usual byte-by-byte implementation of copying is not a
portable way to copy a pointer.
Nothing said by DR260 prohibits the use of byte-by-byte copying of a
pointer's representation - so long as it contains a determinate value.
The minute that it contains an indeterminate value, DR260 implies that
memcpy() becomes just as useless a method of copying the pointer's
representation as any other.
Post by e***@gmail.com
I confess I would feel rather more comfortable with this resolution
if the language allowed freestanding code to effectively implement
memcpy. ...
I believe that this is precisely the implication of the use of "e.g." in
6.2.6.1p4 - that when the promise made by that clause applies, memcpy()
is precisely as useful as user-written code meeting the same semantic
requirements as memcpy(), for rendering that clause applicable. When the
value is indeterminate, memcpy() is precisely as useless as equivalent
user-written code for performing the copy.

...
Post by e***@gmail.com
Can anybody provide an informed opinion of whether this is what WG14
had in mind? thanks in advance.
I'm not on the committee, and judging from my exchanges with at least
one person who is on the committee, I'm lousy at reading the minds of
the committee. My responses above are based entirely upon my readings of
the actual text of the current standard and DR260.
--
James Kuyper
Keith Thompson
2015-04-13 15:46:59 UTC
Permalink
Raw Message
[...]
Post by James Kuyper
Post by e***@gmail.com
There was also the suggestion that representations should not appear
to change without an intervening relevant storage event. This might
be reasonable, but seems to certainly disagree with existing
compilers, at least on indeterminate values.
DR 260 is consistent with that suggestion, so long as it is restricted
to objects with determinate values.
[...]

I hadn't read DR 260 that way before. Interesting.

Here's the relevant part of the committee response:

Values may have any bit-pattern that validly represents them
and the implementation is free to move between alternate
representations (for example, it may normalize pointers,
floating-point representations etc.).

In the case of an indeterminate value all bit-patterns are
valid representations and the actual bit-pattern may change
without direct action of the program.

(That's actually one paragraph; I split it for emphasis.)

I find the phrase "move between" terribly unclear; what kind of "motion"
are we talking about? I think this phrase has led some people to think
that objects, even with determinate values, may have their
representations change at arbitrary times. I don't *think* that was
the intent.

One perhaps reasonable interpretation of this is as follows:

Determinate values may have multiple representations. Suppose, for
example, the double value 1.0 has two different representations. This
code:

    double x = 1.0;
    double y = 1.0;

might store different representations in x and y, detectable by
using memcmp(), for example. But the chosen representation of
each object cannot change at arbitrary times, only when a value
is stored in x. If you copy the bytes of x using memcpy(), and
then copy them again, without updating x, the two target arrays'
elements will compare equal. `x = x` might normalize x, changing
its representation but not its value. And of course `x == x`,
`x == y`, and `y == y` must all yield 1.

Normalization may not occur at arbitrary times.

An object with an indeterminate value:

    double z; /* no initialization */

may contain any of 2**64 bit patterns (if double is 64 bits); we know
that even without DR 260. What the DR adds is that the bit pattern may
change arbitrarily, even in the absence of any explicit update to z.
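A minimal sketch of the contrast, under this reading (the function
name is illustrative):

#include <assert.h>
#include <string.h>

void contrast(void) {
    double x = 1.0;  /* determinate */
    double z;        /* indeterminate */
    unsigned char a[sizeof x], b[sizeof x];
    unsigned char c[sizeof z], d[sizeof z];

    memcpy(a, &x, sizeof x);
    memcpy(b, &x, sizeof x);
    assert(memcmp(a, b, sizeof x) == 0); /* must hold: no store to x
                                            happened in between */

    memcpy(c, &z, sizeof z);
    memcpy(d, &z, sizeof z);
    /* memcmp(c, d, sizeof z) may be nonzero: per DR 260, z's bit
       pattern may change between the two copies without any store. */
}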

It seems (and is!) bizarre that two successive memcpy() from z can yield
different byte values. The rationale is in the section "Committee
Discussion (for history only)":

In reaching our response we noted that requiring immutable bit
patterns for indeterminate values would reduce optimization
opportunities. For example, it would require tracking of the
actual bit-patterns of indeterminate values if the memory
containing them were paged out. That seems an unnecessary
constraint on optimizers with no compensatory benefit to
programmers.

I'm not convinced by this reasoning. It applies only to code that
accesses the representation of objects with indeterminate values.
If the compiler doesn't know whether an object's value is determinate
or not, the compiler has to assume that it's determinate and
access its bit pattern anyway. If it's known to be indeterminate,
the compiler can reasonably issue a warning; the code is arguably
buggy, and its performance is not particularly relevant. (And the
argument that the bytes themselves are objects whose last stored
values are retained still applies.)

Are there cases where the permission granted by DR 260 is actually
helpful to optimizing compilers? Can it be used to improve the
performance of well-behaved code?
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jakob Bohm
2015-04-13 16:46:23 UTC
Permalink
Raw Message
Post by Keith Thompson
[...]
Are there cases where the permission granted by DR 260 is actually
helpful to optimizing compilers? Can it be used to improve the
performance of well-behaved code?
How about this classic:

void foo(void) {
    double z;
    long long y;

    y = 1;
    /* Lots of code not assigning to z */

    /* Lots of code accessing z, but not accessing y */
}

In this example, classic optimizer live-range analysis can conclude
that z need not exist during the first piece of elided code (because
z does not yet have a determinate value) and that y need not exist
during the second piece of elided code (because y is not accessed at
all).

Therefore the compiler can generate code that stores y and z in the
same register or stack location.

Put another way, this says that live-range analysis implementations
need not consider reading a variable as starting a live-range for
that variable, only as extending any already existing live range.

As for an example with well-behaved code, consider something like
this:

double z;
do {
    z *= 2.0;

    if (someconditiontoocomplexfortheoptimizertounderstand)
        z = 1.0;
} while (condition);

/* z is now pow(2, number of iterations between
 * someconditiontoocomplexfortheoptimizertounderstand last becoming
 * true and condition becoming false).
 * If someconditiontoocomplexfortheoptimizertounderstand
 * was never true, z is bogus, and further code below will
 * deal with that, or there is a proof that this won't happen.
 */

Here the rule allows the optimizer to behave as if the only value of
z that ever gets doubled comes from an execution of z = 1.0, and thus
to generate code that works only for powers of 2.0, even if the
programmer made a mistake that would cause
someconditiontoocomplexfortheoptimizertounderstand to never be true
before condition becomes false. And it allows this optimization
even if the correctness of the optimization requires a lengthy proof
that the optimizer cannot do on its own, simply because the optimizer
is allowed to pass bogus values if the programmer made a mistake.

As a programmer I am often annoyed by compilers issuing warnings
about the possible use of an unassigned z in such code, when I
can prove that the conditions make this impossible. By explicitly
allowing the optimizer to care less about such cases, the cost of
such optimizer limitations becomes much less problematic.


Enjoy

Jakob
James Kuyper
2015-04-13 21:11:55 UTC
Permalink
Raw Message
On 04/13/2015 12:46 PM, Jakob Bohm wrote:
...
Post by Jakob Bohm
As for an example with well-behaved code, consider something like
double z;
do {
z *= 2.0;
Well-behaved? Regardless of the values of the two conditions tested
below, the first pass through this loop always reads the uninitialized
value of z, which might be a trap representation, or z > DBL_MAX/2 or
z < -DBL_MAX/2. If you had moved that line into an else-clause of the
first if(), it would make a lot more sense.
Post by Jakob Bohm
if (someconditiontoocomplexfortheoptimizertounderstand)
z = 1.0;
} while (condition);
/* z is now pow(2, number of iterations between
* someconditiontoocomplexfortheoptiomizertounderstand and
* condition becoming false;
* If someconditiontoocomplexfortheoptimizertounderstand
* was never true, z is bogus, and further code below will
* deal with that, or there is a proof that this won't happen.
*/
[...]
As a programmer I am often annoyed by compilers issuing warnings
about the possible use of an unassigned z in such code, when I
can prove that the conditions make this impossible. ...
Would you care to provide the steps of that proof? Just offhand, I don't
see how it's possible that your code doesn't use an unassigned value of z.
Jakob Bohm
2015-04-13 23:01:14 UTC
Permalink
Raw Message
Post by James Kuyper
[...]
Post by Jakob Bohm
As a programmer I am often annoyed by compilers issuing warnings
about the possible use of an unassigned z in such code, when I
can prove that the conditions make this impossible. ...
Would you care to provide the steps of that proof? Just offhand, I don't
see how it's possible that your code doesn't use an unassigned value of z.
Oops, sorry - I was thinking of an exception-free floating point
model; many compilers can be configured to operate like that. In such
a model/mode, multiplying junk will result in junk, which is OK as
long as that junk is later discarded.

If you insist on a model where floating point operations can generate
signals, exceptions or other such surprises, replace double by long,
and consider machines such as System/360 where 0 has two valid
encodings. Or the example could be changed to guard the multiply
by a third condition too complex for the optimizer to understand.

Now the hypothetical proof involves the specifics of the hypothetical
conditions present and can involve anything the optimizer in question
doesn't try to prove itself, which in turn varies from compiler to
compiler. It could also involve program invariants and/or
higher math. For instance, the loop could be walking a linked list
and calculating the number of subsets (with repeats) of a subrange
of that list.

Regardless of the specifics, it would obviously involve the two
hypothetical conditions having side effects or involving variables
other than z.


Enjoy

Jakob
James Kuyper
2015-04-14 14:50:20 UTC
Permalink
Raw Message
Post by Jakob Bohm
Post by James Kuyper
[...]
Post by Jakob Bohm
Post by James Kuyper
Post by Jakob Bohm
As a programmer I am often annoyed by compilers issuing warnings
about the possible use of an unassigned z in such code, when I
can prove that the conditions make this impossible. ...
Would you care to provide the steps of that proof? Just offhand, I don't
see how it's possible that your code doesn't use an unassigned value of z.
Oops, sorry - I was thinking of an exception-free floating point
model; many compilers can be configured to operate like that. In such
a model/mode, multiplying junk will result in junk, which is OK as
long as that junk is later discarded.
You said that you could "prove that the conditions make [use of an
unassigned z] impossible", which is something quite different from
saying that "use of unassigned z is safe".
Making such use safe is perfectly compatible with the C standard
saying that the behavior is undefined. But a compiler that makes such
use safe should probably treat this as a less serious problem than a
compiler that implements it in a way that can fail catastrophically.
The only reason to treat it as a problem at all is to warn you about
the unportability of the assumptions you're making when writing such
code.
Jakob Bohm
2015-04-14 15:58:23 UTC
Permalink
Raw Message
Post by James Kuyper
[...]
You said that you could "prove that the conditions make [use of an
unassigned z] impossible", which is something quite different from
saying that "use of unassigned z is safe".
Making such use safe is perfectly compatible with the C standard
saying that the behavior is undefined. But a compiler that makes such
use safe should probably treat this as a less serious problem than a
compiler that implements it in a way that can fail catastrophically.
The only reason to treat it as a problem at all is to warn you about
the unportability of the assumptions you're making when writing such
code.
If you didn't get it: I was trying to present an entire class of
examples via a simplified example. One subset of that class involves
the uninitialized value being read only under impossible conditions,
another subset involves the uninitialized value being used only for
operations of no consequence.

In constructing the example (but not the concept), I accidentally
used an example which would not work with signalling floating point
operations. Also the original mail was rewritten several times to
simplify the example, which is how the word "impossible" slipped
through where it should have said "of no consequence".

So to reiterate: Optimizer live range analysis in language
implementations often cannot detect that some combinations of
paths through the code are impossible. This has two negative
consequences:

1. The optimizer wastes code or storage to preserve unused
values of uninitialized variables. Keith's DR 260
interpretation says the optimizer doesn't have to do that,
thus allowing it to generate more optimal code.

2. The optimizer emits warnings about use of uninitialized
variables in impossible or harmless code paths. A
traditional way to silence such warnings is to initialize
the affected variables with dummy values such as 0 or 1,
which would unfortunately nullify the permission to not
actually store the (previously) uninitialized values.
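For instance, the traditional silencing idiom in point 2 is simply:

double z = 0.0;  /* dummy value: silences the "may be used
                    uninitialized" warning, but makes z determinate,
                    so in general the implementation must now
                    materialize a store that DR 260 would otherwise
                    have let it omit */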


Enjoy

Jakob
Tim Rentsch
2015-04-16 07:54:46 UTC
Permalink
Raw Message
Thank you all for your replies. Please allow me to get back to
the original purpose of the question, which was not to ask for
opinions on the wisdom of DR260 or the standard, or how they
might be fixed, but rather whether anyone might suggest a
possible reconciliation to the apparent contradiction that
might have been intended by WG14. I don't think anyone has
suggested such an explanation that is consistent with the
documents, but please correct me if I'm wrong. [...snip...]
Let me suggest a way of looking at this that may be more
satisfying for you. First please re-read the Committee Response
to Question 1 in the DR.

(Incidentally, let me also suggest that you limit your message
lines to somewhere between 65 and 70 characters, making sure they
are wrapped before you post them. My newsreader does this
automatically, yours may require some sort of option setting to
bring this into effect.)

The sentences in the Committee Response to Question 1 could be
taken as saying that (a) determinate values may choose any legal
representation (for the abstract value being stored) when they
are stored, but the bits don't change in between store
operations, and (b) indeterminate values may change their
bit-state at any time, whether store operations happen or not.

This would mean that memcpy(), etc, are not special when used
on determinate values, and ordinary user-written character
copying could be used instead. That would not apply in the case
of indeterminate values, but who cares about those anyway? :)

This reading is the most benign one I can see (at least for now)
for DR260, and I believe it is plausible, ie, what was intended.
Of course I'm not sure about the last part, but I believe the
probability is at least, say, 20%.

There are still other issues regarding, eg, exactly which
objects are considered indeterminate, and when, and what
rules apply for cases when the same bit-pattern represents
different abstract values (which certainly happens for
pointers).
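The classic instance for pointers, as a sketch (a and b are
illustrative):

int a[1], b[1];
/* a + 1, one past the end of a, may happen to share a bit pattern
 * with &b[0], and the two may even compare equal, yet they are
 * different abstract values: only &b[0] may be dereferenced. */

However these issues aren't what you asked about.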
Can anybody provide an informed opinion on whether this is what
WG14 had in mind? Thanks in advance.
Based on what little I know of the dynamics of WG14, my guess
is you would get at least two different answers if the
various members were asked separately. I think I have a
reasonably good read on which questions they are pretty
uniform on and on which the answers are fuzzier, and this
issue is definitely towards the fuzzy end of the spectrum,
uniformity-wise.
