Discussion:
Clarifying Indeterminate vs Unspecified values
s***@casperkitty.com
2016-10-30 18:29:00 UTC
The Standard presently recognizes two kinds of objects whose value is
unknown: those which have Unspecified values are guaranteed to hold
bit patterns that are not trap representations, and those which have
Indeterminate Value, which lack such a guarantee. The Standard is
rather murky about what, if anything, is guaranteed about objects which
hold Indeterminate Value in cases where both of the following conditions
apply:

1. Every possible combination of bit values would represent a valid
value for the type.

2. The object is not of automatic duration, or else has its address
taken.

From what I've seen, some people would like to say that Unspecified Value
and Indeterminate Value have essentially the same meaning when the above
conditions apply. Others would like to treat Indeterminate Value as a
symbolic concept such that if x holds Indeterminate value, almost any
expression involving x--including expressions like "x & 15"--would yield
Indeterminate Value.

I would regard the first reading as being overly restrictive on the kinds
of optimizations compilers could perform, but the second would be overly
restrictive on the kinds of optimizations programmers could safely *let*
compilers perform. I would propose that it would be more helpful to
define a middle ground which would grant compilers considerable freedom,
but also offer enough behavioral guarantees that programmers could exploit
algorithms which don't require any particular initial state.

To allow maximum compatibility with existing code, I would suggest that
there should be four behavioral models, specified via some kind of #pragma.
Any implementation which satisfies an earlier model would also satisfy
the requirements of all later models, so all but the first model would be
optional [if code requested a model an implementation didn't provide for,
it could substitute an earlier one]. It would be highly recommended that
compilers support the third, since it offers more optimization opportunities
than the first two without losing any semantic expressiveness, and that new
programs be written to be compatible with it for the same reasons.

Models:

1. Simple unspecified bits: An object holding Indeterminate Value will
behave as though the underlying storage holds some bit pattern
which--once observed--will not change unless the object is written.

2. Randomly-changing bits, read deterministically: each read of an
object holding Indeterminate Value will behave as though it held
some combination of bits, but the bit pattern may change arbitrarily
between reads.

3. Constrained non-deterministic model: Certain operations are defined
as yielding a definite combination of bits (as with #2) but others
may, at a compiler's leisure, yield non-deterministic sets of
possibilities, as described later.

4. Completely-loose model: Nothing useful can be said about the result
of working with Indeterminate Values.

The non-deterministic model can be described either in behavioral terms or
code-transformation terms; the former makes it easier to reason about what
a given source text might do, while the latter makes it easier to reason
about what a compiler would be allowed to do with a given source text.

I'll describe the models for #3 in a later post, but first I'll ask:
do people agree with my assessment of the present state of affairs and
the lack of clarity in the Standard, and the desirability of having a
more concrete spec for behavior which allows optimizations but does not
require programmers to spend time initializing objects in cases where
any possible initial bit pattern would allow a program to meet
requirements?
j***@verizon.net
2016-10-30 23:36:26 UTC
Post by s***@casperkitty.com
The Standard presently recognizes two kinds of objects whose value is
unknown: those which have Unspecified values are guaranteed to hold
bit patterns that are not trap representations, and those which have
Indeterminate Value, which lack such a guarantee. The Standard is
rather murky about what, if anything, is guaranteed about objects which
hold Indeterminate Value in cases where both of the following conditions
1. Every possible combination of bit values would represent a valid
value for the type.
2. The object is not of automatic duration, or else has its address
taken.
From what I've seen, some people would like to say that Unspecified Value
and Indeterminate Value have essentially the same meaning when the above
conditions apply. Others would like to treat Indeterminate Value as a
symbolic concept such that if x holds Indeterminate value, almost any
expression involving x--including expressions like "x & 15"--would yield
Indeterminate Value.
I think you are confusing two different (but closely related) ways to get undefined behavior.

If the value of an object is indeterminate for any reason, one of the
possibilities that are allowed by the definition of "indeterminate value"
is that it might have a trap representation. (3.19.2p1) This does NOT
apply to unspecified values (3.19.3p1). If an object does have a trap
representation, then the behavior of trying to read that value is
undefined. (6.2.6.1p5) However, the representation of all types is
implementation defined, which includes whether or not they have trap
representations, and certain types are prohibited by the standard from
having trap representations. If a type is either prohibited from having
trap representations, or doesn't have any on the implementation of C being
used, undefined behavior is not allowed just because the code reads an
indeterminate value.

However, there is another clause about uninitialized objects. Uninitialized
objects have indeterminate values, but reading them is undefined because of
a clause completely independent of the clause which says that indeterminate values can have trap representations. Therefore, that clause applies even to types which are incapable of having trap representations:

"If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an
initializer and no assignment to it has been performed prior to use), the
behavior is undefined." (6.3.2.1p2)

Your condition #2 is sufficient to guarantee that 6.3.2.1p2 never applies.
It seems an odd condition to specify unless you were aware of 6.3.2.1p2. If
you were aware of that clause, why didn't you make the proper distinction
between indeterminate values and uninitialized objects? They're two
different, though overlapping, issues.
s***@casperkitty.com
2016-10-31 14:33:07 UTC
Post by j***@verizon.net
Your condition #2 is sufficient to guarantee that 6.3.2.1p2 never applies.
It seems an odd condition to specify unless you were aware of 6.3.2.1p2. If
you were aware of that clause, why didn't you make the proper distinction
between indeterminate values and uninitialized objects? They're two
different, though overlapping, issues.
Taking the address of an uninitialized variable and then reading it is
the simplest way, syntactically, to read an Indeterminate Value under
the present C Standard. I think that reading a value which was last
written as an incompatible type should also yield Indeterminate Value
rather than yielding UB, but that would be an expensive guarantee *unless*
Indeterminate Values were allowed to behave non-deterministically (in
which case it would be a very cheap guarantee).
j***@verizon.net
2016-11-01 02:19:29 UTC
Post by s***@casperkitty.com
Post by j***@verizon.net
Your condition #2 is sufficient to guarantee that 6.3.2.1p2 never applies.
It seems an odd condition to specify unless you were aware of 6.3.2.1p2. If
you were aware of that clause, why didn't you make the proper distinction
between indeterminate values and uninitialized objects? They're two
different, though overlapping, issues.
Taking the address of an uninitialized variable and then reading it is
the simplest way, syntactically, to read an Indeterminate Value under
the present C Standard.
The simplest way to read an indeterminate value is to do so directly, without
bothering to take its address:

unsigned char uninitialized_variable;
uninitialized_variable;

In principle, an expression statement consisting entirely of a variable name
requires that the value of that variable be read, even though nothing is done
with it. Of course, realistically, since that expression has no observable
effects, an implementation would be free to optimize it out of existence, and
most would. If that worries you, one way to make sure it's not optimized out is
to create some observable effects.

printf("%c", uninitialized_variable);

This code has undefined behavior, simply because uninitialized_variable is
uninitialized (6.3.2.1p2), and NOT because its indeterminate value might have
a trap representation - unsigned char is, in fact, prohibited from having trap
representations.

All of the other methods of creating indeterminate values (there's quite a few
of them) have undefined behavior only if the value is a trap representation,
and are therefore safe when applied to a type that doesn't have any.
s***@casperkitty.com
2016-11-01 03:49:01 UTC
Post by j***@verizon.net
Post by s***@casperkitty.com
Taking the address of an uninitialized variable and then reading it is
the simplest way, syntactically, to read an Indeterminate Value under
the present C Standard.
The simplest way to read an indeterminate value is to do so directly, without
unsigned char uninitialized_variable;
uninitialized_variable;
The Standard makes very clear that nothing is guaranteed about the behavior
of a program which reads an uninitialized variable whose address is not
taken. Presumably the intention is that *something* is guaranteed about
the behavior of a program which takes the address of an uninitialized
automatic variable and then reads it, or else there would be no purpose
to distinguishing the scenarios where the address is or is not taken.
Post by j***@verizon.net
This code has undefined behavior, simply because uninitialized_variable is
uninitialized (6.3.2.1p2), and NOT because its indeterminate value might have
a trap representation - unsigned char is, in fact, prohibited from having trap
representations.
It would not be UB if the address were taken.
Post by j***@verizon.net
All of the other methods of creating indeterminate values (there's quite a few
of them) have undefined behavior only if the value is a trap representation,
and are therefore safe when applied to a type that doesn't have any.
Given:

#include <stdio.h>
#include <stdint.h>
int main(void)
{
    uint32_t x;
    &x; // Take the address so use of x won't be UB
    uint32_t y=x, z=x;
    printf("%d %d %d\n", x==y, y==z, (x & 3) > 4);
    return 0;
}

The type uint32_t is forbidden from having trap representations, and the
address of variable x is taken. Consequently the program does not invoke
UB. What can be guaranteed about the output of the above code snippet?
j***@verizon.net
2016-11-01 15:19:15 UTC
Post by s***@casperkitty.com
Post by j***@verizon.net
Post by s***@casperkitty.com
Taking the address of an uninitialized variable and then reading it is
the simplest way, syntactically, to read an Indeterminate Value under
the present C Standard.
The simplest way to read an indeterminate value is to do so directly, without
unsigned char uninitialized_variable;
uninitialized_variable;
The Standard makes very clear that nothing is guaranteed about the behavior
of a program which reads an uninitialized variable whose address is not
taken. Presumably the intention is that *something* is guaranteed about
the behavior of a program which takes the address of an uninitialized
automatic variable and then reads it, or else there would be no purpose
to distinguishing the scenarios where the address is or is not taken.
Post by j***@verizon.net
This code has undefined behavior, simply because uninitialized_variable is
uninitialized (6.3.2.1p2), and NOT because its indeterminate value might have
a trap representation - unsigned char is, in fact, prohibited from having trap
representations.
It would not be UB if the address were taken.
OK - so you actually meant to say "The simplest way to read an indeterminate value with defined behavior". But, if that's what you wanted, you needed to impose additional requirements: that the type of the variable should be one for which there are no trap representations. It's not sufficient to avoid the undefined behavior allowed by 6.3.2.1p2; it's also necessary to avoid the undefined behavior allowed by 6.2.6.1p5.
Post by s***@casperkitty.com
Post by j***@verizon.net
All of the other methods of creating indeterminate values (there's quite a few
of them) have undefined behavior only if the value is a trap representation,
and are therefore safe when applied to a type that doesn't have any.
#include <stdio.h>
#include <stdint.h>
int main(void)
{
    uint32_t x;
    &x; // Take the address so use of x won't be UB
    uint32_t y=x, z=x;
    printf("%d %d %d\n", x==y, y==z, (x & 3) > 4);
    return 0;
}
The type uint32_t is forbidden from having trap representations, and the
address of variable x is taken. Consequently the program does not invoke
UB. What can be guaranteed about the output of the above code snippet?
The standard guarantees that an object retains its last stored value
throughout its lifetime (6.2.4p2). That guarantee is meaningless until the
first time a value is stored in the object, so an implementation is free to
use the memory set aside for an object for some other purpose instead, right
up until the first time a value is stored in the object. The code above never
stores a value in x, so 6.2.4p2 doesn't apply. Therefore, the value stored in
that location is allowed to change between each reading. It is therefore not
guaranteed that x==y or x==z or, for that matter, that x==x.

The value of x is indeterminate, and because uint32_t is not allowed to have trap representations, that's equivalent to saying that it's unspecified, which means it must be a valid value of its type. Therefore, it is guaranteed that those equality comparison expressions will return either 0 or 1, and that (x&3) is less than 4, so the last argument of the printf() call is guaranteed to have the value of 0.

The key thing is that you're never supposed to do anything with indeterminate
values other than replace them with determinate values. It has never been the
intention of the committee to encourage the use of such values, so they are not
going to bother adding any more specification to the C standard about what
happens if you do.
s***@casperkitty.com
2016-11-01 16:01:22 UTC
Post by j***@verizon.net
The value of x is indeterminate, and because uint32_t is not allowed to have trap representations, that's equivalent to saying that it's unspecified, which means it must be a valid value of its type. Therefore, it is guaranteed that those equality comparison expressions will return either 0 or 1, and that (x&3) is less than 4, so the last argument of the printf() call is guaranteed to have the value of 0.
Some people have suggested that compilers should be allowed to treat
computations performed on Indeterminate Values of types with no trap
representations as yielding Indeterminate Results. There are
some cases where that would allow some useful optimization, but only if the
semantics of Indeterminate Values remain sufficiently well defined that code
can afford to perform computations on them [in cases where the result of the
computation would not affect the correctness of the program].
Post by j***@verizon.net
The key thing is that you're never supposed to do anything with indeterminate
values other than replace them with determinate values. It has never been the
intention of the committee to encourage the use of such values, so they are not
going to bother adding any more specification to the C standard about what
happens if you do.
In cases where the natural machine code to perform some task would meet
requirements whether or not an object or some portion thereof was
initialized, it makes sense not to require that the object be initialized.
If reading indeterminate values of types which support trap representations
were required to be defined behavior, that would require compilers to
pre-initialize all new objects of such types unless they could prove that
they would be written before the first time they are read. On the flip
side, however, if an algorithm would work just fine whether or not some
portion of an object was initialized (e.g. because the object is used to
hold a value which is hard to compute but easy to check), being able to
treat the storage as holding Unspecified values may greatly improve
performance. I don't think the authors of the Standard intended that
implementations only work usably if programmers explicitly clear storage
whose values wouldn't matter.

As a simple example, a technique for emulating sparse arrays that need to
quickly be initialized to a default value is to use three physical arrays
in code, which I'll call indices [unsigned], locations [unsigned], and
values [data type of interest], along with a single unsigned integer count,
such that the count reports the number of distinct indices written. For
each index that has been written, all of the following hold:

locations[index] < count
indices[locations[index]] == index
values[locations[index]] == last value written

For each index that has not been written, at least one of the following
holds:

locations[index] >= count or
indices[locations[index]] != index

If code can read any value of locations and get an unsigned number, it
will be able to determine whether that number represents the location of
the value associated with that index, or whether it does not. All that
will be necessary to initialize the data structure is to set "count" to
zero. Even if it turns out the data structure is used to process only
a dozen look-ups before it's abandoned, little time will have been wasted
on initialization. By contrast, if it had been necessary to clear out an
entire array before use and 99.99% of the elements never got used at all,
such initialization would represent a much greater waste of time.