Repeating the question stated in the Subject line:
Does reading an uninitialized object [always] have undefined
behavior?
Background: Annex J part 2 says (in various phrasings in
different revisions of the C standard, with the one below
being taken from C90):
The value of an uninitialized object that has automatic
storage duration is used before a value is assigned [is
undefined behavior] (6.5.7)
Remembering that Annex J is informative rather than normative,
is this statement right even for a type that has no trap
representations? To ask that question another way, is this
statement always right or is it just a (perhaps useful)
approximation?
I think this question can be answered convincingly by reviewing
the subject's history in each revision of the ISO C standard.
We start in C90.
In C90 reading the value of an uninitialized object is always
undefined behavior (and that includes malloc()ed storage as well
as automatic storage duration objects). The C90 standard says,
in 6.5.7:
If an object that has automatic storage duration is not
initialized explicitly, its value is indeterminate.
and in 7.10.3.3:
The malloc function allocates space for an object whose size
is specified by size and whose value is indeterminate.
The term "indeterminate" is not defined in C90, but accessing
storage that is indeterminate is explicitly undefined behavior.
Indeed such uses are part of the /definition/ of undefined
behavior - C90 says in 3.16 (which is an entry in Definitions):
undefined behavior: Behavior, upon use of a nonportable or
erroneous program construct, of erroneous data, or of
indeterminately valued objects, for which this International
Standard imposes no requirements.
So for C90 we have a clear answer: always undefined behavior for
accessing any uninitialized object.
Unfortunately the C90 scheme has some serious issues. There is
no exception for reading using a character type. More seriously,
although C90 gives some situations that cause values to be
indeterminate, it doesn't say anything about making them /not/
be indeterminate. We can guess (but only guess) that assigning
a value to the object as a whole removes indeterminate-ness, but
what about these cases (and other similar ones):
int x;
*(char*)&x = 0;
// is the value of x now indeterminate or not?
struct { int x, y; } s;
s.x = 0;
// is the value of s now indeterminate or not?
Again, we can make guesses about what these answers should be,
but the C90 standard doesn't say. Clearly C90 has some
significant deficiencies.
Next we look at C99.
(Actually, before we do that, I should mention that C90 was
amended and corrected in 1994, 1995, and 1996, by the three
intermediate documents ISO/IEC 9899/COR1, ISO/IEC 9899/AMD1, and
ISO/IEC 9899/COR2. As far as I am aware these revisions have no
bearing on the matter at hand.)
The C99 standard represents a substantial revision and expansion
of the C90 standard. The relationship between uninitialized
memory and undefined behavior is nearly completely rewritten, and
also made more concrete. There's lots to look at here. Starting
at the top, the definition of undefined behavior is revised so
that it no longer mentions indeterminately valued objects. Here
is section 3.4.3 paragraph 1:
undefined behavior
behavior, upon use of a nonportable or erroneous program
construct or of erroneous data, for which this International
Standard imposes no requirements
(Incidentally the section and paragraph references given in this
part of the discussion are relative to WG14 document N1256, which
is C99 with its three Technical Corrigenda incorporated.)
The next most prominent change is that "indeterminate value" is
explicitly defined, in section 3.17.2 paragraph 1:
indeterminate value
either an unspecified value or a trap representation
This definition makes use of two new terms, "unspecified value"
and "trap representation", that were not used in C90. The term
unspecified value is defined immediately following, in 3.17.3 p1:
unspecified value
valid value of the relevant type where this International
Standard imposes no requirements on which value is chosen in
any instance
There is also an informative note in p2:
NOTE An unspecified value cannot be a trap representation.
The term "trap representation" is defined in 6.2.6.1 p5:
Certain object representations need not represent a value of
the object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does
not have character type, the behavior is undefined. If such
a representation is produced by a side effect that modifies
all or any part of the object by an lvalue expression that
does not have character type, the behavior is undefined.41)
Such a representation is called a /trap representation/.
The slant characters around "trap representation" indicate
italics, which the C standard uses to denote a term being
defined. Also there is a '41)' footnote reference:
41) Thus, an automatic variable can be initialized to a trap
representation without causing undefined behavior, but the
value of the variable cannot be used until a proper value is
stored in it.
which underscores the non-undefined-behavior aspect of using
character types to change the object representation (and hence
the value) of an object.
The C99 text doesn't use the term "trap representation" very
often. There are several cases where certain types are ruled out
from having trap representations; a few cases where a result
/might be/ a trap representation; and a case involving integer
types where there is an implementation-defined choice as to
whether a specific combination of value bits is a valid value or
a trap representation. Also, in Annex J part 2, the list of
undefined behaviors, there are these summary items:
A trap representation is read by an lvalue expression that
does not have character type (6.2.6.1).
A trap representation is produced by a side effect that
modifies any part of the object using an lvalue expression
that does not have character type (6.2.6.1).
which of course correspond directly to what is said in the
definition of trap representation. Based on various passages in
section 6.2.6, which describes the representation of types, we
can deduce that for some integer types all bit combinations must
be a valid value, and so no trap representations are possible for
those types. Such types always include 'unsigned char', and may
also include other integer types depending on the size of the
type, the value of CHAR_BIT, and the values given in <limits.h>
for the range of the type in question. (More concretely, if the
set of distinct values for type T has 2**(sizeof(T)*CHAR_BIT)
elements, then all object representations are valid values, and
thus type T cannot have any trap representations.)
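The counting argument in the parenthetical can be checked mechanically
for the unsigned types. What follows is only a sketch, assuming a C99
hosted implementation; the helper names value_bits and has_no_padding
are hypothetical and do not come from the standard.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Count the value bits of an unsigned type from its maximum value,
   which always has the form 2**N - 1. */
unsigned value_bits(unsigned long long max)
{
    unsigned n = 0;

    while (max != 0) {
        n++;
        max >>= 1;
    }
    return n;
}

/* Nonzero if an unsigned type with the given size and maximum has no
   padding bits.  In that case the type has 2**(size*CHAR_BIT) distinct
   values, so every object representation is a valid value. */
int has_no_padding(size_t size, unsigned long long max)
{
    assert(size > 0);
    return value_bits(max) == size * CHAR_BIT;
}
```

The call has_no_padding(sizeof(unsigned char), UCHAR_MAX) must give a
nonzero result on every implementation. The same call made with
sizeof(unsigned) and UINT_MAX usually does too, but that is up to the
implementation.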
There are three points worth mentioning regarding unspecified
values and trap representations. One is that unspecified values
are always valid values, and never by themselves cause undefined
behavior. Two is that the distinction between an unspecified
value and a trap representation depends on the type used to
access the object. Three is that, once we know the type of an
access, whether a given object holds a valid value or a trap
representation depends only on the bits and bytes that make up
the object representation of the object, and in particular not on
any hidden "magic" state associated with the object. (There is
one case though that deserves a closer look, which is explained
further on.)
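Point three can be illustrated with stores done through unsigned char,
which are never undefined behavior by themselves. This is a sketch
only; the function names are hypothetical, and the first function
relies on the guarantee in N1256 (6.2.6.2 paragraph 5) that the
all-bits-zero representation of any integer type is the value zero.

```c
#include <assert.h>
#include <stddef.h>

/* Give an int declared without an initializer a value by writing
   every byte of its object representation through unsigned char. */
int zero_via_bytes(void)
{
    int x;                                   /* no initializer */
    unsigned char *p = (unsigned char *)&x;
    size_t i;

    for (i = 0; i < sizeof x; i++)
        p[i] = 0;                            /* character-type stores */
    assert(x == 0);                          /* all bits zero is zero */
    return x;
}

/* Copy an object representation byte by byte.  The copy holds the
   same value, because the value depends only on the bytes. */
double copy_via_bytes(double in)
{
    double out;
    const unsigned char *src = (const unsigned char *)&in;
    unsigned char *dst = (unsigned char *)&out;
    size_t i;

    for (i = 0; i < sizeof out; i++)
        dst[i] = src[i];
    return out;
}
```

Neither function ever assigns to its local object as a whole, yet the
final read is of a fully determined object representation.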
The rule for trap representations is simple and clear: any
access of an object whose object representation is a trap
representation of the access's type is undefined behavior, and
this consequence is accurately portrayed in Annex J part 2.
Having settled the question for trap representations, how about
indeterminate values?
Setting aside the definition and an entry in the index, the term
"indeterminate value" (or values plural) appears in just six
places in the C99 standard: three in informative passages
(usually examples), and three normative passages, those being
6.7.8 paragraph 9 (about unnamed members), 6.8 paragraph 3 (about
declarations for objects with automatic storage duration), and
7.20.3.4 paragraph 2 (about bytes added by a call to realloc()).
The sentence in 6.8 paragraph 3 deserves quoting:
The initializers of objects that have automatic storage
duration, and the variable length array declarators of
ordinary identifiers with block scope, are evaluated and the
values are stored in the objects (including storing an
indeterminate value in objects without an initializer) each
time the declaration is reached in the order of execution, as
if it were a statement, and within each declaration in the
order that declarators appear.
Section 7 has many places where the word "indeterminate" appears
without being followed by "value". I think most of these can be
safely skipped over, but the description of malloc() deserves
quoting (it is 7.20.3.3 paragraph 2):
The malloc function allocates space for an object whose size
is specified by size and whose value is indeterminate.
Presumably the sentence here is meant to express the same idea
as the parallel passage describing the results from realloc(),
which says (in 7.20.3.4 paragraph 2):
Any bytes in the new object beyond the size of the old object
have indeterminate values.
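In practical terms the realloc() rule means the added bytes should be
written before they are read. Here is a minimal sketch of that
discipline; grow_zeroed is a hypothetical helper, not a library
function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Grow a buffer with realloc() and store zeros into the added bytes
   before anything reads them.  Returns NULL on failure, in which case
   the old buffer is still allocated and unchanged. */
unsigned char *grow_zeroed(unsigned char *old, size_t old_size,
                           size_t new_size)
{
    unsigned char *grown;

    assert(new_size > 0 && new_size >= old_size);
    grown = realloc(old, new_size);
    if (grown == NULL)
        return NULL;
    /* Bytes [0, old_size) keep their values; the rest are new. */
    memset(grown + old_size, 0, new_size - old_size);
    return grown;
}
```

After a successful call the first old_size bytes have their previous
values and the remaining bytes are zero, so no byte of the new object
is read while indeterminate.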
The word "indeterminate" without being followed by "value"
is used in just six other places in the standard: five in the
main body (all of which are part of normative text), plus one
entry in Annex J part 2 (which is of course informative). The
normative uses may be seen to be in two categories, as follows.
Four of the five normative uses are basically restatements of the
long sentence from 6.8 paragraph 3; they are in 6.2.4 paragraph 5
(two uses) and paragraph 6, and 6.7.8 paragraph 10. Here are
excerpts showing these four occurrences (all of which refer to
objects with automatic storage duration):
The initial value of the object is indeterminate.
[if an object had no initializer] the value becomes
indeterminate each time the declaration is reached.
The initial value of the object is indeterminate.
If an object that has automatic storage duration is not
initialized explicitly, its value is indeterminate.
Although these passages use different phrasing, it seems clear
they are meant to mirror the parenthetical phrase in 6.8 p3,
"storing an indeterminate value in objects without an
initializer"; presumably the difference in phrasing simply
reflects the styles of the respective sections: 6.8 gives an
imperative description, whereas 6.2.4 and 6.7 tend to be more
declarative in style. (The last of these excerpts matches
word-for-word with the analogous sentence in C90.) That the C99
standard considers these five passages as expressing the same
idea can be seen by them all being referenced in a single entry
given in Annex J part 2:
The value of an object with automatic storage duration is
used while it is indeterminate (6.2.4, 6.7.8, 6.8).
Compare this text with the corresponding entry in C90. One
reason for the difference is that in C99, unlike in C90, an
object can become "unassigned" after it is first assigned (which
is a consequence in C99 of being able to mix declarations and
statements). So rather than say "before a value is assigned"
the C99 standard says "while it is indeterminate".
The one other place where the word "indeterminate" is used
without being followed by "value" is in 6.2.4 paragraph 2:
The value of a pointer becomes indeterminate when the object
it points to reaches the end of its lifetime.
(The analogous sentence in C90 says basically the same but using
different phrasing, partly because C90 doesn't have any explicit
definition of "lifetime", which of course C99 does.)
There is a corresponding entry for this passage in Annex J part 2
(which, as it happens, doesn't use the word "indeterminate"):
The value of a pointer to an object whose lifetime has ended
is used (6.2.4).
There is a subtle but important difference between this rule and
the other passages mentioned above. In all of the other cases
there is a specific object being referenced. In the rule here,
we aren't talking about a particular object, nor even just one
object necessarily (there could be many), but possibly about
values that aren't in an object at all. Consider this code
fragment:
char *p = malloc( 1 );
char *q = p + (free(p),0);
It seems clear that the second line is meant to be undefined
behavior /even if the (leftmost) access of p has already taken
place before the call to free() is done/. It isn't an access to
an object (whether indeterminate or not) that is causing the
problem. Rather, it is the use of a value -- valid at the time
the value was obtained -- that has been rendered /invalid/
between the time the value was loaded from p and the time the
value is used in a '+' operation.
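For contrast, here is a sketch of the same operations reordered so
that every use of the pointer value happens before the call to
free(). The names release and use_then_free are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Free the object that *pp points to, then overwrite the pointer so
   that no later expression can use the stale value.  The store to
   *pp does not read the old pointer value. */
void release(char **pp)
{
    assert(pp != NULL);
    free(*pp);
    *pp = NULL;
}

/* All pointer arithmetic and comparisons are finished before the
   storage is released.  Returns -1 if malloc() fails. */
int use_then_free(void)
{
    char *p = malloc(1);
    char *q;
    int same;

    if (p == NULL)
        return -1;
    q = p + 0;          /* arithmetic while the object is alive */
    same = (q == p);    /* comparison also done before free() */
    release(&p);        /* q is not used again after this point */
    return same;
}
```

The variable q still holds the old bits after release() returns, which
is harmless only because nothing uses its value again.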
Of course, we all understand what's really going on here. In
real computer hardware, the bits of a pointer value don't
magically change when a free() is done (or when an object goes
out of scope and its lifetime ends, etc). Instead, the bits stay
the same, but whether the bits are meaningful or not (or whether
they have the same meaning as before) depends on the state of the
"memory system" as a whole. The term "memory system" is in
quotes because it is meant to include not just state in the
actual hardware but also assumptions made by the compiled code;
a pointer to memory in a departed stack frame may be perfectly
fine as far as the hardware is concerned, but it violates an
assumption made by the compiler that the associated memory may
be (or already have been) reused for another purpose.
One problem with this understanding is that it isn't amenable to
being expressed in the language of the abstract machine. So C99
glosses over the problem by saying "the value of a pointer
becomes indeterminate when ...", disregards what the definition
of "indeterminate value" says, and then pretends (in Annex J.2)
that using any such value is undefined behavior. The text in the
standard is very clear: reading a trap representation is always
undefined behavior (unless accessed using a character type).
There is nothing in the normative text of the standard that says
accessing an indeterminate value is undefined behavior. In fact,
if we take the text of the standard at its word, /every/ object
has an indeterminate value, because every object representation
is either a valid value or a trap representation.
If we ignore pointer types we have an answer to our question:
accessing an object through a type that has no trap
representations never causes undefined behavior. Then why does
the entry in Annex J.2 give a blanket statement that any use is
undefined behavior? A
reasonable guess is that entries in Annex J are meant to provide
useful shorthands without necessarily being completely accurate
(consider for example that the exception for access done using a
character type is not mentioned in the Annex J.2 entry -- a clear
omission).
There is more to say about pointer types. Considering how long
this memo is already it seems better to defer that to a separate
posting.
Next we look at C11.
With respect to the question being considered, the C11 standard
is almost exactly the same as the C99 standard. There are two
differences. First, there is a cosmetic change in that the term
"trap representation" is given a summary definition in section
3.19.4; the paragraph in 6.2.6 where "trap representation" was
previously defined in C99 is unchanged except that in C11 there
are no italics.
The second difference is not a revision but an addition. In
section 6.3.2.1 paragraph 2, talking about lvalue conversion, one
sentence has been added at the end of the paragraph:
If the lvalue designates an object of automatic storage
duration that could have been declared with the register
storage class (never had its address taken), and that object
is uninitialized (not declared with an initializer and no
assignment to it has been performed prior to use), the
behavior is undefined.
Naturally there is a corresponding entry that has been added to
Annex J.2:
An lvalue designating an object of automatic storage
duration that could have been declared with the register
storage class is used in a context that requires the value
of the designated object, but the object is uninitialized.
(6.3.2.1).
The new rule was reportedly motivated by hardware behavior, on
some more recent chips, for some stack-allocated variables. The
added text has several points worth noting.
One, the rule adds a specific, narrow case of undefined behavior
that is simple and clearly delineated.
Two, it does not use the term "indeterminate" or "indeterminate
value". Instead the rule is written in terms of initialization
and assignment. By avoiding "indeterminate", it avoids any
uncertainty about whether undefined behavior must result from
using an indeterminate value.
Three, it provides indirect evidence that use of an indeterminate
value is not necessarily undefined behavior, because if it were
then this new rule would not be necessary.
Four, the condition of undefined behavior is expressed using
imperative phrasing: what matters is what has been done, or not
done, to the object in question. This choice makes this rule a
supplement, not a replacement, for 6.8 p3 et al. Consider this
example function definition:
double
example( double in ){
    unsigned yet = 0;
  redux: ;
    double d;
    if( !yet ){
        d = in;
        yet++;
        goto redux;
    }
    return d;
}
The use of 'd' in 'return d;' might give undefined behavior,
because 'd' may have a trap representation under 6.8 p3. But
the code doesn't violate the conditions of 6.3.2.1 p2, because
an assignment has been done before the lvalue conversion in the
final statement; the intervening evaluation of 'double d;'
doesn't change that. Note also that the clause in 6.8 p3 for
such declarations, "storing an indeterminate value in objects
without an initializer", does not interfere with the application
of the rule in 6.3.2.1 p2, because that rule is written in terms
of assignment, and not in terms of storing a value (which may
have been done because of the parenthetical phrase in 6.8 p3).
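For comparison, here is a sketch of a variant in which the declaration
sits above the label, so the backward jump never reaches it again.
The name example_hoisted is hypothetical. Under the reading given
above, this version satisfies both 6.8 p3 and 6.3.2.1 p2.

```c
#include <assert.h>

double example_hoisted(double in)
{
    unsigned yet = 0;
    double d;           /* reached exactly once, before the label */
redux: ;
    if (!yet) {
        d = in;
        yet++;
        goto redux;     /* jumps back, but not past the declaration */
    }
    assert(yet == 1);   /* the assignment to d has taken place */
    return d;           /* d still holds the value assigned above */
}
```

Because 'double d;' is not evaluated a second time, nothing makes the
value indeterminate again after the assignment, and the function
simply returns its argument.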
After C11
I have not taken the time to review the C17 standard or the
C23 draft standard while researching the topic here. I see that
some changes have been made (such as "non-value representation"
for "trap representation"), but to the best of my knowledge none
of the key passages are substantively different. I may check on
that later (but no promises on when or whether).
Summary: my reading is that accessing an object that has not
been explicitly stored into since its declaration was evaluated
is necessarily undefined behavior in C90, but not necessarily
undefined behavior in C99 and C11 (and AFAIAA also in C17 and
the upcoming C23). My reasoning is given in detail above.
Postscript: this commentary has taken much longer to write than
I thought it would, for the most part because I made an early
decision to be systematic and thorough. I hope the effort has
helped the readers gain confidence in the explanations and
conclusions stated. I may return to the deferred topic about
pointer types but have no plans at present about when that might
be.