Discussion:
Rationale for special case rule for &*ptr?
Keith Thompson
2016-10-24 19:46:39 UTC
N1570 6.5.3.2p3 says:

The unary & operator yields the address of its operand. If
the operand has type "type", the result has type "pointer to
type". If the operand is the result of a unary * operator,
neither that operator nor the & operator is evaluated and
the result is as if both were omitted, except that the
constraints on the operators still apply and the result is
not an lvalue. Similarly, if the operand is the result of a
[] operator, neither the & operator nor the unary * that is
implied by the [] is evaluated and the result is as if the &
operator were removed and the [] operator were changed to a +
operator. Otherwise, the result is a pointer to the object or
function designated by its operand.

This special case rule was added in C99; it does not appear in C90. The
C99 Rationale doesn't mention it, and I don't see anything about it in
the C90 DRs.

A concrete example: In C90 this:

int *ptr = 0;
&*ptr;

has undefined behavior because it dereferences a null pointer. In C99
and C11 it's well defined, and the expression &*ptr yields a null
pointer of type int*.
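A minimal sketch of the C99/C11 behavior described above, assuming a C99 or later compiler. The function name address_of_deref is hypothetical and is used only for illustration. Because neither the * nor the & is evaluated, the function just hands back its argument, null or not:

```c
#include <stddef.h>

/* Returns &*ptr. Under C99 6.5.3.2p3 neither operator is evaluated,
   so the result is the same pointer value, even when ptr is null.
   (In C90 the null case would have had undefined behavior.) */
int *address_of_deref(int *ptr)
{
    return &*ptr;
}
```

Calling address_of_deref(NULL) should yield a null pointer of type int*, and calling it with the address of an int object should yield that same address.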

Why was this change made?
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Kaz Kylheku
2016-10-24 20:27:12 UTC
Post by Keith Thompson
The unary & operator yields the address of its operand. If
the operand has type "type", the result has type "pointer to
type". If the operand is the result of a unary * operator,
neither that operator nor the & operator is evaluated and
the result is as if both were omitted, except that the
constraints on the operators still apply and the result is
not an lvalue. Similarly, if the operand is the result of a
[] operator, neither the & operator nor the unary * that is
implied by the [] is evaluated and the result is as if the &
operator were removed and the [] operator were changed to a +
operator. Otherwise, the result is a pointer to the object or
function designated by its operand.
This special case rule was added in C99; it does not appear in C90. The
C99 Rationale doesn't mention it, and I don't see anything about it in
the C90 DRs.
int *ptr = 0;
&*ptr;
has undefined behavior because it dereferences a null pointer. In C99
and C11 it's well defined, and the expression &*ptr yields a null
pointer of type int*.
Why was this change made?
Why indeed. In particular, it does not seem to help legitimize:

#define offsetof(TYPE, MEMB) ... &(*(TYPE *) 0).MEMB ...

(occurring in very common implementation of offsetof) since there
is a .memb selection wedged between the * dereference and the & address-of.

The second rule doesn't help with null pointers, so why the "similarly"?
It gives us:

&ptr[42] -> ptr + 42

But if ptr is null, ptr + 42 is undefined. It could be defined for
&ptr[0], if we also bring in the rule (present in C++, IIRC?) that
ptr + 0 is well-defined if ptr is null.

Without this, the second rule doesn't seem to legitimize anything;
it just seems to make mandatory an obvious optimization.
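As a rough illustration of the second rule, here is a sketch that checks &base[i] against base + i for a pointer into a real array. The helper name index_address_matches_sum is hypothetical. It assumes base is non-null and that i stays within the array or one past its end, since the rewrite to + does nothing for the null case:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if &base[i] and base + i compare equal.
   base must point into an array, and base + i must stay within that
   array or point one past its last element. */
int index_address_matches_sum(int *base, size_t i)
{
    return &base[i] == base + i;
}
```

For an array int a[43], both index_address_matches_sum(a, 42) and index_address_matches_sum(a, 43) are well defined under the C99 wording, the latter because a + 43 is the one-past-the-end pointer.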
s***@casperkitty.com
2016-10-24 20:43:37 UTC
Post by Keith Thompson
int *ptr = 0;
&*ptr;
has undefined behavior because it dereferences a null pointer. In C99
and C11 it's well defined, and the expression &*ptr yields a null
pointer of type int*.
Why was this change made?
I don't know why the authors of the Standard made the change, but one useful
thing the change does, at least if it is applicable in the case of a void*,
is make it possible for a macro like:

#define AsBytePtr(x) ((unsigned char*)&*(x))

to yield a compile-time squawk, rather than bogus code, in the event that
the programmer forgets an address-of operator before an integer variable.
IMHO, it would be helpful to extend the concept to work with double-
indirect pointers on implementations that use the same representation for
void* as for other pointer types, and which allow accesses via void** to
alias other pointer types.
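A small sketch of how the macro might be used, assuming an ordinary object pointer argument. The helper first_byte_of is hypothetical; the macro is the one given above. The misuse case is left in a comment because it is meant not to compile:

```c
#define AsBytePtr(x) ((unsigned char *)&*(x))

/* Hypothetical helper: reads the first byte of an int's representation. */
unsigned char first_byte_of(int *p)
{
    return *AsBytePtr(p);
}

/* Misuse that the macro is intended to reject at compile time:
 *
 *     int n = 0;
 *     AsBytePtr(n);
 *
 * Here the unary * is applied to an int, which violates the constraint
 * that its operand have pointer type. A bare cast, (unsigned char *)(n),
 * would instead convert the integer value itself to a pointer.
 */
```

The &* pair costs nothing at run time under the C99 rule, since neither operator is evaluated; it only adds the constraint check.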
Tim Rentsch
2016-10-24 21:20:38 UTC
The unary & operator yields the address of its operand. If
the operand has type "type", the result has type "pointer to
type". If the operand is the result of a unary * operator,
neither that operator nor the & operator is evaluated and
the result is as if both were omitted, except that the
constraints on the operators still apply and the result is
not an lvalue. Similarly, if the operand is the result of a
[] operator, neither the & operator nor the unary * that is
implied by the [] is evaluated and the result is as if the &
operator were removed and the [] operator were changed to a +
operator. Otherwise, the result is a pointer to the object or
function designated by its operand.
This special case rule was added in C99; it does not appear in C90. The
C99 Rationale doesn't mention it, and I don't see anything about it in
the C90 DRs.
int *ptr = 0;
&*ptr;
has undefined behavior because it dereferences a null pointer. In C99
and C11 it's well defined, and the expression &*ptr yields a null
pointer of type int*.
Why was this change made?
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n721.htm
Keith Thompson
2016-10-24 22:01:06 UTC
Post by Tim Rentsch
The unary & operator yields the address of its operand. If
the operand has type "type", the result has type "pointer to
type". If the operand is the result of a unary * operator,
neither that operator nor the & operator is evaluated and
the result is as if both were omitted, except that the
constraints on the operators still apply and the result is
not an lvalue. Similarly, if the operand is the result of a
[] operator, neither the & operator nor the unary * that is
implied by the [] is evaluated and the result is as if the &
operator were removed and the [] operator were changed to a +
operator. Otherwise, the result is a pointer to the object or
function designated by its operand.
This special case rule was added in C99; it does not appear in C90. The
C99 Rationale doesn't mention it, and I don't see anything about it in
the C90 DRs.
int *ptr = 0;
&*ptr;
has undefined behavior because it dereferences a null pointer. In C99
and C11 it's well defined, and the expression &*ptr yields a null
pointer of type int*.
Why was this change made?
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n721.htm
Yes, that's it. (I didn't do a very thorough search of the C90 DRs.)

The issue was raised by C90 DR #076:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_076.html

To summarize, the issue being addressed is:

int a [10];
int *p;
/* ... */
p = &a[10];

In C90, that would have undefined behavior, since a[10] is not an
object.

This enables constructs like:

for (p = &a[0]; p < &a[10]; p++)

The DR also mentions this case:

int *n = NULL;
int *p;
/* ... */
p = &*n;

I'm not sure there's much value in causing that to have undefined
behavior, but it came along for the ride.
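A sketch of the loop idiom that the rule legitimizes, assuming a C99 or later compiler. The function name sum_ten is hypothetical. The bound &a[10] is treated as a + 10, so no element past the end is ever accessed:

```c
/* Sums ten ints, using the one-past-the-end address as the loop bound.
   a must point to the first of at least 10 ints. */
int sum_ten(const int *a)
{
    const int *p;
    int sum = 0;

    /* &a[10] is equivalent to a + 10 here; a[10] itself is never read. */
    for (p = &a[0]; p < &a[10]; p++)
        sum += *p;
    return sum;
}
```

Called on an array holding 1 through 10, this should return 55.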
James R. Kuyper
2016-10-24 22:29:42 UTC
On 10/24/2016 06:01 PM, Keith Thompson wrote:
...
Post by Keith Thompson
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_076.html
int a [10];
int *p;
/* ... */
p = &a[10];
In C90, that would have undefined behavior, since a[10] is not an
object.
for (p = &a[0]; p < &a[10]; p++)
int *n = NULL;
int *p;
/* ... */
p = &*n;
I'm not sure there's much value in causing that to have undefined
behavior, but it came along for the ride.
Did you mean "defined" rather than "undefined"?
Keith Thompson
2016-10-24 23:16:46 UTC
[...]
Post by James R. Kuyper
Post by Keith Thompson
int *n = NULL;
int *p;
/* ... */
p = &*n;
I'm not sure there's much value in causing that to have undefined
behavior, but it came along for the ride.
Did you mean "defined" rather than "undefined"?
Yes, I did.
Florian Weimer
2016-10-25 18:23:04 UTC
Post by Keith Thompson
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_076.html
int a [10];
int *p;
/* ... */
p = &a[10];
In C90, that would have undefined behavior, since a[10] is not an
object.
Something similar is needed for C11 to make atomics more useful. We
really want that &p->mutex or &p->atomic_counter do not involve an
access to the whole of *p, so that you can have mutexes or atomic
types as struct members and access them without external
synchronization (but possibly coordinating access to the struct
itself).
Wojtek Lerch
2016-10-26 03:26:08 UTC
Post by Florian Weimer
Something similar is needed for C11 to make atomics more useful. We
really want that &p->mutex or &p->atomic_counter do not involve an
access to the whole of *p, so that you can have mutexes or atomic
types as struct members and access them without external
synchronization (but possibly coordinating access to the struct
itself).
I don't think &p->mutex or &p->atomic_counter counts as an "access" to
the whole of *p or even just to the named member, do they? My
understanding of "access" only includes reading the object's value (by
an "lvalue conversion" as defined in 6.3.2.1#2) and assignments.
Tim Rentsch
2016-10-26 14:41:54 UTC
Post by Florian Weimer
Post by Keith Thompson
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_076.html
int a [10];
int *p;
/* ... */
p = &a[10];
In C90, that would have undefined behavior, since a[10] is not an
object.
Something similar is needed for C11 to make atomics more useful. We
really want that &p->mutex or &p->atomic_counter do not involve an
access to the whole of *p, so that you can have mutexes or atomic
types as struct members and access them without external
synchronization (but possibly coordinating access to the struct
itself).
I second Wojtek Lerch's comment: an expression like p->mutex
accesses only the one member, not the whole of *p. Assigning to
a member may change some of the padding bits and bytes, but that
still doesn't mean all of *p is accessed. AFAIAA nothing changes
if the member being accessed is atomic qualified. And in an
expression like &p->mutex, the structure isn't accessed at
all - there is only an address computation for where the 'mutex'
member is, not any access to any parts of *p.

If the structure as a whole is atomic qualified, then accessing
any individual member is undefined behavior. But I assume that
is not what you are talking about.
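To make the "address computation only" point concrete, here is a sketch comparing &p->mutex with the struct's address plus the member's offset. The struct guarded and the helper name are hypothetical, and a plain int stands in for a real mutex type:

```c
#include <stddef.h>

/* Hypothetical struct; the int member "mutex" is only a stand-in. */
struct guarded {
    int payload;
    int mutex;
};

/* Returns 1 if &p->mutex equals the address of *p plus the member offset.
   No member of *p is read or written; only addresses are computed. */
int member_address_is_offset(struct guarded *p)
{
    return (char *)&p->mutex == (char *)p + offsetof(struct guarded, mutex);
}
```

The function never performs an lvalue conversion on any part of *p, which matches the reading of "access" given above.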
s***@casperkitty.com
2016-10-26 15:25:31 UTC
Post by Tim Rentsch
I second Wojtek Lerch's comment: an expression like p->mutex
accesses only the one member, not the whole of *p.
If p->foo is only an access to the member and not to p, what would justify
a compiler assuming that given:

struct s1 { int x; } *p1;
struct s2 { int x; } *p2;
union u { struct s1 v1; struct s2 v2; } uv1;

it would not be possible for p1->x to alias p2->x ?

Personally, I don't think the Standard ever intended to allow compilers to
make such an inference, but gcc certainly does; the only way I can see that
such an inference would be justifiable would be if operations upon the
expression p1->x, or operations which dereference its address, are somehow
considered to be accesses to an entire object of type s1. There are
proposals to change the Standard so as to allow such inference, but
nothing else that would justify such inference.

Of course, if compilers like gcc were to have an explicit option to enable
such inferences, there would be no need to justify such behavior under the
Standard, since non-conforming modes are allowed to behave in any way that
implementation authors see fit. IMHO, non-conforming modes are vastly
underutilized since there are many cases where the Standard requires defined
behavior that most programs won't need. If code needs a struct member to
store a small signed integer in two bytes, but doesn't care what happens if
a value outside -32768..32767 is stored there, is performance enhanced by
requiring that a compiler given:

theStruct->member++;
int32_t x = theStruct->member;

must ensure that, even if the member was 32767, it will either set x to
a value in -32768..32767 or raise an implementation-defined signal? Waiving
that requirement would improve performance on many platforms, and I don't
know of many cases where specifying precise behavioral guarantees for types
smaller than "int" without applying some level of guarantee to "int" types
would be useful
[for many purposes, btw, I think the most sensible level of guarantee for
the above code would be to say that if member is 32767, behavior would be
defined but x would hold a value whose lower bits were 0x8000 and whose
upper bits were Unspecified, but the Standard wouldn't allow that].
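A sketch of the case being described, assuming an implementation that provides int16_t and int32_t and has an int wider than 16 bits. The struct small and the function name are hypothetical. The increment is done in int, giving 32768, and converting that back to int16_t is what produces an implementation-defined result or signal:

```c
#include <stdint.h>

/* Hypothetical struct holding a small signed integer in two bytes. */
struct small {
    int16_t member;
};

/* Increments the 16-bit member and widens the stored value to 32 bits.
   If member was 32767, the value 32768 does not fit in int16_t, so the
   conversion back is implementation-defined (or raises a signal). */
int32_t increment_and_widen(struct small *theStruct)
{
    theStruct->member++;
    int32_t x = theStruct->member;
    return x;
}
```

Whatever value the implementation stores, x is read back from an int16_t, so it has to lie in -32768..32767; that is the guarantee the post suggests a non-conforming mode could waive.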
Kaz Kylheku
2016-10-26 16:11:25 UTC
Post by s***@casperkitty.com
Post by Tim Rentsch
I second Wojtek Lerch's comment: an expression like p->mutex
accesses only the one member, not the whole of *p.
If p->foo is only an access to the member and not to p, what would justify
Just because an access is not to the "whole of *p" doesn't mean
it isn't an access to *p.
Post by s***@casperkitty.com
struct s1 { int x; } *p1;
struct s2 { int x; } *p2;
union u { struct s1 v1; struct s2 v2; } uv1;
it would not be possible for p1->x to alias p2->x ?
Personally, I don't think the Standard ever intended to allow compilers to
make such an inference
ISO C goes out of its way to specifically make it clear that the above
aliasing is valid. The member x is part of the "common initial sequence"
shared by union members v1 and v2.

Assignment to uv1.v1.x may be followed by access to uv1.v2.x and
vice versa.
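A minimal sketch of the uncontroversial form of this, where both accesses go through the union object itself. The function name store_v1_read_v2 is hypothetical; the types are the ones from the quoted example:

```c
struct s1 { int x; };
struct s2 { int x; };
union u { struct s1 v1; struct s2 v2; };

/* Stores through v1.x and reads back through v2.x. Both members are
   named via the union object, and x lies in the common initial sequence. */
int store_v1_read_v2(int value)
{
    union u uv;

    uv.v1.x = value;
    return uv.v2.x;
}
```

store_v1_read_v2(5) should return 5.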
Post by s***@casperkitty.com
but gcc certainly does;
I doubt it; do you have a failing test case + gcc version/host/build/target?
s***@casperkitty.com
2016-10-26 17:17:45 UTC
Post by Kaz Kylheku
Post by s***@casperkitty.com
struct s1 { int x; } *p1;
struct s2 { int x; } *p2;
union u { struct s1 v1; struct s2 v2; } uv1;
it would not be possible for p1->x to alias p2->x ?
Personally, I don't think the Standard ever intended to allow compilers to
make such an inference
ISO C goes out of its way to specifically make it clear that the above
aliasing is valid. The member x is part of the "common initial sequence"
shared by union members v1 and v2.
Assignment to uv1.v1.x may be followed by access to uv1.v2.x and
vice versa.
#include <stdio.h>

struct s1 {int x;} *p1;
struct s2 {int x;} *p2;
union u { struct s1 v1; struct s2 v2; } uv;

static int test(void)
{
    p1->x = 1;
    p2->x = 0;
    return p1->x;
}

int main(void)
{
    uv.v1.x = 5;
    p1 = &uv.v1;
    p2 = &uv.v2;
    return test();
}

Paste into godbolt.org using gcc 6.2 and options -xc -O3. Generated code

main:
        mov QWORD PTR p1[rip], OFFSET FLAT:uv
        mov QWORD PTR p2[rip], OFFSET FLAT:uv
        mov eax, 1
        mov DWORD PTR uv[rip], 0
        ret

Translation:
Store the address of uv into p1
Store the address of uv into p2
Load the return-value register with 1
Store 0 into the word at uv's address
Return the indicated value (i.e. 1)

I don't know of on-line tools to let a user run code with gcc and a
specified set of options, but the machine code is really straightforward.
Kaz Kylheku
2016-10-26 19:31:49 UTC
Post by s***@casperkitty.com
Post by Kaz Kylheku
Post by s***@casperkitty.com
struct s1 { int x; } *p1;
struct s2 { int x; } *p2;
union u { struct s1 v1; struct s2 v2; } uv1;
it would not be possible for p1->x to alias p2->x ?
Personally, I don't think the Standard ever intended to allow compilers to
make such an inference
ISO C goes out of its way to specifically make it clear that the above
aliasing is valid. The member x is part of the "common initial sequence"
shared by union members v1 and v2.
Assignment to uv1.v1.x may be followed by access to uv1.v2.x and
vice versa.
#include <stdio.h>
struct s1 {int x;} *p1;
struct s2 {int x;} *p2;
union u { struct s1 v1; struct s2 v2; } uv;
static int test(void)
{
p1->x = 1;
p2->x = 0;
return p1->x;
}
int main(void)
{
uv.v1.x = 5;
p1 = &uv.v1;
p2 = &uv.v2;
return test();
}
Paste into godbolt.org using gcc 6.2 and options -xc -O3. Generated code
mov QWORD PTR p1[rip], OFFSET FLAT:uv
mov QWORD PTR p2[rip], OFFSET FLAT:uv
mov eax, 1
mov DWORD PTR uv[rip], 0
ret
Thanks; Looks like a plain bug to me.

(Whether that stems from misunderstanding of ISO C, or is just a plain
coding mistake in GCC is impossible to tell without investigating it
deeper.)

The compiler must not assume that p1->x doesn't affect p2->x, because
the types of *p1 and *p2 (being struct s1 and struct s2) are involved
in a union. That union type is in scope of this code. And moreover,
the member x is part of an initial sequence of struct members having
the same types, names and order, in both struct s1 and struct s2.

On the highly improbable hypothesis that this might be a deliberate
nonconformance, you should try with -std=<whatever> or -ansi.
s***@casperkitty.com
2016-10-26 20:27:22 UTC
Post by Kaz Kylheku
Post by s***@casperkitty.com
#include <stdio.h>
struct s1 {int x;} *p1;
struct s2 {int x;} *p2;
union u { struct s1 v1; struct s2 v2; } uv;
static int test(void)
{
p1->x = 1;
p2->x = 0;
return p1->x;
}
int main(void)
{
uv.v1.x = 5;
p1 = &uv.v1;
p2 = &uv.v2;
return test();
}
Paste into godbolt.org using gcc 6.2 and options -xc -O3. Generated code
mov QWORD PTR p1[rip], OFFSET FLAT:uv
mov QWORD PTR p2[rip], OFFSET FLAT:uv
mov eax, 1
mov DWORD PTR uv[rip], 0
ret
Thanks; Looks like a plain bug to me.
(Whether that stems from misunderstanding of ISO C, or is just a plain
coding mistake in GCC is impossible to tell without investigating it
deeper.)
On a mailing list discussion (sorry, no link handy), the notion that
C99's rule about the "complete union definition" being visible means
that compilers must honor the CIS rule for accesses made through pointers
to the structure types within the union (rather than requiring the
accesses to be made through pointers of the union type) was regarded
rather derisively, so I think the above treatment is deliberate. While
there are some situations where it might be useful to be able to declare
a union type without implying anything about the aliasing of its members,
structures that are never going to be used to alias each other typically
won't appear together in any unions.
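For contrast, here is a sketch of the form that the stricter reading does accept, in which the accesses are made through a pointer of the union type rather than through pointers to the member structs. The function name test_via_union is hypothetical; the types are those of the test case above:

```c
struct s1 { int x; };
struct s2 { int x; };
union u { struct s1 v1; struct s2 v2; };

/* Variant of test() in which both accesses name the union explicitly,
   so the compiler can see that v1.x and v2.x share storage. */
int test_via_union(union u *pu)
{
    pu->v1.x = 1;
    pu->v2.x = 0;
    return pu->v1.x;
}
```

Written this way the function should return 0, since the second store visibly overwrites the first.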
Post by Kaz Kylheku
The compiler must not assume that p1->x doesn't affect p2->x, because
the types of *p1 and *p2 (being struct s1 and struct s2) are involved
in a union. That union type is in scope of this code. And moreover,
the member x is part of an initial sequence of struct members having
the same types, names and order, in both struct s1 and struct s2.
That's how I would interpret the rules, but for some reason that viewpoint
doesn't seem to be taken seriously.
Post by Kaz Kylheku
On the highly improbable hypothesis that this might be a deliberate
nonconformance, you should try with -std=<whatever> or -ansi.
The only flag I've found that works is -fno-strict-aliasing. I see no
reason a quality compiler should require that all aliasing analysis be
disabled even in cases where it would be harmless (*) in order to
prevent it from applying gcc's interpretation of the CIS rule, but
that's what gcc seems to require.

(*) (e.g. if an lvalue of type "double" is accessed twice without any
intervening accesses via double*, casts from "double*" to any other
type, calls to memcpy or memmove, accesses made via char*, or
application of the "address-of" operator on that lvalue, etc. it
should be safe to assume that no other accesses to the lvalue will
occur between those two operations).
Kaz Kylheku
2016-10-26 21:02:53 UTC
Post by s***@casperkitty.com
Post by Kaz Kylheku
Post by s***@casperkitty.com
#include <stdio.h>
struct s1 {int x;} *p1;
struct s2 {int x;} *p2;
union u { struct s1 v1; struct s2 v2; } uv;
static int test(void)
{
p1->x = 1;
p2->x = 0;
return p1->x;
}
int main(void)
{
uv.v1.x = 5;
p1 = &uv.v1;
p2 = &uv.v2;
return test();
}
Paste into godbolt.org using gcc 6.2 and options -xc -O3. Generated code
mov QWORD PTR p1[rip], OFFSET FLAT:uv
mov QWORD PTR p2[rip], OFFSET FLAT:uv
mov eax, 1
mov DWORD PTR uv[rip], 0
ret
Thanks; Looks like a plain bug to me.
(Whether that stems from misunderstanding of ISO C, or is just a plain
coding mistake in GCC is impossible to tell without investigating it
deeper.)
On a mailing list discussion (sorry, no link handy), the notion that
C99's rule about the "complete union definition" being visible means
that compilers must honor the CIS rule for accesses made through pointers
to the structure types within the union (rather than requiring the
accesses to be made through pointers of the union type) was regarded
rather derisively, so I think the above treatment is deliberate.
In my opinion "complete union definition being visible" is clear
and has no interpretation other than that the union type occurs
in the scope of the expression in question: in that scope, there
exists a complete declaration of that union type with those two
struct types.

The idea that a type "being visible" actually means that some
accesses must be made through that type is ridiculous;
it doesn't follow in any rational way from what "visible" means.
If the intent is that an access must be made in such a way,
then there is no reason to state the requirement in terms of
scope visibility; you just say something like:

"If such a common member is stored and a different one subsequently
accessed, both members shall be accessed using a member
selection expression applied to the containing union to designate
their parent structures, to which a member selection is applied to
denote the objects."

Or:

"If such a common member is stored and a different one subsequently
accessed, both members must be designated by expressions directly
derived from the containing union object, defined in the same scope.
Directly derived means that the values which determine the accessing
expressions emanate from expressions accessing the union object,
and the subsequent dependent data flows remain in the same scope (not
passing through any intermediate storage not comprised of local
variables in automatic storage, or function parameter
passing/returning) before reaching the accessing expressions."

Sorry, "visible" just doesn't say anything like this. If you want
a rule which says that it must be obvious to the compiler that
the members being accessed are from the same union, without doing
any whole-program analysis or having to do anything that is equivalent
to deciding halting, you have to cook up the text properly.

In your sample code, the objects being accessed are in fact
members of the union. The pointers to them were obtained through
an expression involving the union object.

That union type is clearly visible at that point, as required.

Case closed.

It's obvious to me that the text is designed to bless code like this,
while not imparting the same blessing of a definition to code which places
the function into another translation unit (or otherwise into another
scope) where nothing is known about the union.

Real C programs do exactly the sort of thing that is represented
in your failure test case. Playing games in this area of
translation creates a monstrous risk of breaking vast lines of
code out there.
s***@casperkitty.com
2016-10-26 22:03:19 UTC
Post by Kaz Kylheku
It's obvious to me that the text is designed to bless code like this,
while not imparting the same blessing of a definition to code which places
the function into another translation unit (or otherwise into another
scope) where nothing is known about the union.
You and I both agree that is the obvious intention of the rule. On the
other hand, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14319 [found
via quick google search] indicates that the authors of gcc are not
interested in behaving in a fashion consistent with that.
Keith Thompson
2016-10-26 22:30:24 UTC
Post by s***@casperkitty.com
Post by Kaz Kylheku
It's obvious to me that the text is designed to bless code like this,
while not imparting the same blessing of a definition to code which places
the function into another translation unit (or otherwise into another
scope) where nothing is known about the union.
You and I both agree that is the obvious intention of the rule. On the
other hand, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14319 [found
via quick google search] indicates that the authors of gcc are not
interested in behaving in a fashion consistent with that.
That's not how I read it. The bug was suspended "until we can figure
out what the language is supposed to mean" (referring to the common
initial sequence rule); that was in 2004. The most recent comment,
posted in 2009, suggests that visibility of the union is sufficient.
It's disappointing that no action has been taken since then, but I don't
see a statement that gcc's current behavior is correct. If that were
the intent, I presume the bug would be closed, not suspended.
Kaz Kylheku
2016-10-26 23:18:43 UTC
Post by Keith Thompson
Post by s***@casperkitty.com
Post by Kaz Kylheku
It's obvious to me that the text is designed to bless code like this,
while not imparting the same blessing of a definition to code which places
the function into another translation unit (or otherwise into another
scope) where nothing is known about the union.
You and I both agree that is the obvious intention of the rule. On the
other hand, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14319 [found
via quick google search] indicates that the authors of gcc are not
interested in behaving in a fashion consistent with that.
That's not how I read it. The bug was suspended "until we can figure
out what the language is supposed to mean" (referring to the common
initial sequence rule); that was in 2004. The most recent comment,
posted in 2009, suggests that visibility of the union is sufficient.
It's disappointing that no action has been taken since then, but I don't
see a statement that gcc's current behavior is correct. If that were
the intent, I presume the bug would be closed, not suspended.
If a language construct has two possible interpretations A and B
such that under interpretation A, some set Sa of programs
may undergo a change of behavior under optimization,
and under interpretation B, a greater set, Sb >= Sa (superset)
undergoes a change of behavior under optimization,
and otherwise the interpretations are not in any conflict,
it goes without saying that until the controversy is settled, the
compiler should be fixed to follow the conservative interpretation A:
not change the behavior of programs in Sb\Sa (set difference).

Following the dangerous interpretation B *is* the bug.

The behavior can be changed to A, and the bug can be closed.

A new bug, or task item, can be opened, like this:

TASK: (re-)introduce certain optimizations when the A-B controversy
is clearly settled in favor of B.

Now this new item can languish in the database for 20 years for all
anyone cares. Going with B should require strong community sign-off.

This kind of thing really erodes my confidence in GCC.
It's bad engineering.

Imagine if some electronic designer says, "Gee, it's not clear from
the datasheet what current this part can handle: is it 1A or 2A?
Oh, I'm going to design for 2A and keep cranking out mass production
units that way until the data sheet is cleared up."
s***@casperkitty.com
2016-10-27 14:18:41 UTC
Post by Kaz Kylheku
Post by Keith Thompson
That's not how I read it. The bug was suspended "until we can figure
out what the language is supposed to mean" (referring to the common
initial sequence rule); that was in 2004. The most recent comment,
posted in 2009, suggests that visibility of the union is sufficient.
It's disappointing that no action has been taken since then, but I don't
see a statement that gcc's current behavior is correct. If that were
the intent, I presume the bug would be closed, not suspended.
If a language construct has two possible interpretations A and B
such that under interpretation A, some set Sa of programs
may undergo a change of behavior under optimization,
and under interpretation B, a greater set, Sb >= Sa (superset)
undergoes a change of behavior under optimization,
and otherwise the interpretations are not in any conflict,
it goes without saying that until the controversy is settled, the
compiler should be fixed to follow the conservative interpretation A:
not change the behavior of programs in Sb\Sa (set difference).
Following the dangerous interpretation B *is* the bug.
It *should* go without saying. For quality compilers, it probably would
go without saying if nobody said it. Given that nothing in the Standard
can reasonably be construed as discouraging compilers from using command-line
options or #pragma directives to support features which would
otherwise be non-conforming, I am at a loss to figure out why a compiler
which by default honors a conservative interpretation of what is required
but has options to waive that would not be considered superior to one that
defines a smaller subset of programs unless the programmer foregoes a
large class of optimizations, most of which would not be problematic, or
uses non-standard directives to force it to recognize aliasing between
the types in question.

Given that gcc has non-standard directives which can force it to recognize
aliasing between specific types, having complete visible union declarations
achieve the same effect in the absence of directives to waive that behavior
should not be even remotely difficult. While I generally try to avoid reading
too much into people's motives, I am unable to figure out any reason for
gcc's behavior which would be consistent with authors who are more interested
in producing a useful compiler than in denigrating existing C code.
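The post does not name the directives it has in mind. One candidate, and this is an assumption on my part, is GCC's documented may_alias type attribute. The sketch below is adapted from the style of example in the GCC manual and assumes a GCC-compatible compiler on which int is at least twice the size of short; the typedef and function names are hypothetical:

```c
/* A short type whose accesses are exempt from type-based alias analysis
   (GCC extension; not standard C). */
typedef short __attribute__((__may_alias__)) short_a;

/* Overwrites one short-sized piece of an int through a may_alias pointer
   and reports whether the int's value was seen to change. */
int store_through_alias_changes_int(void)
{
    int a = 0x12345678;
    short_a *b = (short_a *)&a;

    b[1] = 0;               /* the compiler must assume this may modify a */
    return a != 0x12345678;
}
```

With the attribute in place the function should return 1 at any optimization level, because the compiler has to reload a after the store; the attribute applies per type, which is the kind of targeted control the paragraph above describes.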
Post by Kaz Kylheku
The behavior can be changed to A, and the bug can be closed.
TASK: (re-)introduce certain optimizations when the A-B controversy
is clearly settled in favor of B.
Now this new item can languish in the database for 20 years for all
anyone cares. Going with B should require strong community sign-off.
Alternatively, add a new command-line option or pragma to re-enable the
optimization. There's nothing wrong in the slightest with having a
compiler support non-conforming modes, and there are many cases where
non-conforming modes could offer optimization opportunities for many
programs which would be far more useful than anything that can be achieved
by pushing ambiguous parts of the Standard.

As a simple example, I would suggest that it would be useful to have a
mode which, *if explicitly enabled*, would waive the special "character
type" aliasing rules in cases where code accessed an lvalue twice using
the same non-character type without any *intervening* action that would
indicate aliasing is likely. Such an option would be non-conforming,
but if the compiler is even remotely cautious in its judgment of what
actions might suggest aliasing, it would probably break a lot less code
than gcc's current approach while offering many more useful optimizations.
Post by Kaz Kylheku
Imagine if some electronic designer says, "Gee, it's not clear from
the datasheet what current this part can handle: is it 1A or 2A?
Oh, I'm going to design for 2A and keep cranking out mass production
units that way until the data sheet is cleared up."
Actually, the scenario more applicable to this situation would be a
semiconductor manufacturer producing parts with registers which won't
work reliably unless their inputs remain valid for a while after their
outputs have changed. Data sheet values for registers often don't
guarantee that the outputs won't change until after the inputs have
been reliably captured, so nothing would prohibit a manufacturer from
offering devices whose outputs changed faster than their inputs could
capture things, but requiring circuit designers to allow for that would
almost double the amount of circuitry required for many applications.
Kaz Kylheku
2016-10-27 15:09:29 UTC
Post by s***@casperkitty.com
Post by Kaz Kylheku
Post by Keith Thompson
That's not how I read it. The bug was suspended "until we can figure
out what the language is supposed to mean" (referring to the common
initial sequence rule); that was in 2004. The most recent comment,
posted in 2009, suggests that visibility of the union is sufficient.
It's disappointing that no action has been taken since then, but I don't
see a statement that gcc's current behavior is correct. If that were
the intent, I presume the bug would be closed, not suspended.
If a language construct has two possible interpretations A and B
such that under interpretation A, some set Sa of programs
may undergo a change of behavior under optimization,
and under interpretation B, a greater set, Sb >= Sa (superset)
undergoes a change of behavior under optimization,
and otherwise the interpretations are not in any conflict,
it goes without saying that until the controversy is settled, the
compiler should not change the behavior of programs in Sb\Sa (set difference).
Following the dangerous interpretation B *is* the bug.
It *should* go without saying. For quality compilers, it probably would
go without saying if nobody said it.
It goes without saying before people get their engineering degree.
Post by s***@casperkitty.com
Given that nothing in the Standard
can reasonably be construed as discouraging compilers from using command-
line options or #pragma directives to support features which would
otherwise be non-conforming, I am at a loss to figure out why a compiler
which by default honors a conservative interpretation of what is required
but has options to waive that would not be considered superior to one that
defines a smaller subset of programs unless the programmer foregoes a
large class of optimizations, most of which would not be problematic, or
uses non-standard directives to force it to recognize aliasing between
the types in question.
Exactly; why don't they have an "uck-me" code generation option
which can put into effect all their crazy interpretations
of the standard.
s***@casperkitty.com
2016-10-27 16:03:47 UTC
Post by Kaz Kylheku
Exactly; why don't they have an "uck-me" code generation option
which can put into effect all their crazy interpretations
of the standard.
Further, given the large class of programs which need to use aliasing, but
only do so using a few easily-recognizable patterns that seldom appear when
aliasing isn't required, why is there no effort to support such programs
efficiently? A very simple rule that accesses of type T will not be moved
across explicit casts from T* to any other type, and the first access of
type T following such a cast will not be moved across any access made via
pointer that cannot be shown not to have been derivable from that cast
would impede some otherwise-useful optimizations, but not nearly as many
as -fno-strict-aliasing. If such a rule would impair 25% of useful
optimizations, does it make more sense to have a mode which achieves
75% of the available optimizations and yields code that works, or to have
the only choices be a mode which works but achieves 0% of the optimizations,
or one which achieves 110% of the available "optimizations" but doesn't work?
Tim Rentsch
2016-11-01 17:18:25 UTC
Post by Kaz Kylheku
Post by Keith Thompson
Post by s***@casperkitty.com
Post by Kaz Kylheku
It's obvious to me that the text is designed to bless code like this,
while not imparting the same blessing of a definition to code which places
the function into another translation unit (or otherwise into another
scope) where nothing is known about the union.
You and I both agree that is the obvious intention of the rule. On the
other hand, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14319 [found
via quick google search] indicates that the authors of gcc are not
interested in behaving in a fashion consistent with that.
That's not how I read it. The bug was suspended "until we can figure
out what the language is supposed to mean" (referring to the common
initial sequence rule); that was in 2004. The most recent comment,
posted in 2009, suggests that visibility of the union is sufficient.
It's disappointing that no action has been taken since then, but I don't
see a statement that gcc's current behavior is correct. If that were
the intent, I presume the bug would be closed, not suspended.
If a language construct has two possible interpretations A and B
such that under interpretation A, some set Sa of programs
may undergo a change of behavior under optimization,
and under interpretation B, a greater set, Sb >= Sa (superset)
undergoes a change of behavior under optimization,
and otherwise the interpretations are not in any conflict,
it goes without saying that until the controversy is settled, the
compiler should not change the behavior of programs in Sb\Sa (set difference).
Following the dangerous interpretation B *is* the bug.
This is an excellent sentiment. I hope that you will post a
followup comment for one or both of

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14319
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892

making the same point. I would be happy to second the
motion.
s***@casperkitty.com
2016-10-26 23:20:28 UTC
Post by Keith Thompson
Post by s***@casperkitty.com
Post by Kaz Kylheku
It's obvious to me that the text is designed to bless code like this,
while not imparting the same blessing of a definition to code which places
the function into another translation unit (or otherwise into another
scope) where nothing is known about the union.
You and I both agree that is the obvious intention of the rule. On the
other hand, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14319 [found
via quick google search] indicates that the authors of gcc are not
interested in behaving in a fashion consistent with that.
That's not how I read it. The bug was suspended "until we can figure
out what the language is supposed to mean" (referring to the common
initial sequence rule); that was in 2004. The most recent comment,
posted in 2009, suggests that visibility of the union is sufficient.
It's disappointing that no action has been taken since then, but I don't
see a statement that gcc's current behavior is correct. If that were
the intent, I presume the bug would be closed, not suspended.
The issue was reported again in 2015 at

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892

and that was suspended and marked as a duplicate of the earlier bug. I
don't know how to search effectively for other discussions of the issue,
but at least one was extremely derisive with regard to the idea that
the rule was meant to have the meaning that Kaz and I both consider obvious.

Any program which would behave in defined fashion if the rule meant what
gcc's authors claim would behave in the same defined fashion if visibility
of the complete union type required a compiler to recognize aliasing of
CIS members between the structs in question, so for gcc to interpret the
rule in the latter fashion would not break any code. There are some cases
where it might needlessly impede what would otherwise be useful optimizations,
but most programs don't combine arbitrary structures within unions for no
reason.
Tim Rentsch
2016-11-01 17:13:20 UTC
Post by s***@casperkitty.com
On a mailing list discussion (sorry, no link handy), the notion
that C99's rule about the "complete union definition" being
visible means that compilers must honor the CIS rule for accesses
made through pointers to the structure types within the union
(rather than requiring the accesses to be made through pointers of
the union type) was regarded rather derisively, [...]
What, and you didn't set them straight on it? What's wrong with
you? You're falling down on the job buddy. ;)
Florian Weimer
2016-10-31 18:32:38 UTC
Post by s***@casperkitty.com
If p->foo is only an access to the member and not to p, what would justify
struct s1 { int x; } *p1;
struct s2 { int x; } *p2;
union u { struct s1 v1; struct s2 v2; } uv1;
it would not be possible for p1->x to alias p2->x ?
GCC does not treat the union definition as a declaration that members
of s1 and s2 can alias. The standard is a bit unclear whether this
union only applies if its definition has been seen before in the
current translation unit, or if it is present anywhere in the program.
The second interpretation pretty much defeats any type-based aliasing
analysis in a compiler which supports separate compilation, like GCC
does.
s***@casperkitty.com
2016-10-31 19:29:16 UTC
Post by Florian Weimer
Post by s***@casperkitty.com
If p->foo is only an access to the member and not to p, what would justify
struct s1 { int x; } *p1;
struct s2 { int x; } *p2;
union u { struct s1 v1; struct s2 v2; } uv1;
it would not be possible for p1->x to alias p2->x ?
GCC does not treat the union definition as a declaration that members
of s1 and s2 can alias. The standard is a bit unclear whether this
union only applies if its definition has been seen before in the
current translation unit, or if it is present anywhere in the program.
The second interpretation pretty much defeats any type-based aliasing
analysis in a compiler which supports separate compilation, like GCC
does.
C99 makes clear that the compiler is only required to recognize the
union as implying that the items might alias in cases where a complete
declaration of the union type is visible before the usage [it would be
absurd to say that a compiler must presume that members of union "foo"
could alias if it only sees an incomplete declaration like:

typedef union foo ufoo;

and nothing in the files it's looking at ever indicates what types are
included as members of that union].

The rule in C99 is clear and unambiguous, at least with respect to the
Common Initial Sequence rule. Compiler writers may not want to follow
it, but that doesn't mean the rule is unclear or ambiguous.
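As a point of reference, the uncontested core of the Common Initial Sequence rule can be sketched as follows, reusing the struct and union declarations quoted above (the function name is invented for illustration). Both accesses here go through the union object itself, which nobody in this thread disputes is defined. The contested case, where the accesses are made through separate struct s1 * and struct s2 * pointers while the complete union type is merely visible, is deliberately not shown.

```c
#include <assert.h>

struct s1 { int x; };
struct s2 { int x; };
union u { struct s1 v1; struct s2 v2; };   /* complete type visible here */

/* Writes through one member and inspects the common initial sequence
   through the other; both accesses name the union object directly. */
int read_via_other_member(int value)
{
    union u uv;
    uv.v1.x = value;
    return uv.v2.x;
}
```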
Tim Rentsch
2016-11-01 17:26:34 UTC
Post by Florian Weimer
Post by s***@casperkitty.com
If p->foo is only an access to the member and not to p, what would justify
struct s1 { int x; } *p1;
struct s2 { int x; } *p2;
union u { struct s1 v1; struct s2 v2; } uv1;
it would not be possible for p1->x to alias p2->x ?
GCC does not treat the union definition as a declaration that members
of s1 and s2 can alias.
If so that is a bug in gcc.
Post by Florian Weimer
The standard is a bit unclear whether this
union only applies if its definition has been seen before in the
current translation unit, or if it is present anywhere in the program.
To me it seems quite clear that a completed definition must
appear prior to the use and in the same TU. Why do you think
it isn't clear?
Post by Florian Weimer
The second interpretation pretty much defeats any type-based aliasing
analysis in a compiler which supports separate compilation, like GCC
does.
I agree that it would in some cases, but that is a moot point
as it is not what the Standard prescribes.
Tim Rentsch
2016-11-01 17:07:50 UTC
Post by s***@casperkitty.com
Post by Tim Rentsch
I second Wojtek Lerch's comment: an expression like p->mutex
accesses only the one member, not the whole of *p.
If p->foo is only an access to the member and not to p, what would justify
struct s1 { int x; } *p1;
struct s2 { int x; } *p2;
union u { struct s1 v1; struct s2 v2; } uv1;
it would not be possible for p1->x to alias p2->x ?
This question sounds rhetorical. Also it is too indirect.
Presumably your question concerns some possibly undefined
behavior. Such a question is better asked something like this:

"Consider the following .c file:

[put the code here]

Question: is the behavior defined or undefined? To me it
looks like the behavior is [your assessment]. This
conclusion is based on [citations of relevant section
references, or actual text, in the ISO standard, Rationale,
or other documents on the open-std.org website]. My
reasoning is as follows: [explanation here].
"

Do that and I'll take a look at your question. I don't go
for this rhetorical sleight-of-hand crap.
Florian Weimer
2016-10-31 18:35:42 UTC
Post by Tim Rentsch
Post by Florian Weimer
Post by Keith Thompson
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_076.html
int a [10];
int *p;
/* ... */
p = &a[10];
In C90, that would have undefined behavior, since a[10] is not an
object.
Something similar is needed for C11 to make atomics more useful. We
really want that &p->mutex or &p->atomic_counter do not involve an
access to the whole of *p, so that you can have mutexes or atomic
types as struct members and access them without external
synchronization (but possibly coordinating access to the struct
itself).
I second Wojtek Lerch's comment: an expression like p->mutex
accesses only the one member, not the whole of *p. Assigning to
a member may change some of the padding bits and bytes, but that
still doesn't mean all of *p is accessed. AFAIAA nothing changes
if the member being accessed is atomic qualified. And in an
expression like &p->mutex, the structure isn't accessed at
all - there is only an address computation for where the 'mutex'
member is, not any access to any parts of *p.
As was pointed out below, compilers use the implied access to *p to
give the location *p a specific dynamic type and rule out that
(void *) p == (void *) q if *q has a static type that is incompatible
with the inferred dynamic type. This optimization is considered
rather important for performance.

It would be a shame if compilers started to elide mutex or atomic
operations because the implied access to *p introduces a data race.
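The pattern being discussed looks roughly like the sketch below. The struct layout and function names are hypothetical, and C11 <stdatomic.h> is assumed to be available. It only shows that &p->counter can be formed as an address computation and that the atomic operation then touches just that member; it does not settle what a compiler may infer about *p from the expression.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct shared {
    int        payload;   /* would be guarded by external synchronization */
    atomic_int counter;   /* intended to be usable without it */
};

/* Forms the member's address and operates only on that member. */
int bump(struct shared *p)
{
    atomic_int *c = &p->counter;
    return atomic_fetch_add(c, 1) + 1;
}

/* Starts the counter at 'start', bumps it once, returns the new value. */
int bump_once_from(int start)
{
    struct shared s = { 0, start };
    return bump(&s);
}

/* Checks that &p->counter is the base address plus the member offset. */
int member_address_is_offset(void)
{
    struct shared s = { 0, 0 };
    struct shared *p = &s;
    return (char *)&p->counter == (char *)p + offsetof(struct shared, counter);
}
```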
s***@casperkitty.com
2016-10-31 19:48:45 UTC
Post by Florian Weimer
As was pointed out below, compilers use the implied access to *p to
give the location *p a specific dynamic type and rule out that
(void *) p == (void *) q if *q has a static type that is incompatible
with the inferred dynamic type. This optimization is considered
rather important for performance.
In most cases, an assumption that p->x will not alias q->x in cases where
p and q have different types, even if p->x and q->x have the same type, will
not pose any difficulty. On the other hand, there are situations (especially
involving the CIS rule) where not having a compiler recognize aliasing would
make things much clunkier. If p->x and q->x serve the same purposes in their
respective structures, being able to write p->x and q->x rather than some
nasty mess like:

*(typeOfx*)((char*)p + offsetof(typeOfp,x))

will make code easier to write and to read, and far less likely to get
tripped up by things like operator precedence issues, etc.
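To make the readability difference concrete, here is a small sketch with a hypothetical struct point. Both functions read the same member of the same struct type, so no aliasing question arises in this example; it only contrasts the two spellings.

```c
#include <assert.h>
#include <stddef.h>

struct point { int x; int y; };

/* The ordinary member access. */
int get_y_direct(struct point *p)
{
    return p->y;
}

/* The same access spelled out with offsetof and casts. */
int get_y_manual(struct point *p)
{
    return *(int *)((char *)p + offsetof(struct point, y));
}

/* Returns nonzero when both spellings read back the stored value. */
int accessors_agree(int y)
{
    struct point pt = { 0, y };
    return get_y_direct(&pt) == y && get_y_manual(&pt) == y;
}
```

Misplacing a single parenthesis in get_y_manual changes which pointer the offset is added to, which is the kind of precedence trap mentioned above.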
Tim Rentsch
2016-11-01 18:39:53 UTC
Post by Florian Weimer
Post by Tim Rentsch
Post by Florian Weimer
Post by Keith Thompson
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_076.html
int a [10];
int *p;
/* ... */
p = &a[10];
In C90, that would have undefined behavior, since a[10] is not an
object.
Something similar is needed for C11 to make atomics more useful. We
really want that &p->mutex or &p->atomic_counter do not involve an
access to the whole of *p, so that you can have mutexes or atomic
types as struct members and access them without external
synchronization (but possibly coordinating access to the struct
itself).
I second Wojtek Lerch's comment: an expression like p->mutex
accesses only the one member, not the whole of *p. Assigning to
a member may change some of the padding bits and bytes, but that
still doesn't mean all of *p is accessed. AFAIAA nothing changes
if the member being accessed is atomic qualified. And in an
expression like &p->mutex, the structure isn't accessed at
all - there is only an address computation for where the 'mutex'
member is, not any access to any parts of *p.
As was pointed out below,
To what are you referring? The gcc bug mentioned in your other
posting? That is not "below" in my newsreader.
Post by Florian Weimer
compilers use the implied access to *p to
give the location *p a specific dynamic type and rule out that
(void *) p == (void *) q if *q has a static type that is incompatible
with the inferred dynamic type. This optimization is considered
rather important for performance.
So you aren't talking about access to the whole struct but about
some sort of static optimization analysis? If so that needs to
be made clear. If not, then what are you talking about?

In any case, what matters is not what compilers do but what the
Standard prescribes. Can you describe the situation you are
interested in solely in terms of statements in the Standard?
Post by Florian Weimer
It would be a shame if compilers started to elide mutex or atomic
operations because the implied access to *p introduces a data race.
I won't disagree, but that is of interest here only if what said
compilers do conforms to the requirements in the Standard. Can
you give an example translation unit that illustrates the problem
you are alluding to? Can you identify which passages in the
Standard may give rise to this problem?

Tim Rentsch
2016-10-25 22:08:04 UTC
Post by Tim Rentsch
The unary & operator yields the address of its operand. If
the operand has type "type", the result has type "pointer to
type". If the operand is the result of a unary * operator,
neither that operator nor the & operator is evaluated and
the result is as if both were omitted, except that the
constraints on the operators still apply and the result is
not an lvalue. Similarly, if the operand is the result of a
[] operator, neither the & operator nor the unary * that is
implied by the [] is evaluated and the result is as if the &
operator were removed and the [] operator were changed to a +
operator. Otherwise, the result is a pointer to the object or
function designated by its operand.
This special case rule was added in C99; it does not appear in C90. The
C99 Rationale doesn't mention it, and I don't see anything about it in
the C90 DRs.
int *ptr = 0;
&*ptr;
has undefined behavior because it dereferences a null pointer. In C99
and C11 it's well defined, and the expression &*ptr yields a null
pointer of type int*.
Why was this change made?
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n721.htm
Yes, that's it. [...]
Cases like this one might be called "accidents of writing". What
I mean by that is people knew even at the time the original
Standard was being written that cases like '&a[10]' needed to be
allowed, but how the writing was done didn't account for that,
and the implications of the exact phrasing used were not realized
until later. Part of the reason this happens is that WG14 tends
to be less "language lawyerly" than, eg, many discussions here
in the newsgroups.
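A minimal sketch of the two cases that the C99 wording makes well defined, assuming a C99 or later compiler (the function name is invented for illustration). In both, the special-case rule quoted above means no object is actually accessed.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when both special cases behave as 6.5.3.2p3 describes. */
int address_of_special_cases(void)
{
    int a[10] = {0};
    int *end = &a[10];        /* treated as a + 10; a[10] is never accessed */
    int *null_ptr = NULL;
    int *same = &*null_ptr;   /* as if & and * were both omitted */
    return end == a + 10 && same == NULL;
}
```

Under the C90 wording, each of these two initializers would formally have had undefined behavior.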
s***@casperkitty.com
2016-10-25 22:39:37 UTC
Post by Tim Rentsch
Cases like this one might be called "accidents of writing". What
I mean by that is people knew even at the time the original
Standard was being written that cases like '&a[10]' needed to be
allowed, but how the writing was done didn't account for that,
and the implications of the exact phrasing used were not realized
until later. Part of the reason this happens is that WG14 tends
to be less "language lawyerly" than, eg, many discussions here
in the newsgroups.
A more interesting example might be the asymmetry of the rules which would
allow a pointer of a structure or union type containing a field of type X
to access an object of type X, but would not allow a structure to be
accessed using a field of a member type. I suspect the real source of such
asymmetry is that the authors knew of compilers caching scalars in registers
and wanted to require that an access using a structure type would cause any
cached members thereof to be flushed. Since they didn't anticipate compilers
caching structures in registers, it wasn't necessary to mandate that an
access to a constituent type flush any cached structure values.
Tim Rentsch
2016-11-01 16:44:55 UTC
Post by s***@casperkitty.com
Post by Tim Rentsch
Cases like this one might be called "accidents of writing". What
I mean by that is people knew even at the time the original
Standard was being written that cases like '&a[10]' needed to be
allowed, but how the writing was done didn't account for that,
and the implications of the exact phrasing used were not realized
until later. Part of the reason this happens is that WG14 tends
to be less "language lawyerly" than, eg, many discussions here
in the newsgroups.
A more interesting example might be the asymmetry of the rules
which would allow a pointer of a structure or union type
containing a field of type X to access an object of type X, but
would not allow a structure to be accessed using a field of a
member type. I suspect the real source of such asymmetry is that
the authors knew of compilers caching scalars in registers and
wanted to require that an access using a structure type would
cause any cached members thereof to be flushed. Since they didn't
anticipate compilers caching structures in registers, it wasn't
necessary to mandate that an access to a constituent type flush
any cached structure values.
I'm sure that is not the case. It's just not how they think.
Ben Bacarisse
2016-10-25 11:21:40 UTC
Post by Keith Thompson
The unary & operator yields the address of its operand. If
the operand has type "type", the result has type "pointer to
type". If the operand is the result of a unary * operator,
neither that operator nor the & operator is evaluated and
the result is as if both were omitted, except that the
constraints on the operators still apply and the result is
not an lvalue. Similarly, if the operand is the result of a
[] operator, neither the & operator nor the unary * that is
implied by the [] is evaluated and the result is as if the &
operator were removed and the [] operator were changed to a +
operator. Otherwise, the result is a pointer to the object or
function designated by its operand.
This special case rule was added in C99; it does not appear in C90. The
C99 Rationale doesn't mention it, and I don't see anything about it in
the C90 DRs.
The link[1], posted by Tim, discussing DR076 and its resolution also
includes an addition to footnote 54 (which became footnote 87 in C99):

"Thus &*E is equivalent to E (even if E is a null pointer), and
&(E1[E2]) to (E1+E2)."

Equivalent is a strong word. E and &*E differ in that one is an lvalue
and the other not, but they may also differ in type. Given

char E[100];

the types of expressions E and &*E are, respectively, char (*)[100] and
char *. This follows from the wording about types quoted above, and
both gcc and clang agree when asked for sizeof E and sizeof &*E.

Are gcc and clang correct and, if so, is "equivalent" rather too strong
a word to use when trying to clarify this matter?

<snip>

[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n721.htm
--
Ben.
James Kuyper
2016-10-25 11:42:32 UTC
On 10/25/2016 07:21 AM, Ben Bacarisse wrote:
...
Post by Ben Bacarisse
The link[1], posted by Tim, discussing DR076 and its resolution also
"Thus &*E is equivalent to E (even if E is a null pointer), and
&(E1[E2]) to (E1+E2)."
Equivalent is a strong word. E and &*E differ in that one is an lvalue
and the other not, but they may also differ in type. Given
char E[100];
the types of expressions E and &*E are, respectively, char (*)[100] and
Did you mean char[100]?
Post by Ben Bacarisse
char *. This follows from the wording about types quoted above, and
both gcc and clang agree when asked for sizeof E and sizeof &*E.
Are gcc and clang correct and, if so, is "equivalent" rather too strong
a word to use when trying to clarify this matter?
I think you're right about that. The '*' in "&*E" induces the conversion
of E into a pointer to its first element; a plain E does not. Since
it's not an lvalue, you can't apply & to it, so the only context where
that conversion won't happen anyway is sizeof - but that does make it
slightly less than fully equivalent. The only other exception to
the array->pointer rule doesn't apply to E, but would apply to
&*"hello". Unlike "hello", it can't be used to initialize a char array:

char greeting[] = &*"hello"; // Not permitted
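The sizeof difference mentioned above can be checked with a short sketch like this one (the function names are hypothetical). With E declared as an array, sizeof E sees the array type char[100], while sizeof &*E sees a pointer, because the unary * forces the array-to-pointer conversion first.

```c
#include <assert.h>
#include <stddef.h>

static char E[100];

/* E keeps its array type as the operand of sizeof. */
size_t size_of_array(void)
{
    return sizeof E;
}

/* &*E has type char *, so this is the size of a pointer. */
size_t size_of_pointer(void)
{
    return sizeof &*E;
}
```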
Ben Bacarisse
2016-10-25 13:19:03 UTC
Post by James Kuyper
...
Post by Ben Bacarisse
The link[1], posted by Tim, discussing DR076 and its resolution also
"Thus &*E is equivalent to E (even if E is a null pointer), and
&(E1[E2]) to (E1+E2)."
Equivalent is a strong word. E and &*E differ in that one is an lvalue
and the other not, but they may also differ in type. Given
char E[100];
the types of expressions E and &*E are, respectively, char (*)[100] and
Did you mean char[100]?
<sigh> Yes, I did. One is an array type and the other is a pointer type.
Post by James Kuyper
Post by Ben Bacarisse
char *. This follows from the wording about types quoted above, and
both gcc and clang agree when asked for sizeof E and sizeof &*E.
Are gcc and clang correct and, if so, is "equivalent" rather too strong
a word to use when trying to clarify this matter?
I think you're right about that. The '*' in "&*E" induces the conversion
of E into a pointer to its first element; a plain E does not. Since
it's not an lvalue, you can't apply & to it, so the only context where
that conversion won't happen anyway is sizeof - but that does make it
slightly less than fully equivalent. The only other exception to
the array->pointer rule doesn't apply to E, but would apply to
char greeting[] = &*"hello"; // Not permitted
--
Ben.
Tim Rentsch
2016-10-25 22:25:23 UTC
The unary & operator yields the address of its operand. If
the operand has type "type", the result has type "pointer to
type". If the operand is the result of a unary * operator,
neither that operator nor the & operator is evaluated and
the result is as if both were omitted, except that the
constraints on the operators still apply and the result is
not an lvalue. Similarly, if the operand is the result of a
[] operator, neither the & operator nor the unary * that is
implied by the [] is evaluated and the result is as if the &
operator were removed and the [] operator were changed to a +
operator. Otherwise, the result is a pointer to the object or
function designated by its operand.
This special case rule was added in C99; it does not appear in C90. The
C99 Rationale doesn't mention it, and I don't see anything about it in
the C90 DRs.
The link[1], posted by Tim, discussing DR076 and its resolution also
"Thus &*E is equivalent to E (even if E is a null pointer), and
&(E1[E2]) to (E1+E2)."
Equivalent is a strong word. E and &*E differ in that one is an lvalue
and the other not, but they may also differ in type. Given
char E[100];
the types of expressions E and &*E are, respectively, char (*)[100] and
char *. This follows from the wording about types quoted above, and
both gcc and clang agree when asked for sizeof E and sizeof &*E.
Are gcc and clang correct and, if so, is "equivalent" rather too strong
a word to use when trying to clarify this matter?
<snip>
[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n721.htm
Clearly you are right that E and &*E are sometimes different in
their semantics, and therefore "equivalent" is not the right
word to use here.

On the other hand, this comment is made in the context of the two
operators (& and *) being discussed, and in that context the
operand of * must have pointer type. If the type of E is a pointer
type, I believe E and &*E are equivalent, not counting of course
the question of lvalueness, but that point is mentioned in the
normative text of the same section as the footnote.

Another way of looking at it is that this statement provides a good
example of why footnotes are only informative and not normative. :)