Discussion:
Correct certain instances of Undefined Behavior when using small unsigned types
s***@casperkitty.com
2016-02-13 19:22:28 UTC
At present, if x and y are expressions of any integer type, any expression
of the following forms whose results are either ignored altogether or else
converted to a type whose values are all representable as "unsigned int",
will in all defined cases yield the same results as if x and y had been
converted to unsigned int:

+x -x x+y x-y x*y x&y x^y x|y cond?x:y

Consequently, while the Standard dictates that arguments of unsigned types
smaller than "unsigned int" will be promoted to a signed "int", the behavior
of the aforementioned expressions would in all defined cases be the same if
the promotion were instead to "unsigned int", and if such conversion caused
any subexpressions to be treated likewise. I expect the authors of the rule
which specifies that small unsigned values promote to signed intended that
it shouldn't matter whether the promotion was to signed or unsigned.

Unfortunately, even though conversion to signed "int" will yield the same
result in all defined cases as conversion to "unsigned int", and even though
it would be extremely rare for promotion to "unsigned int" to be more
expensive than promotion to "int", promotion to "int" will result in
Undefined Behavior in some cases where promotion to "unsigned" would not.

For example, on systems where "int" is 32 bits, the following function will
yield Undefined Behavior if passed a value larger than 46340, even though
its behavior would be defined for all possible values on systems where "int"
is 16 bits, or where "int" is larger than 32 bits:

uint16_t square_mod_65536(uint16_t x) { return x*x; }
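The common defensive rewrite (shown here only to illustrate the hoop the
current rule makes programmers jump through; the function name with the
_safe suffix is invented for this sketch) forces the arithmetic into
unsigned int, making the behavior defined for every input on every
conforming implementation:

```c
#include <stdint.h>

/* Multiplying by 1u first converts x to unsigned int, so the product is
   computed with unsigned (wrapping) arithmetic even on systems where
   uint16_t would otherwise promote to signed int.  The result is always
   x*x reduced mod 65536. */
uint16_t square_mod_65536_safe(uint16_t x)
{
    return (uint16_t)(1u * x * x);
}
```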

I would suggest the following, because:

1. Systems which could evaluate the above expression more efficiently
using type "int" than "unsigned int" are extremely rare (if any
exist at all).

2. The vast majority of compilers have always generated code which was
equivalent to what would be generated if the computation were
performed using an unsigned type, and would do so even if they paid
no attention to how the result was going to be used, so in most
cases the rule would impose no new burden on compilers.

3. Outside of contrived situations, it is very unlikely that any
   optimizations which would be facilitated by having the values promote
   to a signed type rather than an unsigned type would enable the
   generation of a program that would more efficiently meet its
   requirements than would be possible if the promotion were to an
   unsigned type.

The Standard should be modified to say that if the result of any of the
above forms of expression is converted to an unsigned type whose values
are all representable as "unsigned int", the result shall be the same as
if the operands were converted to that type and the operation was performed
on that type.

Impact on existing code:

NONE, outside of examples like the above contrived to show the effect
of Undefined Behavior.

Impact on existing compilers:

NONE, except in cases where compilers would make inferences about possible
argument values; such inferences would need to be disabled in the
aforementioned cases.

Impact on code efficiency:

NONE, except in cases where the inputs received by the program would never
cause integer overflow, where such knowledge would allow a compiler to
generate more efficient code, and where the only basis for such an
inference would be expressions in which short unsigned types are promoted
to signed integers. Such cases are in practice likely vastly outnumbered
by those in which such inferences would disrupt the behavior of programs
whose behavior would be fully (and correctly) defined if this rule were
in effect.

While some forms of Undefined Behavior are intended to grant useful freedom
to compilers, the forms covered by this proposal were almost certainly not
intended to do so; the most likely reason that the Standard does not mandate
that overflow behavior in these cases yield the same result as would have
occurred using unsigned math is that the authors of the Standard likely
expected that compilers would always behave that way whether or not the
Standard compelled them to do so.
Philip Lantz
2016-02-14 02:40:03 UTC
Post by s***@casperkitty.com
At present, if x and y are expressions of any integer type, any expression
of the following forms whose results are either ignored altogether or else
converted to a type whose values are all representable as "unsigned int",
will in all defined cases yield the same results as if x and y had been
+x -x x+y x-y x*y x&y x^y x|y cond?x:y
Consequently, while the Standard dictates that arguments of unsigned types
smaller than "unsigned int" will be promoted to a signed "int", the behavior
of the aformentioned expressions would in all defined cases be the same if
the promotion were instead promoted to "unsigned int", and if such conversion
caused any subexpressions to be treated likewise. I expect the authors of
the rule which specifies that small unsigned values promote to signed in
that case intended that it shouldn't matter whether they were promoted to
signed or unsigned.
You are mistaken. A colleague and I wrote an objection to the rule some
30 years ago when it was proposed, and our argument was rejected. The
committee was well aware of the cases in which it matters and those in
which it doesn't. Even though I didn't agree with their conclusion, it
is clear that there are pros and cons on both sides. (The fact that it
is still causing confusion 30 years later may be an indication that
they made the wrong choice, but without comparing to an alternate
universe where the other choice was made, it's hard to say.)

Consider:

uint16_t a;
int b;
a < b
a / b

Unless you are proposing that the integer promotions behave differently
depending on the containing expression, it is clear that the rule cannot
be changed now.
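To make the objection concrete (a sketch, assuming 32-bit int on a
two's-complement machine; the helper names are invented): with a == 1 and
b == -1, the two promotion choices give different, fully defined answers
for these operators, which is why the rule cannot simply be flipped:

```c
#include <stdint.h>

/* Current rule: a promotes to (signed) int, so the comparison and the
   division are performed in signed arithmetic. */
int lt_signed(uint16_t a, int b)  { return a < b; }    /* 1 < -1 -> 0  */
int div_signed(uint16_t a, int b) { return a / b; }    /* 1 / -1 -> -1 */

/* What unsigned promotion would do instead: b is converted to
   unsigned int, so -1 becomes UINT_MAX. */
int lt_unsigned(uint16_t a, int b)       { return (unsigned)a < (unsigned)b; }
unsigned div_unsigned(uint16_t a, int b) { return (unsigned)a / (unsigned)b; }
```

With a = 1, b = -1: lt_signed gives 0 but lt_unsigned gives 1, and
div_signed gives -1 but div_unsigned gives 0.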

Philip
s***@casperkitty.com
2016-02-14 03:44:28 UTC
Post by Philip Lantz
Unless you are proposing that the integer promotions behave differently
depending on the containing expression, it is clear that the rule cannot
be changed now.
The proposed change would not affect any defined behaviors, and I really
doubt that any production code relies upon any platform defining
corner-case behaviors which would differ from this proposal in places
where the Standard imposes no constraints. The proposal would require the
compiler to vary behavior based upon the type of the enclosing expression
in two scenarios:

1. Platforms which do not use two's-complement math may need to vary their
behavior based upon whether the result will be used as signed or unsigned,
but I would expect that on most such platforms performing the math as
unsigned will be faster than performing as signed and then adjusting
negative results for cases not resulting in UB, and thus I would expect
that most such platforms would already comply with the proposal.

2. Platforms which make inferences about program input based on a presumption
that no arithmetic overflows will occur. Such inferential logic already
evaluates the contexts in which expressions are used, and thus the proposed
rule would not make things more complicated.

Can you identify any notable historical compilers which, given:

unsigned short x = -1;
x*=x;

would not have left "x" holding the value "1", with no side-effects?

I could see considerable value to having a 16-bit unsigned type [distinct
from uint16_t] which would allow a compiler to assume that x*=x; won't be
executed with a value outside the range 0..255, but I fail to see any
value in saying that compilers only have to handle x values up to 46340.
What useful purpose is served by requiring a programmer to write
x*=1u*x; instead of x*=x;?
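For reference (a sketch; the function name is invented), here is the 1u*
idiom in a form that is defined on every conforming implementation,
whatever the widths of short and int:

```c
/* Defined everywhere: the 1u* forces the multiply into unsigned int,
   and the assignment back to unsigned short reduces the product
   mod USHRT_MAX+1.  Since (2^n - 1)^2 mod 2^n == 1 for any width n,
   the result is 1 regardless of how wide unsigned short is. */
unsigned short square_self(void)
{
    unsigned short x = -1;   /* x == USHRT_MAX */
    x *= 1u * x;
    return x;                /* always 1 */
}
```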
Philip Lantz
2016-02-14 07:41:05 UTC
Post by s***@casperkitty.com
Post by Philip Lantz
Unless you are proposing that the integer promotions behave differently
depending on the containing expression, it is clear that the rule cannot
be changed now.
The proposed change would not affect any defined behaviors,
Why did you snip and fail to address the examples I gave, which show that
you are mistaken?
s***@casperkitty.com
2016-02-14 10:15:14 UTC
Post by Philip Lantz
Why did you snip and fail to address the examples I gave, which show that
you are mistaken?
It appears you misunderstood my proposal, which listed specific forms of
expression to which the rule would be applicable:

+x -x x+y x-y x*y x&y x^y x|y cond?x:y

The rule would not be applicable to the operands of relational operators nor
the division operator, nor would it be applicable to the operands of the
equality/inequality operators, the remainder operator, or the right-shift
operator, nor would it be applicable in cases where the result of an
operator is used in a manner that does not convert it to an unsigned type
(conditional tests yield a 0 or 1, but aren't conversions).

The operators to which the rule would be applicable [btw, I forgot
to list x<<amount and ~x] would be those where each unsigned type would in
all defined cases behave as an algebraic ring. For example, if T is any
unsigned type which is not larger than "unsigned int", and x1, x2, y1,
and y2 are expressions of any integer types, and if (T)x1==(T)x2 and
(T)y1==(T)y2, then
in all defined cases:

(T)(+x1) will equal (T)(+x2)
(T)(-x1) will equal (T)(-x2)
(T)(x1+y1) will equal (T)(x2+y2)
(T)(x1-y1) will equal (T)(x2-y2)
(T)(x1*y1) will equal (T)(x2*y2)
(T)(x1&y1) will equal (T)(x2&y2)
(T)(x1|y1) will equal (T)(x2|y2)
(T)(x1^y1) will equal (T)(x2^y2)
(T)(cond?x1:y1) will equal (T)(cond?x2:y2)
(T)(x1<<amt) will equal (T)(x2<<amt)
(T)(~x1) will equal (T)(~x2)

My proposal would be to extend the equivalences which hold in all defined
cases so that they would apply for all combinations of x1, x2, y1, and y2
including those for which the present Standard imposes no requirement (but
which 99% of compilers uphold anyway).
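These equivalences can be spot-checked mechanically. A sketch (assuming a
two's-complement implementation with at least 32-bit long, since the
bitwise forms with negative operands depend on representation; the 1L*
keeps the check's own multiply free of overflow; the function name is
invented) using operands that agree mod 65536:

```c
#include <stdint.h>

/* Spot-check of the ring equivalences for T = uint16_t, with
   (uint16_t)x1 == (uint16_t)x2 and (uint16_t)y1 == (uint16_t)y2.
   Returns 1 if every checked form agrees after conversion back to T. */
int ring_equivalences_hold(void)
{
    long x1 = 70000, x2 = 70000 - 65536;
    long y1 = -3,    y2 = 65536 - 3;

    return (uint16_t)(-x1)          == (uint16_t)(-x2)
        && (uint16_t)(x1 + y1)      == (uint16_t)(x2 + y2)
        && (uint16_t)(x1 - y1)      == (uint16_t)(x2 - y2)
        && (uint16_t)(1L * x1 * y1) == (uint16_t)(1L * x2 * y2)
        && (uint16_t)(x1 & y1)      == (uint16_t)(x2 & y2)
        && (uint16_t)(x1 | y1)      == (uint16_t)(x2 | y2)
        && (uint16_t)(x1 ^ y1)      == (uint16_t)(x2 ^ y2)
        && (uint16_t)(x1 << 2)      == (uint16_t)(x2 << 2)
        && (uint16_t)(~x1)          == (uint16_t)(~x2);
}
```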
Philip Lantz
2016-02-14 19:59:43 UTC
Post by s***@casperkitty.com
Post by Philip Lantz
Why did you snip and fail to address the examples I gave, which show that
you are mistaken?
It appears you misunderstood my proposal, which listed specific forms of
+x -x x+y x-y x*y x&y x^y x|y cond?x:y
So you /are/ proposing that the integer promotions behave differently
depending on the containing expression. (I wish you had clarified that
when I first mentioned it.) As much as I dislike the current rule, I
think having the rule vary depending on context would be even more
confusing. There are currently no situations in C where the type of
an expression or subexpression depends on its context.
s***@casperkitty.com
2016-02-14 22:33:48 UTC
Post by Philip Lantz
So you /are/ proposing that the integer promotions behave differently
depending on the containing expression. (I wish you had clarified that
when I first mentioned it.) As much as I dislike the current rule, I
think having the rule vary depending on context would be even more
confusing. There are currently no situations in C where the type of
an expression or subexpression depends on its context.
An alternative way of expressing the rule would be to say that integer
overflow in the indicated expressions shall behave as though it yields a
result which may cause Undefined Behavior to be invoked if it is used in
various fashions, but may not invoke Undefined Behavior if it is converted
to an unsigned type which is no larger than "unsigned int", or if it is
used as an argument to an expression of the indicated forms (in the latter
case, the result of the expression would be a similar trap representation).

Another somewhat more narrowly-focused way of achieving the objective would
be to say that unsigned types shorter than "int" will be promoted to
signed "int" if there are any defined situations where such promotion would
behave differently from "unsigned int", and will otherwise be promoted to
"unsigned int". By definition, the latter rule would not change any defined
behaviors, but would defuse some of the time bombs that are lurking all over
the place in production code.

Personally, I consider absurd the number of situations where 99% of
compilers behave a certain way, yet there is no normative requirement
that they do so.
It used to be that compiler writers took the attitude that if there is a
"natural" way that something would be expected to behave on a given platform,
and if making things behave that way 100% of the time would very rarely
require extra code, and if programmers might conceivably benefit from having
the behavior be consistent, then compilers should try to make the behavior be
useful 100% of the time. Back when compiler writers held such attitudes,
the failure of the Standard to address such things didn't really matter.

Unfortunately, a new philosophy has emerged which indicates that compiler
writers should consider themselves bound only by the Standard, and not by
historical precedent. Unless the Standard changes to acknowledge common
practices, I would consider the existence of the Standard to be more harmful
than beneficial to the language.

I would suggest that historical common practice has been to say that if a
computation using short unsigned values would yield the same result
whether it was performed using signed or unsigned arithmetic, except in
cases where unsigned arithmetic would yield a defined result and signed
arithmetic would yield Undefined Behavior, the effect will be the same as
if evaluation were done using unsigned arithmetic. I know of many
compilers that worked that way, and zero that would behave otherwise
except when used in "pedantic" modes. Since compiler writers have
historically had no difficulty yielding such behaviors, is there any
reason it should be difficult for them to continue to do so?

BTW, I think that if it is necessary that there only be one type for each
numeric storage format, the present rules--if adjusted to avoid cases
where the only effect of promoting to signed rather than unsigned is to
make some cases Undefined--would be about as good as could be achieved
given that constraint, *but* the fundamental problem is with the premise
that for each unsigned storage format there should only be one type.

Some code uses unsigned values to represent numbers, while other code uses
them to represent members of abstract algebraic rings. If the language had
separate types for the two concepts, the rules could be made much more
uniform, code could be much more portable, and optimizers could have many
more opportunities for optimization than is presently the case.
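In today's C the "ring" reading has to be forced by hand. A minimal sketch
(the ring16_* helper names are invented for illustration) of what
explicitly mod-65536 operations look like under the current rules:

```c
#include <stdint.h>

/* Hypothetical helpers giving uint16_t the mod-65536 ring semantics
   described above.  The 1u* keeps every intermediate in unsigned int,
   so no signed overflow can occur regardless of the width of int. */
static inline uint16_t ring16_mul(uint16_t a, uint16_t b)
{
    return (uint16_t)(1u * a * b);
}

static inline uint16_t ring16_sub(uint16_t a, uint16_t b)
{
    return (uint16_t)(1u * a - b);
}
```

A separate ring type, as proposed, would let the compiler supply these
semantics directly instead of relying on the programmer to remember the
guard at every use site.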
Jakob Bohm
2016-02-14 18:20:25 UTC
Post by s***@casperkitty.com
At present, if x and y are expressions of any integer type, any expression
of the following forms whose results are either ignored altogether or else
converted to a type whose values are all representable as "unsigned int",
will in all defined cases yield the same results as if x and y had been
+x -x x+y x-y x*y x&y x^y x|y cond?x:y
Consequently, while the Standard dictates that arguments of unsigned types
smaller than "unsigned int" will be promoted to a signed "int", the behavior
of the aformentioned expressions would in all defined cases be the same if
the promotion were instead promoted to "unsigned int", and if such conversion
caused any subexpressions to be treated likewise. I expect the authors of
the rule which specifies that small unsigned values promote to signed in
that case intended that it shouldn't matter whether they were promoted to
signed or unsigned.
Unfortunately, even though conversion to signed "int" will yield the same
result in all defined cases as conversion to "unsigned int", and even though
it would be extremely rare for promotion to "unsigned int" to be more
expensive than promotion to "int", promotion to "int" will result in
Undefined Behavior in some cases where promotion to "unsigned" would not.
For example, on systems where "int" is 32 bits, the following function will
yield Undefined Behavior if passed a value larger than 46340, even though
its behavior would be defined for all possible values on systems where "int"
uint16_t square_mod_65536(uint16_t x) { return x*x; }
1. Systems which could evaluate the above expression more efficiently
using type "int" than "unsigned int" are extremely rare (if any
exist at all).
2. The vast majority of compilers have always generated code which was
equivalent to what would be generated if the computation were
performed using an unsigned type, and would do so even if they paid
no attention to how the result was going to be used, so in most
cases the rule would impose no new burden on compilers.
3. Outside of contrived situations, it is very unlikely that any
optimizations which would be facilitated by having the values promote
to a signed type rather than an unsigned type would enable the
generation of a program that would more efficiently meet its require-
ments than would be possible if the promotion were to an unsigned
type.
The Standard should be modified to say that if the result of any of the
above forms of expression is converted to an unsigned type whose values
are all representable as "unsigned int", the result shall be the same as
if the operands were converted to that type and the operation was performed
on that type.
NONE, outside of examples like the above contrived to show the effect
of Undefined Behavior.
NONE, except in cases where compilers would make inferences about possible
argument values; such inferences would need to be disabled in the
aforementioned cases.
NONE, except in cases where the inputs received by the program would never
cause integer overflow, such knowledge would allow a compiler to generate
more efficient code, and the only basis for such inference would come from
expressions which involve short unsigned types being promoted to signed
integers. Such cases are in practice likely vastly outnumbered by those
in which such inferences would disrupt the behavior of programs whose
behavior would be fully (and correctly) defined if this rule were in
effect.
While some forms of Undefined Behavior are intended to grant useful freedom
to compilers, the forms covered by this proposal were almost certainly not
intended to do so; the most likely reason that the Standard does not mandate
that overflow behavior in these cases yield the same result as would have
occurred using unsigned math is that the authors of the Standard likely
expected that compilers would always behave that way whether or not the
Standard compelled them to do so.
Other than the examples posted by Mr. Lantz (who was apparently
involved at the time the rule was set), there is the issue of computers
that *don't* use the currently ubiquitous 2s-complement encoding of
signed int types.

In particular, I believe the results are *different* for most of your
suggested cases if the implementation uses either 1s-complement or
sign+magnitude encoding of signed int types. A hypothetical modern
example would be a compiler for a machine that did all int operations
as special cases in the floating point hardware, rather than the
historic case of machines emulating floating point operations as
sequences of integer operations. A real world historic example would
be IBM system/360 and its successors (possibly including system/z).

Even if such a machine doesn't go the crazy route of running into
insecure lala land on overflow, its natural non-crashing overflow
handling is likely to act differently in some or all of your cases,
especially (but not only) for values near INT_MIN (etc.).


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
s***@casperkitty.com
2016-02-14 19:27:46 UTC
Post by Jakob Bohm
Other than the examples posted by Mr. Lantz, (who was apparently
involved at the time the rule was set), there is the issue of computers
that *don't* use the currently ubiquitous 2s-complement encoding of
signed int types.
I don't have any such machines to play with, but even on such machines I
would expect the rule would not pose much difficulty.
Post by Jakob Bohm
In particular, I believe the results are *different* for most of your
suggested cases if the implementation uses either 1s-complement or
sign+magnitude encoding of signed int types. A hypothetical modern
example would be a compiler for a machine that did all int operations
as special cases in the floating point hardware, rather than the
historic case of machines emulating floating point operations as
sequences of integer operations. A real world historic example would
be IBM system/360 and its successors (possibly including system/z).
I would expect (and I'd be interested in knowing of historical
counterexamples) that even (especially!) on machines where signed integer
math is done in some fashion other than two's-complement, a statement
like "ushort1 = ushort2*(ushort3-ushort4);" could be more efficiently
evaluated using unsigned math than using signed math, and that
compilers would routinely use unsigned math in that scenario since
it would yield the same result in all defined cases as using signed math,
and would avoid the need to mod-convert negative values.

While C was designed to make compilation as simple as possible, and while
outside-in expression evaluation may have been too expensive in the days
when people wanted compilers to fit in 32K or less, I don't think the
cost would be meaningful when compared with the cost of other new features
that have been added since the original K&R C. Further, in the rare cases
where my rule as proposed would get in the way, programmers wanting to
allow UB in cases where the Standard presently does could write, e.g.

ushort1 = (int)(ushort2*(ushort3-ushort4));

since the final conversion would be applied to an expression of the form
"(int)something", and that is not one of the forms I've listed, the rule
would not be applied and compilers would be entitled to do anything they
like in cases where int overflow would cause Undefined Behavior.

I'm pretty sure I appreciate more than most people the usefulness of having
a language which is amenable to implementation on exotic architectures,
since I've programmed on machines where "char" isn't 8 bits, where relational
comparisons among unrelated pointers don't yield a non-overlapping ranking,
etc., but at the same time I see no reason why people whose code will never
have to run on such exotic machines should be expected to abide by the
limitations of machines upon which their code will never need to run.

Further, I think an essential feature of good languages is that they should
avoid making programmers jump through hoops to avoid Undefined Behavior in
cases which 90+% of implementations would handle correctly 100% of the time
and 99+% of implementations would handle correctly 99+% of the time. Would
it make more sense to require programmers who want a mod-4294967296 product
to write:

uint32_t mul_mod_4294967296(uint32_t x, uint32_t y)
{ return 1U*x*y; }

or to require that compilers interpret

uint32_t mul_mod_4294967296(uint32_t x, uint32_t y)
{ return x*y; }

so as to yield a mod-4294967296 product regardless of the size of "int"?
Post by Jakob Bohm
Even if such a machine doesn't go the crazy route of running into
insecure lala land on overflow, its natural non-crashing overflow
handling is likely to act differently in some or all of your cases,
especially (but not only) for values near INT_MIN (etc.).
I'm genuinely curious how many compilers there have been which, given
[adjust syntax as needed for pre-C89 compilers which don't understand
"volatile" or C89-style function declarations]:

unsigned short volatile temp;
unsigned short hide_value_from_compiler(unsigned short x)
{ temp = x; return temp; }

unsigned short mul_mod_USHRT_MAX_plus_one(unsigned short x,
unsigned short y)
{
x = hide_value_from_compiler(x);
return x*y;
}

would generate code for the multiply whose behavior would ever differ from
that of "return 1u*x*y;"? I would expect there are some compilers where
the latter expression would generate significantly slower code (performing
an extra, useless, multiply by one) but are there any where the behavior
of the code without the extra multiply wouldn't match that of the code
with the extra multiply?
Jakob Bohm
2016-02-14 22:30:18 UTC
Post by s***@casperkitty.com
Post by Jakob Bohm
Other than the examples posted by Mr. Lantz, (who was apparently
involved at the time the rule was set), there is the issue of computers
that *don't* use the currently ubiquitous 2s-complement encoding of
signed int types.
I don't have any such machines to play with, but even on such machines I
would expect the rule would not pose much difficulty.
Post by Jakob Bohm
In particular, I believe the results are *different* for most of your
suggested cases if the implementation uses either 1s-complement or
sign+magnitude encoding of signed int types. A hypothetical modern
example would be a compiler for a machine that did all int operations
as special cases in the floating point hardware, rather than the
historic case of machines emulating floating point operations as
sequences of integer operations. A real world historic example would
be IBM system/360 and its successors (possibly including system/z).
I would expect (and I'd be interested in knowing of historical counter-
examples) that even (especially!) on machines where signed integer math
is done in some fashion other than two's-complement, that a statement
like "ushort1 = ushort2*(ushort3-ushort4);" could be more efficiently
evaluated using unsigned math than by using signed math, and that
compilers would routinely use unsigned math in the latter scenario since
it would yield the same result in all defined cases as using signed math,
and wold avoid the need to mod-convert negative values.
I am not talking about efficiency. I am talking about the possibility of
getting a different (implementation-related, possibly implementation-
defined) numeric result through application of the current language
definition compared to your proposed change.
Post by s***@casperkitty.com
While C was designed to make compilation as simple as possible, and while
outside-in expression evaluation may have been too expensive in the days
when people would want compilers to fit in 32K or less, I don't think the
cost would be meaningful when compared with the cost of other new features
that have been added since the original K&R C. Further, in the rare cases
where my rule as proposed, programmers wanting to allow UB in cases where
the Standard presently does could write, e.g.
ushort1 = (int)(ushort2*(ushort3-ushort4));
since the final conversion would be applied to an expression of the form
"(int)something", and that is not one of the forms I've listed, the rule
would not be applied and compilers would be entitled to do anything they
like in cases where int overflow would cause Undefined Behavior.
It is not about programmers requesting specific behavior explicitly for
some new previously unimplemented case, but about ensuring that
existing programs continue to behave as they always have, either due to
already-defined standard behavior or due to reasonable and predictable
implementation choices consistent with existing standards.
Post by s***@casperkitty.com
I'm pretty sure I appreciate more than most people the usefulness of having
a language which is amenable to implementation on exotic architectures,
since I've programmed on machines where "char" isn't 8 bits, where relational
comparisons among unrelated pointers don't yield a non-overlapping ranking,
etc. but at the same time I see no reason why people whose code will never
have to run on such exotic machines should be expected to abide by the
limitations of machines upon which their code will never need to run.
Further, I think an essential feature of good languages is that they should
avoid making programmers jump through hoops to avoid Undefined Behavior in
cases which 90+% of implementations would handle correctly 100% of the time
and 99+% of implementations would handle correctly 99+% of the time. Would
it make more sense to require programmers who want a mod-4294967296 product
uint32_t mul_mod_4294967296(uint32_t x, uint32_t y)
{ return 1U*x*y; }
or to require that compilers interpret
uint32_t mul_mod_4294967296(uint32_t x, uint32_t y)
{ return x*y; }
so as to yield a mod-4294967296 product regardless of the size of "int"?
Post by Jakob Bohm
Even if such a machine doesn't go the crazy route of running into
insecure lala land on overflow, its natural non-crashing overflow
handling is likely to act differently in some or all of your cases,
especially (but not only) for values near INT_MIN (etc.).
I'm genuinely curious how many compilers there have been which, given
[adjust syntax as needed for pre-C89 compilers which don't understand
unsigned short volatile temp;
unsigned short hide_value_from_compiler(unsigned short x)
{ temp = x; return temp; }
unsigned short mul_mod_USHRT_MAX_plus_one(unsigned short x,
unsigned short y)
{
x = hide_value_from_compiler(x);
return x*y;
}
would generate code for the multiply whose behavior would ever differ from
that of "return 1u*x*y;"? I would expect there are some compilers where
the latter expression would generate significantly slower code (performing
an extra, useless, multiply by one) but are there any where the behavior
of the code without the extra multiply wouldn't match that of the code
with the extra multiply?
OK, here is one hypothetical case; I don't have a relevant machine
handy either.

1. 32 bit int, 16 bit short.

2. Now USHRT_MAX * USHRT_MAX will obviously be > INT_MAX.

3. signed int uses a different encoding than 2s complement.

4. Compiler is sane enough not to explode or raise exception from
multiply overflow.

5. Hardware handles overflow by letting most bits fall where the
circuits to handle non-overflow would set them (i.e. they don't
slow the processor down to special case overflowing numbers).

6. Now examine the int value coming out of (int)USHRT_MAX *
   (int)USHRT_MAX. This could be +0x7FFE0001, which has the right
   low 16 bits for your rule. But it could also be something like
   (int)0xFFFE0000, due to triggering a 1s-complement correction
   circuit intended for a different (non-overflow) case.

7. Whatever the result, it is probably fully defined by the hardware
instruction set manual for the machine architecture, and the C
compiler simply emits the most efficient multiply instruction that
will give the correct int result, then applies optimizations that
don't change the result between optimized and unoptimized builds.


Enjoy

Jakob
s***@casperkitty.com
2016-02-14 22:47:40 UTC
Post by Jakob Bohm
I am not talking efficiency. I am talking about the possibility of
getting a different (implementation related, possibly implementation
defined) numeric result through application of the current language
definition compared to your proposed change.
Can you identify any implementations which have documented a contrary
useful behavior, or any instances of production code which have ever
relied upon such behaviors?

There exists lots of real production code which relies upon the behavior I
describe, since in the 1990s people would have regarded as absurd the idea
that any C compiler for a non-archaic platform which wasn't configured for
hyper-pedantic mode would ever behave in any fashion not consistent with
the rules of modular arithmetic. Given that MISRA-C accepts such constructs
without complaint, I would say there is a substantial likelihood that there
exists safety-critical code which relies upon such behavior, was correct
on the system for which it was written, and will get ported to a compiler
which will use the Undefined Behavior as an excuse to bypass critical
pieces of program logic unless the Standard is changed to forbid compilers
from doing so.
Post by Jakob Bohm
It is not about programmers requesting specific behavior explicitly for
some new previously unimplemented case. But about ensuring that
existing programs continue to behave as they always have, either due to
already defined standard behavior or due to reasonable and predictable
implementation choices consistent with existing standards.
I would be surprised if *any* production code exists *anywhere* which relies
upon any behavior contrary to that which I would seek to prescribe.

If you're concerned about code for one's-complement or signed-magnitude
machines that might have trouble with that, I would be open to exempting
such machines from the requirements, given that much of the code that
would benefit from the rule would for other reasons have no realistic hope
of working on such machines but could refuse to compile on them.

My concern is with hyper-modern compilers which use UB as an excuse to bypass
or eliminate program logic.
Post by Jakob Bohm
Ok, here is one hypothetical case, I don't have a relevant machine
handy either.
1. 32 bit int, 16 bit short.
2. Now USHORT_MAX * USHORT_MAX will obviously be > INT_MAX.
3. signed int is a different encoding that 2s complement.
4. Compiler is sane enough not to explode or raise exception from
multiply overflow.
5. Hardware handles overflow by letting most bits fall where the
circuits to handle non-overflow would set them (i.e. they don't
slow the processor down to special case overflowing numbers).
6. Now examine the int value coming out of (int)USHORT_MAX *
(int)USHORT_MAX. Now this could be +0x7FFE0001 which has the right
low 16 bits for your rule. But it could also be something like
(int)0xFFFE0000 due to triggering a 1-s complement correction circuit
intended for a different (non-overflow) case.
7. Whatever the result, it is probably fully defined by the hardware
instruction set manual for the machine architecture, and the C
compiler simply emits the most efficient multiply instruction that
will give the correct int result, then applies optimizations that
don't change the result between optimized and unoptimized builds.
Such a thing is theoretically possible. Would any existing compilers
behave that way? As noted earlier, if the proposal were weakened to only
cover platforms where ~0u == -1u, would you see any remaining problems?