Discussion: Bit-clear operator
w***@gmail.com
2013-07-13 15:27:36 UTC
Clearing the bottom two bits of an integer is easy, right?

x &= ~3;

Well, no. Not if x has type 'long' and sizeof(long) > sizeof(int). In that case, you just inadvertently cleared the top bits of x as well as the bottom two. Surely not what the programmer intended.

A new operator could solve this. Perhaps ~&. x = x ~& 3. Or for an assignment operator, x ~&= 3. The ~= operator is also unused (ie x ~= 3) but that might be a little unintuitive.

(Apologies for not speaking in standardese)
Hans-Bernhard Bröker
2013-07-13 15:57:05 UTC
Post by w***@gmail.com
Clearing the bottom two bits of an integer is easy, right?
Yes it is, if done properly.
Post by w***@gmail.com
x &= ~3;
Well, no. Not if x has type 'long' and sizeof(long) > sizeof(int).
And like any lint worth having would tell you, that's because the above
is not the right way of doing the job, particularly in that situation.
Post by w***@gmail.com
In that case, you just inadvertently cleared the top bits of x as
well as the bottom two. Surely not what the programmer intended.
Programming languages are hardly ever about what the programmer
intended. They're about what the programmer actually writes. If that
writing is at odds with the intent, that's the programmer's fault, not
the language's.
Post by w***@gmail.com
A new operator could solve this.
As could the old operators and language elements, if used properly:

x &= ~3L;
James Kuyper
2013-07-13 21:20:42 UTC
Post by w***@gmail.com
Clearing the bottom two bits of an integer is easy, right?
x &= ~3;
Well, no. Not if x has type 'long' and sizeof(long) > sizeof(int). In that case, you just inadvertently cleared the top bits of x as well as the bottom two. Surely not what the programmer intended.
The relevant criterion isn't sizeof(long)>sizeof(int), but LONG_MAX >
INT_MAX.
It is almost always an error to use bitwise operators such as ~ on
signed values such as x and 3.
Assuming x were changed to unsigned long, no new operator would be
needed, x &= ~3UL would do the job quite nicely without any change to
the language.
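
For illustration, a minimal sketch of that (the value is hypothetical, and a
64-bit unsigned long is assumed):

#include <stdio.h>

int main(void)
{
    unsigned long x = 0xFFFFFFFFFFUL;   /* hypothetical 40-bit value */
    x &= ~3UL;                          /* complement taken at unsigned long width,
                                           so the mask is 0xFFFFFFFFFFFFFFFC */
    printf("%lx\n", x);                 /* prints fffffffffc */
    return 0;
}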
--
James Kuyper
Derek M. Jones
2013-07-13 23:24:13 UTC
Willy,
Post by w***@gmail.com
Clearing the bottom two bits of an integer is easy, right?
x &= ~3;
Well, no. Not if x has type 'long' and sizeof(long) > sizeof(int). In that case, you just inadvertently cleared the top bits of x as well as the bottom two. Surely not what the programmer intended.
sizeof or any other test has nothing to do with this.

You might want to check out the usual arithmetic conversions.
http://c0x.coding-guidelines.com/6.3.1.8.html
James Kuyper
2013-07-14 00:18:28 UTC
Post by Derek M. Jones
Willy,
Post by w***@gmail.com
Clearing the bottom two bits of an integer is easy, right?
x &= ~3;
Well, no. Not if x has type 'long' and sizeof(long) > sizeof(int). In that case, you just inadvertently cleared the top bits of x as well as the bottom two. Surely not what the programmer intended.
sizeof or any other test has nothing to do with this
I agree about sizeof, but not about "any other test"
Post by Derek M. Jones
You might want to check out the usual arithmetic conversions.
http://c0x.coding-guidelines.com/6.3.1.8.html
Those conversions result in the value of ~3 being converted to long.
Since that value necessarily can be represented as a long, it is
unchanged by the conversion. However, what he actually wanted was the
value of ~3L, and that will be the same as the value of ~3 only if
LONG_MAX == INT_MAX && LONG_MIN == INT_MIN, so there is an "other test"
that is relevant.
--
James Kuyper
Derek M. Jones
2013-07-14 13:11:54 UTC
James,
Post by James Kuyper
Those conversions result in the value of ~3 being converted to long.
Dodgy eyesight on my part. I read it as -3 rather than ~3.
I think I can also blame the non-code oriented font...
Post by James Kuyper
Since that value necessarily can be represented as a long, it is
unchanged by the conversion. However, what he actually wanted was the
value of ~3L, and that will be the same as the value of ~3 only if
LONG_MAX == INT_MAX && LONG_MIN == INT_MIN, so there is an "other test"
that is relevant.
Tim Rentsch
2013-07-14 19:45:29 UTC
Post by James Kuyper
Post by Derek M. Jones
Willy,
Post by w***@gmail.com
Clearing the bottom two bits of an integer is easy, right?
x &= ~3;
Well, no. Not if x has type 'long' and sizeof(long) > sizeof(int).
In that case, you just inadvertently cleared the top bits of x as
well as the bottom two. Surely not what the programmer intended.
sizeof or any other test has nothing to do with this
I agree about sizeof, but not about "any other test"
Post by Derek M. Jones
You might want to check out the usual arithmetic conversions.
http://c0x.coding-guidelines.com/6.3.1.8.html
Those conversions result in the value of ~3 being converted to
long. Since that value necessarily can be represented as a long,
it is unchanged by the conversion. However, what he actually
wanted was the value of ~3L, and that will be the same as the
value of ~3 only if LONG_MAX == INT_MAX && LONG_MIN == INT_MIN,
[snip]
A bold statement, but not a correct one.

Regardless of the values of LONG_MAX, etc., ~3 == ~3L holds in
all implementations that use twos complement or ones complement
(meaning ones that use one or the other exclusively).

To be sure, in signed-magnitude implementations, ~3 == ~3L will
be true only if LONG_MAX == INT_MAX (in which case the relation
on the MIN values also holds).

In implementations (assuming that any exist, which they probably
don't) that use a mixture of representation schemes -- eg, twos
complement and signed-magnitude -- then it won't necessarily be
true that ~3 == ~3L. But then ~3 == ~3L might not hold even if
LONG_MAX == INT_MAX and LONG_MIN == INT_MIN.

Are there any signed-magnitude (or mixed) implementations still
in current use? If not, then for all practical purposes we
may assume ~3 == ~3L.
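
A quick way to check a given implementation (just a sketch; on any twos
complement or ones complement system it should print "equal"):

#include <stdio.h>

int main(void)
{
    /* ~3 has type int, ~3L has type long; == applies the usual
       arithmetic conversions before comparing the values. */
    printf("%s\n", (~3 == ~3L) ? "equal" : "different");
    return 0;
}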
w***@wilbur.25thandClement.com
2013-07-15 00:17:46 UTC
Tim Rentsch <***@alumni.caltech.edu> wrote:
<snip>
Post by Tim Rentsch
Are there any signed-magnitude (or mixed) implementations still
in current use? If not, then for all practical purposes we
may assume ~3 == ~3L.
http://public.support.unisys.com/aseries/docs/clearpath-mcp-14.0/pdf/86002268-205.pdf

However, this is not a good counterexample because short, int, and long all
share the same width (48 bits) and range (+/- 2^39-1).

I've never seen or used one; I just got tired of hearing anecdotes about
them and decided to find a reference manual for at least one of the models.
Tim Rentsch
2013-07-15 04:06:34 UTC
Post by w***@wilbur.25thandClement.com
<snip>
Post by Tim Rentsch
Are there any signed-magnitude (or mixed) implementations still
in current use? If not, then for all practical purposes we
may assume ~3 == ~3L.
http://public.support.unisys.com/aseries/docs/clearpath-mcp-14.0/pdf/86002268-205.pdf
An interesting document. Thank you for the link.
Post by w***@wilbur.25thandClement.com
However, this is not a good counterexample because short, int, and
long all share the same width (48 bits) and range (+/- 2^39-1)
I've never seen or used one; I just got tired of hearing anecdotes
about them and decided to find a reference manual for at least one
of the models.
This implementation has a variety of unusual properties, notably:

* A mixture of representation schemes (two's complement for
chars, s/m for non-chars)

* Padding bits in integer types

* INT_MAX == UINT_MAX

* Basic data types have a size that is not a power of two
(more specifically, six 8-bit bytes)

It isn't quite a current implementation, in the sense that it
supports C90 but not C99 (no long long type, for example). And
trying to bring it up to C11 would entail a lot of work, because
of the limitation in C11 that alignments must be powers of two.
Given the word size of 48 bits, that would most likely mean
changing CHAR_BIT to 12 (or perhaps 24 or 48); I suspect
getting that to work in a system that uses 8-bit characters
internally would be seen as more trouble than it's worth.

Notice that the compiler is not conforming in its default mode,
due to how unsigned integer types are treated. In the section on
Compiler Options the manual says this (10-11, p 241):

In order for the execution of a C program to conform to the
ANSI C Standard, the PORT(CHAR2) and PORT(UNSIGNED) compiler
control options must be set to TRUE.

These options are described on page 285

CHAR2

(Type: Boolean; Default: FALSE)

The CHAR2 option, when enabled, causes unsigned chars to be
stored as two's complement values.

and page 288

UNSIGNED

(Type: Boolean; Default: FALSE)

The UNSIGNED option controls the semantics of the sign
attribute for integer types. If this option is disabled,
unsigned integers are treated as signed integers; that is,
normal signed-magnitude arithmetic is performed. If the
option is enabled, unsigned integers are emulated as
two's-complement quantities. [performance comment snipped]
Richard Damon
2013-07-14 13:01:04 UTC
Post by w***@gmail.com
Clearing the bottom two bits of an integer is easy, right?
x &= ~3;
Well, no. Not if x has type 'long' and sizeof(long) > sizeof(int).
In that case, you just inadvertently cleared the top bits of x as
well as the bottom two. Surely not what the programmer intended.
A new operator could solve this. Perhaps ~&. x = x ~& 3. Or for an
assignment operator, x ~&= 3. The ~= operator is also unused (ie x
~= 3) but that might be a little unintuitive.
(Apologies for not speaking in standardese)
Actually, since 3 is a signed integer, if x is long, then ~3 (which will
be negative, typically -4 on a two's complement implementation) will be
sign extended, and will give the same result as ~3L provided that int and
long use the same encoding (since today, finding something other than two's
complement would be highly unusual, and if you did, you are almost sure
to find it being used consistently; this is almost a promise).

The problem would be if you did
x &= ~3U;

You do run into the fact that the results of the bitwise operators on signed
and negative values hit implementation-defined behavior (it might even be
undefined), but again, in almost any practical situation it will work as
expected.
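
A minimal sketch of that contrast (hypothetical value, assuming 32-bit int,
64-bit long, and two's complement):

#include <stdio.h>

int main(void)
{
    long a = 0xFFFFFFFFFFL;   /* hypothetical 40-bit value */
    long b = a;

    a &= ~3;     /* ~3 is the int value -4; it sign-extends to long,
                    so the mask keeps all of the high bits set */
    b &= ~3U;    /* ~3U is 0xFFFFFFFCu; it zero-extends to long,
                    so bits 32..63 of the mask are clear */

    printf("%lx\n", (unsigned long)a);   /* prints fffffffffc */
    printf("%lx\n", (unsigned long)b);   /* prints fffffffc   */
    return 0;
}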
Tim Rentsch
2013-07-14 19:49:59 UTC
Post by Richard Damon
Post by w***@gmail.com
Clearing the bottom two bits of an integer is easy, right?
x &= ~3;
Well, no. Not if x has type 'long' and sizeof(long) > sizeof(int).
In that case, you just inadvertently cleared the top bits of x as
well as the bottom two. Surely not what the programmer intended.
A new operator could solve this. Perhaps ~&. x = x ~& 3. Or for an
assignment operator, x ~&= 3. The ~= operator is also unused (ie x
~= 3) but that might be a little unintuitive.
(Apologies for not speaking in standardese)
Actually, since 3 is a signed integer, if x is long, then ~3
(which will be negative, typically -4 on a two's complement
implementation) will be sign extended, and will give the same result
as ~3L provided that int and long use the same encoding (since
today, finding something other than two's complement would be
highly unusual, and if you did, you are almost sure to find it
being used consistently; this is almost a promise). [snip unrelated]
What you say is true only for twos complement or ones complement.
For signed-magnitude, ~3 == ~3L iff the ranges of int and long
are the same.
Michael Deckers
2013-07-14 17:59:53 UTC
Post by w***@gmail.com
Clearing the bottom two bits of an integer is easy, right?
x &= ~3;
Well, no. Not if x has type 'long' and sizeof(long) > sizeof(int).
In that case, you just inadvertently cleared the top bits of x as well
as the bottom two. Surely not what the programmer intended.
A new operator could solve this. Perhaps ~&. x = x ~& 3. Or for
an assignment operator, x ~&= 3. The ~= operator is also unused
(ie x ~= 3) but that might be a little unintuitive.
Why not just write what you mean:

x ~= (x & 0x3) or
x ~= (x & 0x3u)

Michael Deckers.
Michael Deckers
2013-07-14 18:14:42 UTC
Post by Michael Deckers
x ~= (x & 0x3) or
x ~= (x & 0x3u)
instead of

x ^= (x & 0x3) or
x ^= (x & 0x3u)
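
For illustration, a minimal sketch (hypothetical value, assuming a 64-bit
unsigned long); since the mask x & 0x3 can only ever have the bottom two
bits set, there is no sign-extension question at all:

#include <stdio.h>

int main(void)
{
    unsigned long x = 0xFFFFFFFFFFUL;   /* hypothetical 40-bit value */
    x ^= (x & 0x3u);                    /* XOR away exactly the bottom two bits */
    printf("%lx\n", x);                 /* prints fffffffffc */
    return 0;
}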

Michael Deckers.
Tim Rentsch
2013-07-14 19:17:58 UTC
Post by w***@gmail.com
Clearing the bottom two bits of an integer is easy, right?
x &= ~3;
Well, no. Not if x has type 'long' and sizeof(long) > sizeof(int).
In that case, you just inadvertently cleared the top bits of x as well
as the bottom two. Surely not what the programmer intended.
A new operator could solve this. Perhaps ~&. x = x ~& 3. Or for an
assignment operator, x ~&= 3. The ~= operator is also unused (ie x ~=
3) but that might be a little unintuitive.
Your analysis has some unjustified leaps. But in any case it's
easy to clear the bottom two bits:

x &= ~(0 ? x : 3);

This works for all integer types at least as big as int, and
for non-negative values of integer types smaller than int.

For negative values of integer types smaller than int, whether
it works depends on implementation-defined behavior
(specifically converting out-of-range values). It will work
fine on essentially any platform you are likely to encounter.
Besides, anyone who is clearing bits of a negative value (and
only for types smaller than int!) is already doing something
dangerous, and ought to be prepared for some platform-dependent
behavior.
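
For illustration, a minimal sketch of the trick (hypothetical value,
assuming a 64-bit long):

#include <stdio.h>

int main(void)
{
    long x = 0xFFFFFFFFFFL;   /* hypothetical 40-bit value */

    /* The ?: expression has the common type of x and 3 (long here)
       and the value 3, so the complement is taken at long width. */
    x &= ~(0 ? x : 3);

    printf("%lx\n", (unsigned long)x);   /* prints fffffffffc */
    return 0;
}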
1***@gmail.com
2015-10-05 17:54:03 UTC
Post by Tim Rentsch
Post by w***@gmail.com
Clearing the bottom two bits of an integer is easy, right?
x &= ~3;
Well, no. Not if x has type 'long' and sizeof(long) > sizeof(int).
In that case, you just inadvertently cleared the top bits of x as well
as the bottom two. Surely not what the programmer intended.
A new operator could solve this. Perhaps ~&. x = x ~& 3. Or for an
assignment operator, x ~&= 3. The ~= operator is also unused (ie x ~=
3) but that might be a little unintuitive.
Your analysis has some unjustified leaps. But in any case it's
x &= ~(0 ? x : 3);
I fail to understand how it is clearing the last two bits when 0 is evaluated as true i.e. if 0 is promoted to long when x is of type long.
Wouldn't that be evaluated as:
x = x &~(x)
Post by Tim Rentsch
This works for all integer types at least as big as int, and
for non-negative values of integer types smaller than int.
For negative values of integer types smaller than int, whether
it works depends on implementation-defined behavior
(specifically converting out-of-range values). It will work
fine on essentially any platform you are likely to encounter.
Besides, anyone who is clearing bits of a negative value (and
only for types smaller than int!) is already doing something
dangerous, and ought to be prepared for some platform-dependent
behavior.
Jakob Bohm
2015-10-05 19:24:09 UTC
Note: I am not sure where the beginning of this thread is located;
this is the first I see of it on comp.std.c.
Post by 1***@gmail.com
Post by Tim Rentsch
Post by w***@gmail.com
Clearing the bottom two bits of an integer is easy, right?
x &= ~3;
Well, no. Not if x has type 'long' and sizeof(long) > sizeof(int).
In that case, you just inadvertently cleared the top bits of x as well
as the bottom two. Surely not what the programmer intended.
A new operator could solve this. Perhaps ~&. x = x ~& 3. Or for an
assignment operator, x ~&= 3. The ~= operator is also unused (ie x ~=
3) but that might be a little unintuitive.
Your analysis has some unjustified leaps. But in any case it's
x &= ~(0 ? x : 3);
I fail to understand how it is clearing the last two bits when 0 is evaluated as true i.e. if 0 is promoted to long when x is of type long.
x = x &~(x)
0 is a constant; it is always false. The trick of this code is to
portably force the constant "3" of type int to the (arbitrary) integral
type of the variable x, by taking advantage of language rules saying
that compilers must promote the type of a ?: expression to a common
integral supertype regardless of whether the compiler will optimize the
condition away.

If the type of x is known in advance, the code could be (and usually is)
written like this:

x &= ~(typeofx)3;

Or, if typeofx is known to be unsigned long or unsigned long long:

x &= ~3ul;
x &= ~3ull;

The following would also work with more recent C compilers (those
correctly implementing stdint.h as required by the current standard):

x &= ~(uintmax_t)3; // Won't work if stdint.h is wrong/missing
// Assumes compiler will optimize away any excess
// high bits in the 0xFFFF...FFFC value.
// Assumes compiler will not stubbornly warn
// about uintmax_t being larger than x.

The trick notation with the ?: operator has the advantage that it works
even for whatever compiler-specific and/or context-dependent type x may
have, for instance a compiler-specific type behind one of the types
from <stdint.h>.

Conversely, introducing a new operator for something that can already be
expressed with existing operators, in a way that any sane compiler will
optimize to whatever the new operator would represent, is wasteful and
inconsistent with the basic ideas behind the C language.
Post by 1***@gmail.com
Post by Tim Rentsch
This works for all integer types at least as big as int, and
for non-negative values of integer types smaller than int.
For negative values of integer types smaller than int, whether
it works depends on implementation-defined behavior
(specifically converting out-of-range values). It will work
fine on essentially any platform you are likely to encounter.
Besides, anyone who is clearing bits of a negative value (and
only for types smaller than int!) is already doing something
dangerous, and ought to be prepared for some platform-dependent
behavior.
P.S.

For the pedantic: I use the word "compiler" as a traditional reference to
whichever part(s) of a language implementation that deals with the
source code of a single "compilation unit", issues related diagnostics
and does other such per-unit processing. In some language
implementations, the "compiler" does not refer to any actual program or
component.


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Kaz Kylheku
2015-10-05 19:59:59 UTC
Post by Jakob Bohm
Post by 1***@gmail.com
Post by Tim Rentsch
Post by w***@gmail.com
Clearing the bottom two bits of an integer is easy, right?
Your analysis has some unjustified leaps. But in any case it's
x &= ~(0 ? x : 3);
I fail to understand how it is clearing the last two bits when 0 is evaluated as true i.e. if 0 is promoted to long when x is of type long.
x = x &~(x)
0 is a constant; it is always false. The trick of this code is to
portably force the constant "3" of type int to the (arbitrary) integral
type of the variable x, by taking advantage of language rules saying
that compilers must promote the type of a ?: expression to a common
integral supertype regardless of whether the compiler will optimize the
condition away.
I would hide this behind a macro:

#define harmonize_type(dest, source) (0 ? (source) : (dest))
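
For example, a sketch of how the macro might be used (the surrounding code
is hypothetical; the argument order follows the macro as written, so the
constant goes in the dest position):

#include <stdio.h>

#define harmonize_type(dest, source) (0 ? (source) : (dest))

int main(void)
{
    long x = 0xFFFFFFFFFFL;       /* hypothetical value, assuming 64-bit long */
    x &= ~harmonize_type(3, x);   /* 3 takes on the common type of x and 3
                                     before the complement is applied */
    printf("%lx\n", (unsigned long)x);   /* prints fffffffffc */
    return 0;
}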
Paul D. DeRocco
2013-07-18 04:27:17 UTC
Post by w***@gmail.com
Clearing the bottom two bits of an integer is easy, right?
x &= ~3;
Well, no. Not if x has type 'long' and sizeof(long) > sizeof(int). In that case, you just inadvertently cleared the top bits of x as well as the bottom two. Surely not what the programmer intended.
A new operator could solve this. Perhaps ~&. x = x ~& 3. Or for an assignment operator, x ~&= 3. The ~= operator is also unused (ie x ~= 3) but that might be a little unintuitive.
(Apologies for not speaking in standardese)
I can't see the point of doing something drastic like adding a whole new
operator, when it is so easy to do what you want with the language as it is.

I rather like some of the GNU extensions, like x ? : y, which doesn't
involve changing the lexer but provides something that's sometimes
difficult to do otherwise.
--
Ciao, Paul D. DeRocco
Paul mailto:***@ix.netcom.com
s***@casperkitty.com
2015-10-08 17:49:45 UTC
Post by w***@gmail.com
A new operator could solve this. Perhaps ~&. x = x ~& 3. Or for an assignment operator, x ~&= 3. The ~= operator is also unused (ie x ~= 3) but that might be a little unintuitive.
On a two's-complement machine, x &= ~const; will work for any type of x if
the constant is *signed*. The problematic case doesn't occur often, but
can be surprising: [assuming 16-bit int for brevity, and assuming x is a
32-bit long]

x &= ~0x00004000; // works -- value -16385 gets promoted to 0xFFFFBFFFL
x &= ~0x00008000; // fails -- value 32767u gets promoted to 0x00007FFFL
x &= ~0x00010000; // works -- value -65537L is simply 0xFFFEFFFFL

Still, even though the problematic case doesn't bite overly often, I would
favor making "~" usable as a one- or two-operand operator, much like "-".
So x~y would be a shorthand for x&~(suitableCast)y, and likewise x~=y
would be a shorthand for x&=~(suitableCast)y. The only difficulty I see
would be overcoming a major chicken-and-egg problem of building demand when
there's no compiler support, and gaining compiler support when there's no
demand.
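
The same pattern can be reproduced on a more common configuration; a sketch
assuming 32-bit int, 64-bit long, and two's complement (the constants are
chosen analogously to the 16-bit example above):

#include <stdio.h>

int main(void)
{
    long a = -1, b = -1, c = -1;   /* all bits set on two's complement */

    a &= ~0x40000000;    /* 0x40000000 is int; ~ gives -1073741825, which
                            sign-extends, so only bit 30 is cleared */
    b &= ~0x80000000;    /* 0x80000000 is unsigned int; ~ gives 0x7FFFFFFFu,
                            which zero-extends, so bits 31..63 go too */
    c &= ~0x100000000;   /* 0x100000000 is long; only bit 32 is cleared */

    printf("%lx\n%lx\n%lx\n",
           (unsigned long)a, (unsigned long)b, (unsigned long)c);
    /* prints:
       ffffffffbfffffff
       7fffffff
       fffffffeffffffff */
    return 0;
}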
Kaz Kylheku
2015-10-08 18:46:04 UTC
Post by s***@casperkitty.com
On a two's-complement machine, x &= ~const; will work for any type of x if
the constant is *signed*. The problematic case doesn't occur often, but
can be surprising: [assuming 16-bit int for brevity, and assuming x is a
32-bit long]
x &= ~0x00004000; // works -- value -16385 gets promoted to 0xFFFFBFFFL
x &= ~0x00008000; // fails -- value 32767u gets promoted to 0x00007FFFL
x &= ~0x00010000; // works -- value -65537L is simply 0xFFFEFFFFL
Just a thought: maybe if an integral constant is expressed in hex, the leading
zeros should count toward choosing the minimum width of its type.
That is to say, the type of 0x000..NNN should be the same type
that would be selected for 0xF00...NNN.
s***@casperkitty.com
2015-10-08 19:50:54 UTC
Post by Kaz Kylheku
Just a thought: maybe if an integral constant is expressed in hex, the leading
zeros should count toward choosing the minimum width of its type.
That is to say, the type of 0x000..NNN should be the same type
that would be selected for 0xF00...NNN.
The integer type system of C was never designed to facilitate the porting of
code among machines [actually, I don't know if it was ever really "designed"
at all--more likely it just sorta "happened"]. If one were designing a new
language and associated type system, there are many things which could be
done differently. The treatment of leading digits might be a part of that,
but other issues are probably much more important (like the need for a type
which will behave on all systems the way int32_t behaves on systems where an
int is 32 bits, and a type which will behave on all systems like an int32_t
behaves on systems where an int is 64 bits).
