Discussion:
Proposal: Make (0x1e+1) yield (0x1f)
s***@casperkitty.com
2016-03-30 16:06:35 UTC
One strange quirk of the C Standard is that the parsing rules which allow
plus and minus signs within numeric tokens don't distinguish between those
that start with 0x and those which don't. As a consequence, something like
0x1e+1 is not treated by the Standard as a series of three tokens, but rather
as one invalid token.

I would propose that the rule for token binding be changed so that a token
which starts with "0x" will be terminated by a plus or a minus sign, even
if the last character of the token was an "e" or "E" or, alternatively, that
when a token of the form 0x[0-9a-fA-F]*[eE][+-][0-9]* is encountered it be
decomposed into two or three tokens. These behaviors would be almost
equivalent, but for some corner cases involving macros which would be unlikely
to arise outside of contrived situations but should be acknowledged if not
specified [to minimize the burden on compilers, I think it would probably be
best to explicitly say that a compiler would be allowed to process something
like

#define paste(x,y) x##y
paste(0x,1e+1)

as though it generated distinct tokens 0x1e + 1, but a compiler would also
be allowed to treat the output of paste as though it were a single token
0x1e+1 which would be invalid for most purposes.]

I can't think of any realistic scenarios where production code might
plausibly rely upon the mandated token-binding behavior, but I'm not as
creative with the preprocessor as some people, so there may be useful
cases I'm not aware of.
Jakob Bohm
2016-03-30 16:14:42 UTC
Post by s***@casperkitty.com
One strange quirk of the C Standard is that the parsing rules which allow
plus and minus signs within numeric tokens don't distinguish between those
that start with 0x and those which don't. As a consequence, something like
0x1e+1 is not treated by the Standard as a series of three tokens, but rather
as one invalid token.
I would propose that the rule for token binding be changed so that a token
which starts with "0x" will be terminated by a plus or a minus sign, even
if the last character of the token was an "e" or "E" or, alternatively, that
when a token of the form 0x[0-9a-fA-F]*[eE][+-][0-9]* is encountered it be
decomposed into two or three tokens. These behaviors would be almost
equivalent, but for some corner cases involving macros which would be unlikely
to arise outside of contrived situations but should be acknowledged if not
specified [to minimize the burden on compilers, I think it would probably be
best to explicitly say that a compiler would be allowed to process something
like
#define paste(x,y) x##y
paste(0x,1e+1)
as though it generated distinct tokens 0x1e + 1, but a compiler would also
be allowed to treat the output of paste as though it were a single token
0x1e+1 which would be invalid for most purposes.]
I can't think of any realistic scenarios where production code might
plausibly rely upon the mandated token-binding behavior, but I'm not as
creative with the preprocessor as some people, so there may be useful
cases I'm not aware of.
You miss the point.

Some implementations allow hexadecimal floating point constants.

So 0x1e+1 is a hexadecimal floating point number with a mantissa of 1
and an exponent of 1. I don't remember offhand if the exponent for
hexadecimal floating point is a power of 2, 10 or 16, but those would
make this equal to 2.0, 10.0 or 16.0.

Hexadecimal/binary floating point numbers make a lot of sense on
machines that use the IEEE floating point formats (such as x86),
because those formats are inherently binary with decimal notation
always being an approximation of the value actually used by the
computer.



Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Keith Thompson
2016-03-30 16:36:18 UTC
Post by Jakob Bohm
Post by s***@casperkitty.com
One strange quirk of the C Standard is that the parsing rules which
allow plus and minus signs within numeric tokens don't distinguish
between those that start with 0x and those which don't. As a
consequence, something like 0x1e+1 is not treated by the Standard as
a series of three tokens, but rather as one invalid token.
I would propose that the rule for token binding be changed so that a
token which starts with "0x" will be terminated by a plus or a minus
sign, even if the last character of the token was an "e" or "E" or,
alternatively, that when a token of the form
0x[0-9a-fA-F]*[eE][+-][0-9]* is encountered it be decomposed into two
or three tokens.
[snip]

I would probably support that idea if it were being defined from
scratch, and it could probably be done without breaking existing valid
code, but I don't think it's worth changing. If I want 0x1e+1 to be
interpreted as three tokens, I'll write 0x1e + 1.

[...]
Post by Jakob Bohm
You miss the point.
Some implementations allow hexadecimal floating point constants.
C11 already added hexadecimal floating point constants as a standard
feature. The syntax uses 'p' rather than 'e' to introduce the exponent,
which is a power of 2. See N1570 6.4.4.2.
Post by Jakob Bohm
So 0x1e+1 is a hexadecimal floating point number with a mantissa of 1
and an exponent of 1.
For decimal floating-point constants, the sign is optional; 1.0e1 and
1.0e+1 are equivalent. For hexadecimal floating-point constants using
'e', it couldn't be, since 0x1e1 already has a meaning.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2016-03-30 17:07:09 UTC
Post by Keith Thompson
I would probably support that idea if it were being defined from
scratch, and it could probably be done without breaking existing valid
code, but I don't think it's worth changing. If I want 0x1e+1 to be
interpreted as three tokens, I'll write 0x1e + 1.
It's not a hugely important change, but unless there are realistic cases
where it would change the behavior of existing code it would seem like
an improvement which should be easy to implement and have relatively little
downside. I'm unaware of any implementations which support hex floating-
point values that don't contain decimal points and use "e" as an exponent
specifier. Are there any?

There are some historical quirks of C which are good, and there are some
which are bad but can't be changed without breaking code (leading zero for
octal was IMHO a horrible choice), but this one strikes me as being bad
and yet easily correctable. Is there any advantage to having the need for
a space after a hexadecimal literal depend upon whether the value in
question is congruent to 14 (mod 16)?
Lew Pitcher
2016-03-30 17:23:47 UTC
Post by s***@casperkitty.com
Post by Keith Thompson
I would probably support that idea if it were being defined from
scratch, and it could probably be done without breaking existing valid
code, but I don't think it's worth changing. If I want 0x1e+1 to be
interpreted as three tokens, I'll write 0x1e + 1.
It's not a hugely important change, but unless there are realistic cases
where it would change the behavior of existing code
[snip]

You are best to assume that, for any given existing facility in the existing
language, there /are/ realistic cases where a change to that facility will
impact existing code.

Instead of the "No True Scotsman" argument that you seem about to propose, how
about finding a way for your proposal to co-exist with the existing language
facilities?
--
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request
s***@casperkitty.com
2016-03-30 17:44:03 UTC
Post by Lew Pitcher
You are best to assume that, for any given existing facility in the existing
language, there /are/ realistic cases where a change to that facility will
impact existing code.
Can you suggest *any* change to the language which could be made without
at least the theoretical possibility of changing the behavior of some
existing code running on at least some existing implementation?
Post by Lew Pitcher
Instead of the "No True Scotsman" argument that you seem about to propose, how
about finding a way for your proposal to co-exist with the existing language
facilities?
It is probably possible to use token pasting and various other macro
trickery to produce a program in which 0x1e+1 would be processed as a
single token that was further munged by the preprocessor in such a way
as to yield a defined result. I was not intending to denigrate the use
of such a construct if it in fact serves a useful purpose. If, however,
nobody has discovered a useful purpose for such a construct in the 30+
years that it has been available, that would suggest that there is not
much value in the present meaning of 0x1e+1, and some other meaning might
be more useful.
Lew Pitcher
2016-03-30 18:08:11 UTC
Post by s***@casperkitty.com
I was not intending to denigrate the use
of such a construct if it in fact serves a useful purpose. If, however,
nobody has discovered a useful purpose for such a construct in the 30+
years that it has been available, that would suggest that there is not
much value in the present meaning of 0x1e+1, and some other meaning might
be more useful.
I suggest that, as the proponent of this specific change to the C language, it
is your responsibility to supply a clear, straight-forward proposal that
includes not only the specifics of the language change you suggest, but also
both a compelling justification of the change, /and/ steps to mitigate the
effects of your proposed change on existing code.

It is /not/ the responsibility of the users or authors of the existing
standard to justify existing features and facilities; it is /your/
responsibility to "show a better way".

Your "request" that others show "a useful purpose" in an existing feature,
(with the implication that the feature has no "useful" purpose, and it is the
burden of others to justify the retention of that feature), shows that you
have not met your responsibilities as the proponent of this change.

Show me exactly
- what your change entails,
- which existing language features it affects,
- why your change is more important than the existing features it conflicts
with, and
- how you propose to mitigate the effects on those who depend on the prior
behaviour
and I'll consider your change. Demand that I/we/someone justify the retention
of a prior feature or behaviour over your proposal, and I'll ignore you.

Just my opinion.
Kaz Kylheku
2016-03-30 19:16:53 UTC
Post by Lew Pitcher
Post by s***@casperkitty.com
I was not intending to denigrate the use
of such a construct if it in fact serves a useful purpose. If, however,
nobody has discovered a useful purpose for such a construct in the 30+
years that it has been available, that would suggest that there is not
much value in the present meaning of 0x1e+1, and some other meaning might
be more useful.
I suggest that, as the proponent of this specific change to the C language, it
is your responsibility to supply a clear, straight-forward proposal that
includes not only the specifics of the language change you suggest, but also
both a compelling justification of the change, /and/ steps to mitigate the
effects of your proposed change on existing code.
Easy. I'm going to use the C99 text rather than current draft.

In section 6.4 we have this under Constraints:

Each preprocessing token that is converted to a token shall have the
lexical form of a keyword, an identifier, a constant, a string literal,
or a punctuator.

I would change it to:

Each preprocessing token that is converted to one or more tokens shall
have the lexical form of either a single keyword, an identifier, a
constant, a string literal, or a punctuator, or else of a catenation
of one or more of these tokens.

Under semantics, I would add:

Preprocessor tokens from translation phases 3, 4 and 6 are converted
to tokens in phase 7. With the exception of pp-numbers, every
preprocessor token converts to exactly one token. A pp-number converts
to one or more tokens, subject to the above Constraints.

And that's pretty much done.

All existing correct code that doesn't violate any constraints or use
any nonconforming implementation extensions to preprocessing or token
syntax stays valid.

The input sequence 0x123E+3 maintains its current status as a single
pp-number through the translation phases where pp-tokens matter.
So no token pasting situations or other preprocessor manipulations
change meaning.

All that changes is that 0x123E+3 doesn't violate a constraint any more
due to not corresponding to any numeric constant. In translation phase
7, it nicely breaks into three tokens, and is then parsed that way.
Kaz Kylheku
2016-03-30 18:50:18 UTC
Post by s***@casperkitty.com
Post by Lew Pitcher
You are best to assume that, for any given existing facility in the existing
language, there /are/ realistic cases where a change to that facility will
impact existing code.
Can you suggest *any* change to the language which could be made without
at least the theoretical possibility of changing the behavior of some
existing code running on at least some existing implementation?
Not if you include code which uses nonconforming extensions.

We can extend the language by taking features which were previously
syntax or constraint violations and making them defined.

This won't interfere with any conforming local extension.

However, it can interfere with nonconforming extensions.

E.g. say ISO C suddenly wants to have nested functions, and gives them
different semantics from how they work in GNU C.
Kaz Kylheku
2016-03-30 18:57:52 UTC
Post by Keith Thompson
Post by s***@casperkitty.com
Post by Keith Thompson
I would probably support that idea if it were being defined from
scratch, and it could probably be done without breaking existing valid
code, but I don't think it's worth changing. If I want 0x1e+1 to be
interpreted as three tokens, I'll write 0x1e + 1.
It's not a hugely important change, but unless there are realistic cases
where it would change the behavior of existing code
[snip]
You are best to assume that, for any given existing facility in the existing
language, there /are/ realistic cases where a change to that facility will
impact existing code.
In this situation, it can be diagnosed. If the rules are changed so that
pp-number does not over-generate the grammar of numeric tokens, but
behaves accurately, implementations could diagnose situations in which
this makes a difference.

A straightforward way to define things would be to keep the
preprocessing grammar as is, but change the rule for how pp-tokens are
converted to tokens in translation phase 7.

Specifically, it could be permitted that a single pp-token can be
lexically scanned into one or more tokens. If such a scan is possible,
and consumes the pp-token entirely, then that occurs and there is
no constraint violation.

Thus the pp-number 0x123E+1 would behave exactly as it does now through
the preprocessing stages. Upon translation phase 7, instead of throwing
a constraint violation, it would be delimited into 0x123E, + and 1.

Implementations could easily emit an optional diagnostic whenever a
pp-number is converted in this way into two or more tokens, informing
the programmers that the code isn't portable to older versions of the
language.
Keith Thompson
2016-03-30 17:50:33 UTC
Post by s***@casperkitty.com
Post by Keith Thompson
I would probably support that idea if it were being defined from
scratch, and it could probably be done without breaking existing valid
code, but I don't think it's worth changing. If I want 0x1e+1 to be
interpreted as three tokens, I'll write 0x1e + 1.
It's not a hugely important change, but unless there are realistic cases
where it would change the behavior of existing code it would seem like
an improvement which should be easy to implement and have relatively little
downside. I'm unaware of any implementations which support hex floating-
point values that don't contain decimal points and use "e" as an exponent
specifier. Are there any?
I hope not. Such implementations would have to resolve the ambiguity of
`0x1e1`. The obvious resolution would be to treat the 'e' as a digit,
but the fact that the ambiguity is there in the first place means it
would create a new way to write code that's difficult to read and
understand. (To be clear, my complaint is about hex FP using 'e' with
no radix point, not with your proposal.)
Post by s***@casperkitty.com
There are some historical quirks of C which are good, and there are some
which are bad but can't be changed without breaking code (leading zero for
octal was IMHO a horrible choice), but this one strikes me as being bad
and yet easily correctable. Is there any advantage to having the need for
a space after a hexadecimal literal depend upon whether the value in
question is congruent to 14 (mod 16)?
I don't object to the proposal -- assuming it can be firmly established
that it doesn't break any existing valid code. I'm just not convinced
it's worth fixing. As far as implementers are concerned, there's no
real problem; the existing rules are unambiguous. As far as programmers
are concerned, there's hardly any problem; anything that's potentially
ambiguous (but actually unambiguous) is better written with spaces
anyway.
Kaz Kylheku
2016-03-30 18:44:25 UTC
Post by Keith Thompson
Post by s***@casperkitty.com
One strange quirk of the C Standard is that the parsing rules which
allow plus and minus signs within numeric tokens don't distinguish
between those that start with 0x and those which don't. As a
consequence, something like 0x1e+1 is not treated by the Standard as
a series of three tokens, but rather as one invalid token.
I would propose that the rule for token binding be changed so that a
token which starts with "0x" will be terminated by a plus or a minus
sign, even if the last character of the token was an "e" or "E" or,
alternatively, that when a token of the form
0x[0-9a-fA-F]*[eE][+-][0-9]* is encountered it be decomposed into two
or three tokens.
[snip]
I would probably support that idea if it were being defined from
scratch, and it could probably be done without breaking existing valid
code, but I don't think it's worth changing. If I want 0x1e+1 to be
interpreted as three tokens, I'll write 0x1e + 1.
Unless you put space around all your tokens, this is just pointless
posturing.

If 0x1E+1 weren't a constraint violation, and silently had some meaning
other than 0x1E + 1, you'd get bitten by it sooner or later.
Keith Thompson
2016-03-30 20:01:17 UTC
[...]
Post by Kaz Kylheku
Post by Keith Thompson
I would probably support that idea if it were being defined from
scratch, and it could probably be done without breaking existing valid
code, but I don't think it's worth changing. If I want 0x1e+1 to be
interpreted as three tokens, I'll write 0x1e + 1.
Unless you put space around all your tokens, this is just pointless
posturing.
If 0x1E+1 weren't a constraint violation, and silently had some meaning
other than 0x1E + 1, you'd get bitten by it sooner or later.
But 0x1E+1 is a constraint violation, and doesn't have any meaning at
all. The point of my "posturing" is that the current situation, though
it's perhaps not ideal, is easily dealt with.
s***@casperkitty.com
2016-03-30 21:14:14 UTC
Post by Keith Thompson
But 0x1E+1 is a constraint violation, and doesn't have any meaning at
all. The point of my "posturing" is that the current situation, though
it's perhaps not ideal, is easily dealt with.
Playing devil's advocate here, is there no combination of token pasting and
other hackery in which a pp-number 0x1e+1 could be processed in such a fashion
as to yield defined behavior? For example, given

#define a0x1e wow
#define foo(x) x
#define join(x,y) x##y

what would be the nature of

foo(join(a,0x1e+1)) ?

I would expect that would emerge from the preprocessor as either
a0x1e+1 or wow+1, but I don't know which. For systems where the
preprocessor outputs a text stream that doesn't add whitespace between
every token, or where the compiler doesn't expect such whitespace, I
see no reason why the compiler wouldn't treat either of those as being
three tokens.
Jakob Bohm
2016-03-30 21:37:31 UTC
Post by s***@casperkitty.com
Post by Keith Thompson
But 0x1E+1 is a constraint violation, and doesn't have any meaning at
all. The point of my "posturing" is that the current situation, though
it's perhaps not ideal, is easily dealt with.
Playing devil's advocate here, is there no combination of token pasting and
other hackery in which a pp-number 0x1e+1 could be processed in such a fashion
as to yield defined behavior? For example, given
#define a0x1e wow
#define foo(x) x
#define join(x,y) x##y
what would be the nature of
foo(join(a,0x1e+1)) ?
I would expect that would emerge from the preprocessor as either
a0x1e+1 or wow+1, but I don't know which. For systems where the
preprocessor outputs a text stream that doesn't add whitespace between
every token, or where the compiler doesn't expect such whitespace, I
see no reason why the compiler wouldn't treat either of those as being
three tokens.
You need to realize that there are two kinds of C compilers
(implementations):

Some have a textual preprocessor with textual output then input as text
to a C compiler, like in the old days.

Others do the preprocessing and compilation together, passing internal
token representations from the preprocessor to the compiler. This
second category would have little reason to parse a pp-token again to
see if it was really multiple C-tokens. One such compiler is/was the
Borland C/C++ compiler in the mid-1990s (I have not checked if the
successor to that compiler still does that).

For the first category of C implementation, reparsing into 3 tokens
would be a natural consequence of the C compiler not knowing that the
preprocessor treated a0x1e+1 as a single token. Keeping the pp-token
definitions unchanged for such implementations is very useful when using
that same preprocessor for files that are not C code at all.

For the second category of C implementation, the preprocessor and
compiler will share a single tokenization logic and definition, and
will probably issue an error that a0x1e+1 or 0x1e+1 is a meaningless
token.

A cleanup of this mess will require consideration of both kinds of
implementation.

The closest precedent for this situation is when C89 made 8 and 9 no
longer valid as octal digits.



Enjoy

Jakob
Kaz Kylheku
2016-03-30 18:39:20 UTC
Post by Jakob Bohm
Post by s***@casperkitty.com
One strange quirk of the C Standard is that the parsing rules which allow
plus and minus signs within numeric tokens don't distinguish between those
that start with 0x and those which don't. As a consequence, something like
0x1e+1 is not treated by the Standard as a series of three tokens, but rather
as one invalid token.
I would propose that the rule for token binding be changed so that a token
which starts with "0x" will be terminated by a plus or a minus sign, even
if the last character of the token was an "e" or "E" or, alternatively, that
when a token of the form 0x[0-9a-fA-F]*[eE][+-][0-9]* is encountered it be
decomposed into two or three tokens. These behaviors would be almost
equivalent, but for some corner cases involving macros which would be unlikely
to arise outside of contrived situations but should be acknowledged if not
specified [to minimize the burden on compilers, I think it would probably be
best to explicitly say that a compiler would be allowed to process something
like
#define paste(x,y) x##y
paste(0x,1e+1)
as though it generated distinct tokens 0x1e + 1, but a compiler would also
be allowed to treat the output of paste as though it were a single token
0x1e+1 which would be invalid for most purposes.]
I can't think of any realistic scenarios where production code might
plausibly rely upon the mandated token-binding behavior, but I'm not as
creative with the preprocessor as some people, so there may be useful
cases I'm not aware of.
You miss the point.
Some implementations allow hexadecimal floating point constants.
Hex floating constants have been in ISO C since C99.

Because E is a hex digit, the characters p and P are used for the
exponent designator:

0x123p1
0x123.Ep+0D

and so on.

For this extension to be possible, it was not necessary for pp-number
to already be generating strings like "0x123P1", and thus reserving
them for future use. (IIRC, the C90 pp-number does cover them, though).

Introducing these hex floating-point constants is possible merely
by virtue of the juxtaposition "0x123P" being a syntax error in
the grammar.

0x123E+1 should not ever be assigned a meaning (that does not coincide
with the meaning of 0x123E + 1).
Keith Thompson
2016-03-30 19:45:21 UTC
Kaz Kylheku <545-066-***@kylheku.com> writes:
[...]
Post by Kaz Kylheku
Hex floating constants have been in ISO C since C99.
You're right. (I mistakenly wrote that they were added in C11.)
Kaz Kylheku
2016-03-30 18:27:23 UTC
Post by s***@casperkitty.com
One strange quirk of the C Standard is that the parsing rules which allow
plus and minus signs within numeric tokens don't distinguish between those
that start with 0x and those which don't. As a consequence, something like
A plus or minus sign *cannot* start a 0x token. -0x1 isn't one token,
but two: it's a unary minus expression. With the exception of the
exponent in a floating-point field, C does not have signs in numeric
tokens.

Some other languages are different. For instance #x-1234 is a token in
Lisp. The minus sign must go after the #x prefix.
Post by s***@casperkitty.com
I would propose that the rule for token binding be changed so that a token
which starts with "0x" will be terminated by a plus or a minus sign, even
if the last character of the token was an "e" or "E" or, alternatively, that
when a token of the form 0x[0-9a-fA-F]*[eE][+-][0-9]* is encountered it be
decomposed into two or three tokens.
The language *per se* does in fact have accurate grammar rules which
match integer and floating-point constants accurately, excluding
trailing material.

However, the problem is the preprocessing stage. The preprocessing stage
decomposes the input into preprocessing tokens, not into proper language
tokens. All the patterns which match various numeric constants are
analyzed by a single rule which produces a "pp-number" token. This rule
over-generates strings, so that 0x123E+1 is a single pp-number.

After preprocessing, pp-tokens must convert to tokens. Every pp-number
which passes into the syntax analysis stage must exhibit the syntax of
a single numeric constant, in its entirety. A pp-number token like
0x123E+1, whose syntax *de facto* contains three tokens, constitutes
a constraint violation.

For instance see C99 6.4 Constraints "Each preprocessing token that is
converted to a token shall have the lexical form of a keyword, an
identifier, a constant, a string literal, or a punctuator."

While this architecture may seem ridiculous in some ways, it has the
one saving grace in that it effectively reserves some patterns
for future extension.

Your proposal basically requires a pp-number to have an accurate lexical
grammar which is a conjunction of the grammars of the actual numeric
constants recognized at translation phase 7, so that the preprocessor
decomposes 0x123E+1 into the tokens 0x123E + 1: a pp-number,
a punctuator and another pp-number.
s***@casperkitty.com
2016-03-30 18:48:10 UTC
Post by Kaz Kylheku
A plus or minus sign *cannot* start a 0x token. -0x1 isn't one token,
but two: it's a unary minus expression. With the exception of the
exponent in a floating-point field, C does not have signs in numeric
tokens.
What are +, ++, +=, -, --, -=, and -> if not tokens?
Post by Kaz Kylheku
While this architecture may seem ridiculous in some ways, it has the
one saving grace in that it effectively reserves some patterns
for future extension.
What kind of extension could you envision that would be more useful than
having 0x1e-1 be interpreted as 0x1e - 1 ?
Post by Kaz Kylheku
Your proposal basically requires a pp-number to have an accurate lexical
grammar which is a conjunction of the grammars of the actual numeric
constants recognized at translation phase 7, so that the preprocessor
decomposes 0x123E+1 into the tokens 0x123E + 1: a pp-number,
a punctuator and another pp-number.
My intention was that an implementation could, at its leisure, either
use a different definition of pp-number which would exclude things that
contain an "x" and a minus sign without an intervening "p", or interpret
a pp-number which contained an "x" and a minus sign without an intervening
"p" as a sequence of tokens, whichever happened to be easier.
Keith Thompson
2016-03-30 19:44:21 UTC
Post by s***@casperkitty.com
Post by Kaz Kylheku
A plus or minus sign *cannot* start a 0x token. -0x1 isn't one token,
but two: it's a unary minus expression. With the exception of the
exponent in a floating-point field, C does not have signs in numeric
tokens.
What are +, ++, +=, -, --, -=, and -> if not tokens?
Certainly they're tokens. They're not numeric tokens.

I believe Kaz is saying that no integer-constant or floating-constant
starts with a "+" or "-" character. Which is accurate, but since you
never said otherwise (you were referring, I believe, only to the "+" or
"-" that can be part of the exponent-part or binary-exponent-part of a
floating-constant), I'm not entirely sure why he mentioned it.

[...]