double cpp expansion?
s***@amu.edu.pl
2004-02-27 15:31:55 UTC
#define A(x) expandA()+B(x)
#define B expandB+A
B(Barg)
A(Aarg)

my preprocessors give:

expandB+ expandA()+ expandB+A ( Barg )
expandA()+ expandB+A ( Aarg )

In the first case `B' macro is expanded twice, unlike
the second case, where each macro is expanded once.
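The two results can be reproduced on any compiler by stringifying the expansions. This is only a probe sketch: `STR`, `XSTR`, `equal_ignoring_ws` and the function names are helpers invented for this illustration, and whitespace is ignored because preprocessors differ in how they space the output. The check for `B(Barg)` accepts both candidate results, since which one is correct is the question being asked.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define A(x) expandA()+B(x)
#define B expandB+A

/* Two-step stringification: XSTR macro-expands its argument, STR quotes it. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Both strings are fixed at preprocessing time. */
const char *expansion_of_B(void) { return XSTR(B(Barg)); }
const char *expansion_of_A(void) { return XSTR(A(Aarg)); }

/* Keep the single-letter macros from leaking into later code. */
#undef A
#undef B

/* Compare two strings while ignoring all whitespace in both. */
int equal_ignoring_ws(const char *a, const char *b)
{
    for (;;) {
        while (isspace((unsigned char)*a)) ++a;
        while (isspace((unsigned char)*b)) ++b;
        if (*a != *b) return 0;
        if (*a == '\0') return 1;
        ++a;
        ++b;
    }
}

/* Print both expansions and check them against the candidate results. */
void report_expansions(void)
{
    const char *from_b = expansion_of_B();

    printf("B(Barg) -> %s\n", from_b);
    printf("A(Aarg) -> %s\n", expansion_of_A());

    /* The second case is the undisputed one. */
    assert(equal_ignoring_ws(expansion_of_A(), "expandA()+expandB+A(Aarg)"));
    /* The first case depends on how the preprocessor treats the inner B. */
    assert(equal_ignoring_ws(from_b, "expandB+expandA()+expandB+A(Barg)") ||
           equal_ignoring_ws(from_b, "expandB+expandA()+B(Barg)"));
}
```

Calling `report_expansions()` from `main` prints what the compiler at hand produces for both invocations.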

With slightly simplified definitions:
#define A() B()
#define B A
my solution is:

B ( ) //replace "B"

A ( ) //initial rescan leaves "A" as is
|B|
// "(" found after "A" - replace "A" in
// context of "B" - nested expansion

B ( ) // start nested rescan
| A |
| B |

B'( ) // rescan finds B in "B" context - paints token,
| A | // nothing else changes
| B |

B() // result

Could you please check my expansion and tell me
what error I am making, if any? Thank you.
--
S.Tobias
Douglas A. Gwyn
2004-02-27 21:28:16 UTC
I'm not sure what bug your preprocessor has. During the
expansion of B in B(Barg) there is an expansion of A(Barg)
that should not further recurse on B since B is still
"being replaced" (most readily implemented as a flag in
the symbol table). The nested B resulting from expanding
A(Barg) should be flagged ("painted blue", another flag in
the symbol table) and never replaced. The result of
expanding B(Barg) should thus be "expB+expA()+B(Barg)".
If you have the source code for the preprocessor you could
try "instrumenting" it (adding carefully placed printouts)
to watch how it is making its decisions while expanding
your test code.
s***@amu.edu.pl
2004-02-28 01:05:46 UTC
Post by Douglas A. Gwyn
the symbol table) and never replaced. The result of
expanding B(Barg) should thus be "expB+expA()+B(Barg)".
Thank you a lot! You have reassured me.

Compilers/preprocessors I have tested were:
latest gcc, latest como, msvc7.1, Digital unix cc (probably old)
and latest mcpp - all of them expand to expB+expA()+expB+A(Barg)

Did the standard change, is it ambiguous, or did nobody (or only a few)
implement it the right way? Is there at least one exemplary implementation?
Post by Douglas A. Gwyn
If you have the source code for the preprocessor you could
try "instrumenting" it (adding carefully placed printouts)
I've looked into gcc's cpp code; it looks like a little too much
for me just now. But mcpp (announced recently) has an interesting
feature: #pragma __debug_cpp __expand.

I think I'll start sending bug reports.
Thanks again!
--
S.Tobias
Paul Mensonides
2004-02-28 03:19:20 UTC
Post by Douglas A. Gwyn
I'm not sure what bug your preprocessor has. During the
expansion of B in B(Barg) there is an expansion of A(Barg)
that should not further recurse on B since B is still
"being replaced" (most readily implemented as a flag in
the symbol table).
This is incorrect. When a macro invocation spans the end of a replacement list
it is not necessarily considered nested. The traditional approach is to
consider it *not* nested, and a ton of existing code requires that behavior.
For this reason and the wording of the standard, virtually every major
preprocessor implementation has this behavior--despite the non-normative note in
the appendix that says it is unspecified whether it is nested or not.

The viewpoint that macro expansion forms an invocation hierarchy is faulty which
leads to interpretations like this one. Macro expansion is conceptually
(regardless of a particular implementation strategy) an "in place", iterative
operation. As such, the expansion proceeds as follows (ignoring argument
expansion because there is none involved):

#define A(x) expandA()+B(x)
#define B expandB+A

B(Barg)

  B  (Barg)
| ^ |
|___|
  |
  B invocation

 expandB+A (Barg)
|^        |
|_________|
     |
     B context

 expandB+ A (Barg)
|        |^|      |
|        |_|______|
|__________|  |
     |        A invocation (#1)
     B context

 expandB+ expandA()+B(Barg)
         |^                |
         |_________________|
                  |
                  A context

 expandB+ expandA()+  B  (Barg)
         |          | ^ |      |
         |          |___|      |
         |            |        |
         |            B invocation (#2)
         |_____________________|
                    |
                    A context

 expandB+ expandA()+ expandB+A (Barg)
         |          |^        |      |
         |          |_________|      |
         |               |           |
         |               B context   |
         |___________________________|
                       |
                       A context

 expandB+ expandA()+ expandB+A' (Barg)
         |          |        ^ |      | (#3)
         |          |__________|      |
         |               |            |
         |               B context    |
         |____________________________|
                       |
                       A context

Note in particular the points #1, #2, and #3. The invocation of A at point #1
is not considered nested within the disabling context established by the
replacement list of B. At point #2, B's invocation *is* nested within the
context established by the replacement list of A. Hence, when the A
preprocessing token is found at point #3, the A-disabling context is still
active, and the token is painted (i.e. the apostrophe). The second example
proceeds as follows:

// #define A(x) expandA()+B(x)
// #define B expandB+A

A(Aarg)

 A(Aarg)
|^      |
|_______|
    |
    A expansion

 expandA()+B(Aarg)
|^                |
|_________________|
         |
         A context

 expandA()+  B  (Aarg)
|          | ^ |      |
|          |___|      |
|            |        |
|            B invocation
|_____________________|
           |
           A context

 expandA()+ expandB+A (Aarg)
|          |^        |      |
|          |_________|      |
|               |           |
|               B context   |
|___________________________|
              |
              A context

 expandA()+ expandB+A' (Aarg)
|          |        ^ |      |
|          |__________|      |
|               |            |
|               B context    |
|____________________________|
              |
              A context

Regards,
Paul Mensonides
Douglas A. Gwyn
2004-02-28 07:24:59 UTC
Post by Paul Mensonides
This is incorrect. When a macro invocation spans the end of a replacement list
it is not necessarily considered nested.
The wording in the C standard makes it pretty clear that
the inner replacement occurs *during* the process of the
original replacement (thus setting the condition for blue
paint), and that it is only whether a macro *name* is
seen within a replacement buffer that determines the
onset of a nested replacement (which can involve argument
tokens beyond the span of the higher-level macro-plus-
arguments).
Post by Paul Mensonides
a ton of existing code requires that behavior.
Really? I find that surprising.
Post by Paul Mensonides
The viewpoint that macro expansion forms an invocation hierarchy is faulty which
leads to interpretations like this one.
The specification is explicitly recursive.
Post by Paul Mensonides
 expandB+ A (Barg)
|        |^|      |
|        |_|______|
|__________|  |
     |        A invocation (#1)
     B context
 expandB+ expandA()+B(Barg)
         |^                |
         |_________________|
                  |
                  A context
No, you dropped a context there. The original expansion
of B has not yet concluded, because part of the process
requires expansion of a nested macro A, which has not
quite concluded...
Post by Paul Mensonides
 expandB+ expandA()+  B  (Barg)
         |          | ^ |      |
         |          |___|      |
         |            |        |
         |            B invocation (#2)
         |_____________________|
                    |
                    A context
...until the A-replacement (sub)buffer is examined for
further nested macros to replace, *during* which
examination an occurrence of B (or of A) shall not be
replaced. The specification practically tracks this
very example step by step, so it is easy to apply.

Note that the process described in 6.10.3.4 is one
*component* of the process 6.10.3, not something that
happens after 6.10.3 is complete. In fact the reference
to nested macros encountering the name being replaced
makes sense only with that understanding. I think what
may be throwing you is the use of the term "nested",
which you might think means geographically nested instead
of procedurally nested. Unfortunately the term is used
colloquially with no further elucidation; however, there
is no clue that it should be thought of as geographic
nesting, and in its immediate context the only already
defined nesting process it could be referring to is the
logical recursive expansion.

The only intentional ambiguity in this area is the one
referred to in the response to DR #017 (Question 9),
which refers to the situation only after *all* expansion
is complete, and a different situation than in this
example. Further, the specification makes it very clear
that blue paint is permanent, so there is no way for the
newly created "B" to ever trigger macro replacement.
Paul Mensonides
2004-02-28 09:03:37 UTC
Post by Douglas A. Gwyn
Post by Paul Mensonides
This is incorrect. When a macro invocation spans the end of a replacement list
it is not necessarily considered nested.
The wording in the C standard makes it pretty clear that
the inner replacement occurs *during* the process of the
original replacement (thus setting the condition for blue
paint),
The wording of the C standard makes it clear that the rest of the preprocessing
tokens in the file are *not* included. "If the name of the macro being replaced
is found during this scan of the replacement list (not including the rest of the
file's preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced it is not replaced."
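The two quoted sentences can be seen at work in cases nobody in this thread disputes. A minimal sketch, assuming hypothetical macro names `SELF`, `PING` and `PONG` (none of them come from the thread): `SELF` meets its own name directly in its replacement list, while `PING` meets its own name only inside the nested replacement of `PONG`.

```c
#include <assert.h>
#include <ctype.h>

/* Two-step stringification: XSTR macro-expands its argument, STR quotes it. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* First sentence: the name is found during the rescan of its own list. */
#define SELF (4 + SELF)

/* Second sentence: the name is found by a nested replacement. */
#define PING 1 + PONG
#define PONG 2 + PING

const char *self_expansion(void) { return XSTR(SELF); }
const char *ping_expansion(void) { return XSTR(PING); }
const char *pong_expansion(void) { return XSTR(PONG); }

/* Compare two strings while ignoring all whitespace in both. */
int equal_ignoring_ws(const char *a, const char *b)
{
    for (;;) {
        while (isspace((unsigned char)*a)) ++a;
        while (isspace((unsigned char)*b)) ++b;
        if (*a != *b) return 0;
        if (*a == '\0') return 1;
        ++a;
        ++b;
    }
}

/* Each expansion stops at the first reappearance of the name being replaced. */
void check_self_reference(void)
{
    assert(equal_ignoring_ws(self_expansion(), "(4 + SELF)"));
    assert(equal_ignoring_ws(ping_expansion(), "1 + 2 + PING"));
    assert(equal_ignoring_ws(pong_expansion(), "2 + 1 + PONG"));
}
```

Both cases use object-like macros only, so no invocation spans the end of a replacement list and the disputed reading of "nested" does not come into play.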
Post by Douglas A. Gwyn
Post by Paul Mensonides
a ton of existing code requires that behavior.
Really? I find that surprising.
Really. The majority of it is in C++, but it exists in C as well.
Post by Douglas A. Gwyn
Post by Paul Mensonides
The viewpoint that macro expansion forms an invocation hierarchy is faulty which
leads to interpretations like this one.
The specification is explicitly recursive.
No, it is explicitly non-recursive (see below).
Post by Douglas A. Gwyn
Post by Paul Mensonides
 expandB+ A (Barg)
|        |^|      |
|        |_|______|
|__________|  |
     |        A invocation (#1)
     B context
 expandB+ expandA()+B(Barg)
         |^                |
         |_________________|
                  |
                  A context
No, you dropped a context there. The original expansion
of B has not yet concluded, because part of the process
requires expansion of a nested macro A, which has not
quite concluded...
No, I didn't. The context no longer exists because the invocation is not nested
(see below). You are blatantly ignoring what the standard dictates because of a
typical view that macro expansion is like a procedural hierarchy. Yes, scanning
is not finished with the preprocessing tokens of B's replacement list, but is
finished with the *replacement* of the invocation of B. At this point, nothing
exists except contexts that produced the preprocessing tokens. Specifically,
there is no context that implicitly exists because of logical flow implied by
functional hierarchy. In the conceptual model defined by the standard, there is
no "token buffer". There is simply a stream of preprocessing tokens that makes
up the entire source file. Some of those preprocessing tokens get replaced by
replacement lists (followed by argument substitution and operator evaluation)
and then scanning resumes at the beginning of the replacement list with some
conceptual delimiter at the end of the tokens that made up the replacement list
that marks the point where the macro name becomes available again (i.e. will no
longer get painted). To illustrate...

#define A() B()
#define B() C()
#define C() 1

tok tok A() tok tok

...does not get processed as:

tok tok A() tok tok
|
B()
|
C()
|
1

Rather, it gets processed as:

tok tok A() tok tok
        ^
tok tok B() tok tok
       |^  |
       |___|
         |
         A context

tok tok C() tok tok
      ||^  ||
      ||___||
      |  |  |
      |  B context
      |_____|
         |
         A context

tok tok 1 tok tok
    ||| ^ |||
    |||___|||
    ||  |  ||
    ||  C context
    ||_____||
    |   |   |
    |   B context
    |_______|
        |
        A context

tok tok 1 tok tok
          ^

I'll try to be more clear. The preprocessing tokens before and (more
importantly) after any given invocation are always right there with the
invocation. They don't have to be "gotten" from some outer token stream when
needed--they are _already_ there in one steady stream of preprocessing tokens
that makes up all the preceding and trailing tokens to the end of the file.
(The only exception to this is argument expansion which occurs as an independent
stream of preprocessing tokens that acts exactly the same as the scheme above
except that the stream that makes up the argument's preprocessing tokens acts as
if it was the entire file.) In the above example, the relevant stream of tokens
is "tok tok A() tok tok". The A() invocation is replaced by A's replacement
list, yielding "tok tok B() tok tok" and so on. This is quite different than
the call-and-return model that constructs a functional hierarchy. In that
model, the invocation of A would get expanded all the way down to 1 which would
then recursively unroll, ultimately replacing the invocation tokens "A()" with
"1". That is not what the standard specifies. It specifies replacement first,
then rescanning, not rescanning followed by replacement. To put it in more core
language terms, it is as if every invocation is physically inlined into the stream
of preprocessing tokens where the invocation existed (which is either the
top-level stream or an argument being processed as a separate stream).
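The "replace first, then rescan from the start of the replacement" order described above can be sketched as a toy that rewrites one flat character buffer in place. This is only an illustration under strong assumptions: it works on raw text rather than preprocessing tokens, it handles only the parameterless `A()`, `B()`, `C()` chain from the example, and it models no blue paint at all. The names `toy_macro`, `toy_macros` and `expand_in_place` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy table: each entry maps an invocation's text to its
   replacement list, mirroring the A() -> B() -> C() -> 1 example. */
struct toy_macro {
    const char *invocation;
    const char *replacement;
};

static const struct toy_macro toy_macros[] = {
    { "A()", "B()" },
    { "B()", "C()" },
    { "C()", "1"   },
};

/* Expand `stream` in place. Returns the number of replacements made, or -1
   if the buffer of `cap` bytes is too small. No painting is modeled, so a
   self-referential table would loop forever. */
int expand_in_place(char *stream, size_t cap)
{
    size_t pos = 0;   /* scan position within the one flat stream */
    int steps = 0;

    assert(stream != NULL);
    while (stream[pos] != '\0') {
        int replaced = 0;
        for (size_t i = 0; i < sizeof toy_macros / sizeof toy_macros[0]; ++i) {
            const struct toy_macro *m = &toy_macros[i];
            size_t ilen = strlen(m->invocation);
            size_t rlen = strlen(m->replacement);

            if (strncmp(stream + pos, m->invocation, ilen) != 0)
                continue;

            /* Bytes after the invocation, including the terminating NUL. */
            size_t tail = strlen(stream + pos + ilen) + 1;
            if (pos + rlen + tail > cap)
                return -1;

            /* Replace first: splice the replacement list over the invocation.
               The trailing text is already there; nothing is "fetched". */
            memmove(stream + pos + rlen, stream + pos + ilen, tail);
            memcpy(stream + pos, m->replacement, rlen);
            ++steps;
            replaced = 1;
            break;   /* then rescan, starting at the replacement's first byte */
        }
        if (!replaced)
            ++pos;
    }
    return steps;
}
```

Starting from a buffer holding `tok tok A() tok tok`, the function performs three replacements at the same position and leaves `tok tok 1 tok tok`; the surrounding `tok` text is never copied out to or back from any outer level.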
Post by Douglas A. Gwyn
Post by Paul Mensonides
 expandB+ expandA()+  B  (Barg)
         |          | ^ |      |
         |          |___|      |
         |            |        |
         |            B invocation (#2)
         |_____________________|
                    |
                    A context
...until the A-replacement (sub)buffer is examined for
further nested macros to replace, *during* which
examination an occurrence of B (or of A) shall not be
replaced. The specification practically tracks this
very example step by step, so it is easy to apply.
No it doesn't. The invocation of B at this point is nested within the
rescanning of A's replacement list, but not B's.
Post by Douglas A. Gwyn
Note that the process described in 6.10.3.4 is one
*component* of the process 6.10.3, not something that
happens after 6.10.3 is complete.
Actually, it is the resumption of the iterative process:

(re)scanning -> replacement -> substitution -> #/## ->
^                                                    |
|____________________________________________________|
Post by Douglas A. Gwyn
In fact the reference
to nested macros encountering the name being replaced
makes sense only with that understanding. I think what
may be throwing you is the use of the term "nested",
which you might think means geographically nested instead
of procedurally nested.
This is the exact model defined by the standard: invocation -> replacement ->
argument substitution -> token-pasting/stringizing -> rescanning. This is a
critical point: replacement happens in a distinct step prior to rescanning
(emphasis mine):

6.10.3/9 - "...that causes each subsequent instance of the macro name to be
***replaced by the replacement list of preprocessing tokens that constitute the
remainder of the directive***."

6.10.3/10 - "...that is ***replaced by the replacement list*** in the
definition..."

A macro invocation is replaced by the preprocessing tokens of the replacement
list, *not* by the rescanned (i.e. macro-expanded) preprocessing tokens of the
replacement list. Conceptually, this step happens even before argument
substitution. It is in-place iteration over the sequence of preprocessing
tokens that makes up a file. There is no procedural nesting, there is only
physical/geographical nesting designed to prevent macro expansion from looping
forever. The only occurrence of recursion in this model is that arguments to
macros are completely macro replaced separately. I.e. this can be considered
recursively processed:

#define ID(x) x

ID(ID(ID(1)))

but this is iteratively processed:

#define A(x) B(x)
#define B(x) C(x)
#define C(x) x

A(1)
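Whichever way the two forms are processed internally, both reduce to the same token, which a compiler can confirm. A small sketch; the function names are invented for this illustration, and the check says nothing about the order of the intermediate steps.

```c
#include <assert.h>

#define ID(x) x

#define A(x) B(x)
#define B(x) C(x)
#define C(x) x

/* Arguments are macro-expanded separately before substitution, so the
   inner ID calls are finished before the outer one is replaced. */
int recursive_form(void) { return ID(ID(ID(1))); }

/* Each replacement list is rescanned where it stands: A(1) becomes B(1),
   then C(1), then 1. */
int iterative_form(void) { return A(1); }

/* Keep the single-letter macros from leaking into later code. */
#undef A
#undef B
#undef C

/* Both forms leave the single token 1 behind. */
void check_both_forms(void)
{
    assert(recursive_form() == 1);
    assert(iterative_form() == 1);
}
```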
Post by Douglas A. Gwyn
Unfortunately the term is used
colloquially with no further elucidation; however, there
is no clue that it should be thought of as geographic
nesting, and in its immediate context the only already
defined nesting process it could be referring to is the
logical recursive expansion.
The immediately preceding sentence says "not including the rest of the source
file's preprocessing tokens". That gives plenty of immediate context. "Logical
recursive expansion" is an assumption based on typical programming models, not
on the definition of macro expansion in the standard. That assumption
introduces a different conceptual model that happens to produce the same results
in all cases *except* when the invocation of a function-like macro spans the end
of a replacement list. At that point, the faulty, but familiar, conceptual
model derived from the assumption breaks, and you have to look at the actual
conceptual model defined by the standard--which is the antithesis of recursive
nesting.
Post by Douglas A. Gwyn
The only intentional ambiguity in this area is the one
referred to in the response to DR #017 (Question 9),
which refers to the situation only after *all* expansion
is complete, and a different situation than in this
example. Further, the specification makes it very clear
that blue paint is permanent, so there is no way for the
newly created "B" to ever trigger macro replacement.
Absolutely, blue paint is permanent--except of course if you create a new
preprocessing token via token-pasting (the paint is not transferred because the
token no longer exists):

#define CAT(a, b) PRIMITIVE_CAT(a, b)
#define PRIMITIVE_CAT(a, b) a ## b

#define A() A
#define AA A

CAT(A(), A()) // A (i.e. not painted)
CAT(A(), A())() // A' (i.e. painted)

;)
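Paint is not visible in stringified output, but its effect is: a painted `A` followed by `()` stays put, while an unpainted one is invoked and consumes the parentheses. The sketch below contrasts the two using the definitions above plus the helpers `ID`, `STR`, `XSTR` and `equal_ignoring_ws`, which are assumptions added for this illustration. The expected strings follow from the behavior described in this post; a preprocessor that handles pasting or painting differently would fail the checks.

```c
#include <assert.h>
#include <ctype.h>

/* Two-step stringification: XSTR macro-expands its argument, STR quotes it. */
#define STR(x) #x
#define XSTR(x) STR(x)
#define ID(x) x

#define CAT(a, b) PRIMITIVE_CAT(a, b)
#define PRIMITIVE_CAT(a, b) a ## b

#define A() A
#define AA A

/* A() leaves a painted A; it passes through ID unchanged, so the
   trailing () cannot invoke it. */
const char *paint_kept(void) { return XSTR(ID(A())()); }

/* Pasting two painted A tokens forms a fresh AA token; AA yields an
   unpainted A, which is invoked with the trailing (). */
const char *paint_lost(void) { return XSTR(CAT(A(), A())()); }

/* Keep the short macro names from leaking into later code. */
#undef A
#undef AA

/* Compare two strings while ignoring all whitespace in both. */
int equal_ignoring_ws(const char *a, const char *b)
{
    for (;;) {
        while (isspace((unsigned char)*a)) ++a;
        while (isspace((unsigned char)*b)) ++b;
        if (*a != *b) return 0;
        if (*a == '\0') return 1;
        ++a;
        ++b;
    }
}

void check_paint(void)
{
    assert(equal_ignoring_ws(paint_kept(), "A()"));  /* parentheses survive */
    assert(equal_ignoring_ws(paint_lost(), "A"));    /* parentheses consumed */
}
```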

Regards,
Paul Mensonides
Paul Mensonides
2004-02-28 09:26:01 UTC
FYI, here is a link which discusses the original algorithm and intent from which
the standard text was derived:
http://gcc.gnu.org/ml/gcc-prs/2001-q1/msg00495.html

Note that the algorithm is not identical to the text of the standard. Rather
the text of the standard is simplified by breaking the procedural hierarchy mold
by removing the recursive model. Nevertheless, it is illustrative of the
original intent of the text of the standard.

Regards,
Paul Mensonides
s***@amu.edu.pl
2004-02-28 13:13:10 UTC
Let's first make the meaning of certain words clear, to make sure
we understand each other, for the Standard uses them without
formal definition.
"macro name" = identifier after "#define" directive (6.10.3#9 and #10)
"macro invocation" = 6.10.3#10 uses it twice, first is ambiguous, second
use strictly determines this is a *noun* meaning "sequence of preprocessing
tokens" (not a verb meaning "process of collecting tokens"). I'll
extend its use for object-like macros (the sequence consists of one
token, ie. macro name).
"macro" = can't find any definition for the _word_, I assume this is
alias of "macro invocation" ("macro-name" for obj-like macros; and
"macro-name ( ... )" for fn-like macros, otherwise "macro-name othertoken"
is not a "macro", and is not considered for expansion)
"macro replacement" = verb, denoting the process of replacing one sequence
of tokens by another
Post by Paul Mensonides
Post by Douglas A. Gwyn
Post by Paul Mensonides
This is incorrect. When a macro invocation spans the end of a replacement
list
Post by Douglas A. Gwyn
Post by Paul Mensonides
it is not necessarily considered nested.
The wording in the C standard makes it pretty clear that
the inner replacement occurs *during* the process of the
original replacement (thus setting the condition for blue
paint),
The wording of the C standard makes it clear that the rest of the preprocessing
tokens in the file are *not* included. "If the name of the macro being replaced
is found during this scan of the replacement list (not including the rest of the
file's preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced it is not replaced."
This is a little OT here, because the discussion is about what
"nested" means here. But this excerpt is vital here, so I'll rewrite it
here with my comments:

6.10.3.4
#1 [...]
Then, the resulting preprocessing token sequence is rescanned,
along with all subsequent preprocessing tokens of the source file
//this inclusion of the rest of the file makes sense only
//for fn-like macros, either to determine whether fn-like
//"macro-name" forms a macro (with subsequent lparen), or to complete
//fn-like macro argument list.
, for more macro names to replace.
//actually "macros" are replaced, not "macro names", this is
//probably a human language deficiency, but also maybe direct
//reference to next point, see further
#2
If the name of the macro being replaced
//note: "name of the macro", not "macro" or "macro invocation";
//the "macro" is replaced, not "macro name"
is found during this scan
//ie. "rescan"
of the replacement list
//which resulted in previous replacement before "rescan"
(not including the rest of the source file's preprocessing tokens),
//this is redundant, it only adds emphasis to "the replacement list",
//ie. this and no further
it is not replaced.
Further, if any nested
//"nested" means replacements that occur during this rescan,
//cf. #1
replacements encounter the name of the macro
//again: "name of the macro", not "macro invocation"
being replaced, it
//"it" surely refers to "macro", not "name of the macro"
is not replaced. These nonreplaced [...]

Nowhere in this paragraph "Rescanning..." is the word "invocation"
even mentioned. Rescanning and invocation (verb) are two separate
procedures.
Post by Paul Mensonides
Post by Douglas A. Gwyn
Post by Paul Mensonides
The viewpoint that macro expansion forms an invocation hierarchy is faulty
which
Post by Douglas A. Gwyn
Post by Paul Mensonides
leads to interpretations like this one.
The specification is explicitly recursive.
No, it is explicitly non-recursive (see below).
The "macro expansion" algorithm may not necessarily be recursive,
but literally the Standard _is_ recursive: rescanning and parameter
replacements make direct references to expansion (earlier in the
Standard), which possibly contains nested param expansions and is
followed by rescan, which ... etc.
Sometimes macro expansions are independent (parameter) and sometimes
they are performed in context of subsequent tokens (in file, or
nested parameter expansions within parameter expansions).
Post by Paul Mensonides
Post by Douglas A. Gwyn
Post by Paul Mensonides
 expandB+ A (Barg)
|        |^|      |
|        |_|______|
|__________|  |
     |        A invocation (#1)
     B context
 expandB+ expandA()+B(Barg)
         |^                |
         |_________________|
                  |
                  A context
No, you dropped a context there. The original expansion
of B has not yet concluded, because part of the process
requires expansion of a nested macro A, which has not
quite concluded...
No, I didn't. The context no longer exists because the invocation is not nested
This is the crux of the problem: you say "invocation is not nested",
which it obviously isn't, but the Standard never mentions "invocation"
during "rescan", merely "nested replacement".

Here "macro name" A falls within the replacement procedure of B,
next we determine that A forms an invocation, thus we expand it
(ie. macro, but maybe this is why "macro name" is used in #1 - see above),
this replacement takes place within the context of B expansion,
because token A originated from there.
Post by Paul Mensonides
(see below). You are blatantly ignoring what the standard dictates because of a
typical view that macro expansion is like a procedural hierarchy. Yes, scanning
I guess no one here denies it, see further
Post by Paul Mensonides
tok tok 1 tok tok
    ||| ^ |||
    |||___|||
    ||  |  ||
    ||  C context
    ||_____||
    |   |   |
    |   B context
    |_______|
        |
        A context
tok tok 1 tok tok
          ^
Fully agreed.
Post by Paul Mensonides
Post by Douglas A. Gwyn
Unfortunately the term is used
colloquially with no further elucidation; however, there
is no clue that it should be thought of as geographic
nesting, and in its immediate context the only already
defined nesting process it could be referring to is the
logical recursive expansion.
I think what Douglas meant by "geographic nesting" is "syntactical
nesting" (as in C syntax), which is not taking place in macro expansion;
and by "procedurally nested" (earlier) he means 'replacement procedure'.
I think we all agree here.

Paul:
When I started the editor, my intention was to agree with you, and to back
up your argument I wanted to bring that excerpt where it says "nested
invocation", but I didn't find it. If the Standard said in 6.10.3.4#2
"if any nested replacements encounter the invocation of the macro
being replaced", I would definitely say you were right, because
in the above example "A invocation" is not nested; but it doesn't.
Please, don't use the word "invocation" in context of "rescan",
because the Standard doesn't do so, and that has already confused
me a lot. IMO the use of "invocation" is only valid in terms of
"macro replacement", ie. to determine whether function-like macro-name
(token) forms a "macro" ("macro invocation").

I have taken notice of the link you gave in the followup to your
article (I still have to read it more closely). The expansion
of G_0(42) definitely disagrees with my reasoning and with what I think
the Standard says. Okay, maybe the intention was different,
according to Dave Prosser's explanations; then maybe the Standard has a flaw
there and needs correcting. But what do we do until then?
--
Stan Tobias
Douglas A. Gwyn
2004-02-28 17:10:45 UTC
Post by Paul Mensonides
Post by Douglas A. Gwyn
The wording in the C standard makes it pretty clear that
the inner replacement occurs *during* the process of the
original replacement (thus setting the condition for blue
paint),
The wording of the C standard makes it clear that the rest of the preprocessing
tokens in the file are *not* included. "If the name of the macro being replaced
is found during this scan of the replacement list (not including the rest of the
file's preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced it is not replaced."
You left out the preceding paragraph, which sets the
context:
"... Then, the resulting preprocessing token
sequence is rescanned, along with all subsequent
preprocessing tokens of the source file, for
more macro names to replace."

Then the first sentence of the part you quoted is not
applicable to the inner "B" in our example, because it
does not exist yet during "this" scan (the scan of the
immediate consequence of the parameter substitution
and #,## processing, looking for macro names).
However, the second sentence ("Furthermore, ...") does
apply to the name "B" that results from the nested "A"
replacement (which recursively involves its own scan
for further macro names).
Post by Paul Mensonides
No, I didn't. The context no longer exists because the invocation is not nested
(see below). You are blatantly ignoring what the standard dictates because of a
typical view that macro expansion is like a procedural hierarchy. Yes, scanning
is not finished with the preprocessing tokens of B's replacement list, but is
finished with the *replacement* of the invocation of B.
No, it is not. The subclauses of 6.10.3 all are part
of the process of macro replacement described by clause
6.10.3. That includes 6.10.3.4 (rescanning and further
replacement).

There are ways for implementations to take shortcuts,
as it is well known that tail recursion can be replaced
by iteration. But the specification does not refer to
any such shortcuts, and if an implementation loses a
context due to being in too big a hurry to move on, it
is simply a defective implementation.

If we were to take your view to its logical extreme,
then there would be no possibility of an in-process
macro name appearing within a nested expansion (past
the first level). That is contradicted by the standard
specifying special consequences for cases where that
*does* occur.
Post by Paul Mensonides
I'll try to be more clear. The preprocessing tokens before and (more
importantly) after any given invocation are always right there with the
invocation. They don't have to be "gotten" from some outer token stream when
needed--they are _already_ there in one steady stream of preprocessing tokens
that makes up all the preceding and trailing tokens to the end of the file.
However, the standard specifies when they are examined and/or
replaced and when they are not examined and/or replaced.
When I say colloquially that tokens are "fetched", I mean
merely that they are available for examination and possible
replacement.
Post by Paul Mensonides
.. That is not what the standard specifies. It specifies replacement first,
then rescanning, not rescanning followed by replacement.
I am not at all confused about the sequencing. The
process of macro replacement, starting just after
an identifier has been recognized as a defined macro
name during the appropriate phase of processing,
involves temporarily flagging that global identifier
as "in the process of being replaced", locating the
tokens for the macro arguments and fully macro-replacing
each of *them* [recursive subprocessing], substituting
each fully-expanded argument for the corresponding
parameter in the definition for the current macro,
concurrent with #,## processing, then rescanning the
result of the previous operation, looking for
identifiers corresponding to defined macros, and for
each such name applying the following processing:
    if (global identifier is "in the process of
        being expanded") then apply permanent blue
        paint to that identifier pp token (not to
        the global identifier, as I might have
        mistakenly indicated in a previous posting).
    if (identifier token has ever had blue paint
        applied) then leave it intact;
    else (identifier is a defined macro name,
        and has not been painted blue) so
        begin a macro replacement process for
        that identifier, and if it is a
        function-like macro, it is allowed to
        access remaining pp tokens (i.e. those
        not involved in any nesting replacement
        process from expansion of macros that
        may still be in process) when fetching
        its arguments.
*After* each such name (we're now back to the original
replacement buffer) has been fully macro-replaced, the
current macro replacement process (6.10.3) is complete
and the "in the process of being replaced" flag is
removed from the global identifier. The context now
pops back to whatever scanning was being done (top
level or some nested macro replacement).
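The per-identifier decision in the steps above can be sketched in a few lines of C. This is an illustration of the procedural reading described in this post only, not code from any real preprocessor; the types `macro_def` and `pp_token` and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical symbol-table entry: the flag lives on the macro itself. */
struct macro_def {
    const char *name;
    bool being_replaced;   /* "in the process of being replaced" */
};

/* Hypothetical token: blue paint lives on the individual pp token. */
struct pp_token {
    const char *spelling;
    bool painted_blue;
};

/* Bracket one complete replacement, rescan included, as described above. */
void begin_replacement(struct macro_def *m) { m->being_replaced = true; }
void end_replacement(struct macro_def *m)   { m->being_replaced = false; }

/* Decide, during a rescan, whether `tok` may start a macro replacement.
   `def` is the definition that `tok` names, or NULL if it names none. */
bool may_replace(struct pp_token *tok, const struct macro_def *def)
{
    assert(tok != NULL);
    if (def == NULL)
        return false;               /* not a macro name at all */
    if (def->being_replaced)
        tok->painted_blue = true;   /* paint is applied to the token */
    return !tok->painted_blue;      /* once painted, never replaced */
}
```

For example, a `B` token met while `B` is flagged gets painted, and it remains ineligible after `end_replacement` clears the flag, while a different, unpainted `B` token seen afterwards is eligible again.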
Post by Paul Mensonides
... it is as if every invocation is physically inlined into the stream
of preprocessing tokens where the invocation existed (which is either the
top-level stream or an argument being processed as a separate stream).
That is merely what "substitution" consists of, and
has no deeper meaning.
Post by Paul Mensonides
No it doesn't. The invocation of B at this point is nested within the
rescanning of A's replacement list, but not B's.
That makes no sense if "nesting" has its logical
(procedural) meaning rather than a geographical
interpretation. Repeating what I said above, that
would mean that there would be no possibility of an
in-process macro name appearing within a nested expansion
past the first level, which is contradicted by the
standard discussing exactly that case.
Post by Paul Mensonides
Post by Douglas A. Gwyn
Note that the process described in 6.10.3.4 is one
*component* of the process 6.10.3, not something that
happens after 6.10.3 is complete.
(re)scanning -> replacement -> substitution -> #/## ->
^                                                    |
|____________________________________________________|
That is contrary to all similar constructions in the
standard, such as what constitutes the referred-to
operand in a subexpression, and is clearly contradicted
by the explicitly recursive wording of the specification,
and the explicit reference to more than one level of
nesting. (Your model involves only one level.) Note
also that macro replacement within each argument to a
macro is necessarily performed as if 6.10.3 is invoked
as a subroutine.
Post by Paul Mensonides
6.10.3/9 - "...that causes each subsequent instance of the macro name to be
***replaced by the replacement list of preprocessing tokens that constitute the
remainder of the directive***."
6.10.3/10 - "...that is ***replaced by the replacement list*** in the
definition..."
Yes, that explains the eventual purpose of the "body"
of a macro *definition*. The actual replacement
process does not begin at that point, but upon
subsequent recognition of a defined identifier during
the top-level pass across the input to translation
phase 4, and also upon recognition of defined
identifiers during collection of macro arguments, and
also during recognition of defined identifiers during
the rescan phase of *each* macro expansion process,
except when the identifier pp token has had blue paint
applied.
Post by Paul Mensonides
A macro invocation is replaced by the preprocessing tokens of the replacement
list, *not* by the rescanned (i.e. macro-expanded) preprocessing tokens of the
replacement list. Conceptually, this step happens even before argument
substitution.
No, not at all. The complete replacement can involve
much more than what was contained explicitly within
the body of the definition. That is why we spell out
the process in detail.
Post by Paul Mensonides
... There is no procedural nesting, there is only
physical/geographical nesting designed to prevent macro expansion from looping
forever.
Actually, "looping forever" is a logical/procedural
notion, not a geographic one.
Post by Paul Mensonides
Post by Douglas A. Gwyn
... Further, the specification makes it very clear
that blue paint is permanent, so there is no way for the
newly created "B" to ever trigger macro replacement.
Absolutely, blue paint is permanent--
But since the inner "B" was not *geographically* nested,
by your interpretation you'd have to say that it wasn't
painted blue.

I still think this whole matter boils down to whether
"nested" means procedurally or geographically. I know
that we recently discussed this in a WG14 meeting, but
I seem to recall that it was during an informal session
and that hope was expressed that nobody would file a DR
asking for a clarification. If as you have said there
is a "ton" of existing software that *relies* on that
detail, then it might even be unwise to ask for an
official ruling, which is likely to be as I have
explained, forcing implementations that have done it
wrong to make a choice between conformance or making
existing bogus code continue to do what the programmer
expected. (More likely, another option would be added
to the compiler, and mentioned in the conformance
section of the documentation.)

It would be good to advise programmers not to rely on
your model (the geographic nesting one) nor on my model
(the procedural nesting one) since implementations may
differ on this score.
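
One way to see which reading a given implementation follows is to let the preprocessor stringize its own result for the example that started this thread, and compare it with the two candidate expansions. This is only a sketch: `STR`, `XSTR`, `squeeze` and `nesting_model` are helper names made up here, and spaces are removed before comparing because implementations space the output tokens differently.

```c
#include <string.h>

#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)  /* macro-replaces its argument, then stringizes it */

/* The definitions from the original question. */
#define A(x) expandA()+B(x)
#define B expandB+A

/* Whatever this preprocessor makes of B(Barg), captured as a string. */
static const char probe_text[] = XSTR(B(Barg));

#undef A
#undef B

/* Copy src to dst without spaces (hypothetical helper). */
char *squeeze(char *dst, const char *src)
{
    char *d = dst;
    for (; *src != '\0'; ++src)
        if (*src != ' ')
            *d++ = *src;
    *d = '\0';
    return dst;
}

/* Classify the captured expansion against the two readings discussed above. */
const char *nesting_model(void)
{
    char text[sizeof probe_text];
    squeeze(text, probe_text);
    if (strcmp(text, "expandB+expandA()+B(Barg)") == 0)
        return "procedural";
    if (strcmp(text, "expandB+expandA()+expandB+A(Barg)") == 0)
        return "geographic";
    return "other";
}
```

Printing `nesting_model()` from `main` gives the answer for the compiler at hand. The compilers listed at the top of the thread were reported to produce the second form.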
Douglas A. Gwyn
2004-02-28 17:21:51 UTC
Post by Paul Mensonides
FYI, here is a link which discusses the original algorithm and intent from which
http://gcc.gnu.org/ml/gcc-prs/2001-q1/msg00495.html
Note that the algorithm is not identical to the text of the standard. Rather
the text of the standard is simplified by breaking the procedural hierarchy mold
by removing the recursive model. Nevertheless, it is illustrative of the
original intent of the text of the standard.
Actually I was one of the people who worked on the wording
for the standard, and I remember Dave's proposed model.
What I don't remember is the committee agreeing to the
intersection-of-hidesets approach. (Note that despite
what you said in your other note, it is explicitly a
recursive model.) What we agreed to is the text that
you see printed in the standard. There is a tiny amount
of unspecified behavior (referred to in the previously
cited DR), but that doesn't apply to the current example,
and even less so to the one in the link you provided.
Douglas A. Gwyn
2004-02-28 17:27:10 UTC
Post by s***@amu.edu.pl
Only what do we do until then?
My suggestion is to not use macro definitions
that are meant to snarf additional PP tokens
from what follows *if* a potential recursion
could occur. I suppose the most likely instance
would be using an *object-like* macro to replace
the *name* of a function, except that the name
is really that of a function-like macro, whose
arguments might accidentally contain calls to
the first "function", wherein lies the recursion.
My answer to that is, don't do it. If you want
to use a macro to replace a function, replace
the argument list as well (by using a function-
like macro instead of an object-like macro).
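
A small sketch of that suggestion, with names invented for illustration (`twice_impl` is the real function, `twice_checked` a function-like macro in front of it, `twice` the replacement name). The fragile object-like form is shown only in a comment.

```c
int twice_impl(int v) { return 2 * v; }

#define twice_checked(v) twice_impl(v)

/* Fragile form (not used here): the object-like macro replaces only the
   name and picks up "(...)" from the tokens that follow it.
       #define twice twice_checked
   In twice(twice(3)) the inner twice would then sit in the argument list
   of an invocation begun by the outer expansion, which is exactly where
   the readings discussed in this thread differ. */

/* Suggested form: the macro takes the argument list as well. */
#define twice(v) twice_checked(v)

int nested_call(void)
{
    /* The inner call is macro replaced as an argument before substitution. */
    return twice(twice(3));
}
```

With the function-like form, `nested_call()` evaluates `twice_impl(twice_impl(3))` and no macro name is ever met while it is being replaced.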
Douglas A. Gwyn
2004-02-28 17:34:29 UTC
Apologies for the typos,
which I don't think obscured my point(s).
s***@amu.edu.pl
2004-02-28 22:57:06 UTC
Post by Douglas A. Gwyn
I still think this whole matter boils down to whether
"nested" means procedurally or geographically. I know
So as I understand, this issue is not definitely solved.
Although I think that the Standard talks about "procedural nesting",
most preprocessors/compilers seem to think differently.
There's nothing to rely on then.

+++++++++++

Now different, but related problem:

Is argument expansion (not argument substitution) part of fn-like
macro expansion, or does it precede the actual expansion?
6.10.3.1#1 is rather vague on it:
"Before being substituted, each argument's preprocessing
tokens are completely macro replaced [...]"

If, as you say, "nesting" is procedural (to which I agree), and
argument substitution is part of macro replacement (which IMO is
a sane assumption), then it would mean, that the arguments are
expanded in same context as the macro.

#define A() B
#define B(x) C(x)

A()(A())
//found A(), replace
B (A())
|A|
//rescan finds B(...), in context of A,
//does replacement; argument A() is expanded first;
//since we assume that this is part of B replacement,
//A is disabled, thus A in the arg gets painted: A'()
C(A'())
| B |
| A | //further rescans don't change anything

C(A()) //final result

Am I correct? If I am, then this would mean that for fn-like macros
the context is extended in "geographical" sense on all its arguments.

----

IMHO there is a minor glitch in the Standard in wording:
"[...] each argument's preprocessing tokens are completely macro
replaced as if they formed the rest of the preprocessing file;"
"as if they formed the rest of the file" is only in sense of
available tokens; arguments may be replaced in expansion of
a macro during a rescan, which means that some macros may be
disabled and "painting" rules apply - which is not quite
the same "as if they formed ...".
--
Stan Tobias
Douglas A. Gwyn
2004-02-29 06:17:37 UTC
Post by s***@amu.edu.pl
Is argument expansion (not argument substitution) part of fn-like
macro expansion, or does it precede the actual expansion?
"Before being substituted, each argument's preprocessing
tokens are completely macro replaced [...]"
Where the [...] limits the pp token stream context to just
that bounded by the original argument.
Post by s***@amu.edu.pl
If, as you say, "nesting" is procedural (to which I agree), and
argument substitution is part of macro replacement (which IMO is
a sane assumption), then it would mean, that the arguments are
expanded in same context as the macro.
Not really; it is as if
phase 5 input := result of phase 4(translation unit);
phase 4(token stream):
    ...
    if identifier has macro def,
        macro replace;
    ...
macro replace:
    locate arguments;
    for each argument,
        argument := phase 4(argument);
    ... rest of current macro replacement
(with a few details omitted so as not to obscure the important
point)
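
One of the omitted details is that an argument used as an operand of # or ## is substituted without being macro replaced first. A minimal sketch of the "argument := phase 4(argument)" step, using the usual two-level stringizing idiom (`VERSION`, `STR`, `XSTR` and the function names are made up here):

```c
#define VERSION 42
#define STR(x) #x        /* x is an operand of #: the argument is used unexpanded */
#define XSTR(x) STR(x)   /* x is a plain parameter: the argument is macro replaced first */

const char *unexpanded(void) { return STR(VERSION); }   /* argument goes straight to # */
const char *expanded(void)   { return XSTR(VERSION); }  /* argument is replaced, then passed on */
```

`unexpanded()` yields "VERSION", while `expanded()` yields "42" because the argument has been completely macro replaced before it reaches `STR`.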
Post by s***@amu.edu.pl
#define A() B
#define B(x) C(x)
A()(A())
//found A(), replace
B (A())
|A|
//rescan finds B(...), in context of A,
//does replacement; argument A() is expanded first;
//since we assume that this is part of B replacement,
//A is disabled, thus A in the arg gets painted: A'()
Yes, it is a nested replacement and the name A is seen
while the higher-level A is in the process of being
replaced, so that inner occurrence of A is painted blue.
Post by s***@amu.edu.pl
C(A()) //final result
Seems right.
Post by s***@amu.edu.pl
Am I correct? If I am, then this would mean that for fn-like macros
the context is extended in "geographical" sense on all its arguments.
The only geographical restriction is that during argument
expansion, no additional pp-tokens can be fetched from
farther along the file. That was never an issue for this
example.
Post by s***@amu.edu.pl
"[...] each argument's preprocessing tokens are completely macro
replaced as if they formed the rest of the preprocessing file;"
"as if they formed the rest of the file" is only in sense of
available tokens; arguments may be replaced in expansion of
a macro during a rescan, which means that some macros may be
disabled and "painting" rules apply - which is not quite
the same "as if they formed ...".
You dropped the part right after the semicolon: "no other
preprocessing tokens are available.", which was provided
to reinforce what the meaning of the sentence was. No
single (portion of a) sentence in the standard can be
properly interpreted out of context.
s***@amu.edu.pl
2004-02-29 14:37:02 UTC
Post by Douglas A. Gwyn
Post by s***@amu.edu.pl
"Before being substituted, each argument's preprocessing
tokens are completely macro replaced [...]"
Where the [...] limits the pp token stream context to just
that bounded by the original argument.
ACK. This is where macro expansion is strictly recursive
even in "geographical" sense.
Post by Douglas A. Gwyn
Post by s***@amu.edu.pl
If, as you say, "nesting" is procedural (to which I agree), and
argument substitution is part of macro replacement (which IMO is
a sane assumption), then it would mean, that the arguments are
expanded in same context as the macro.
Not really; it is as if
I don't see the difference between me and you; by "context" I
understand set of disabled macro names ("being replaced").
Post by Douglas A. Gwyn
phase 5 input := result of phase 4(translation unit);
...
if identifier has macro def,
macro replace;
Here we "recursively" return to "macro replace", but since we
are in course of replacing current macro, we add it to the
context; I would note this like:

macro replace [current-context + current-macro-name]
Post by Douglas A. Gwyn
...
locate arguments;
for each argument,
argument := phase 4(argument);
Yes, but here `argument' is processed within the context
possibly passed by recursive call; see above.
The first (non-recursive) call gets empty context (file context),
ie. no macro names are disabled.
Post by Douglas A. Gwyn
... rest of current macro replacement
(with a few details omitted so as not to obscure the important
point)
Post by s***@amu.edu.pl
#define A() B
#define B(x) C(x)
A()(A())
//found A(), replace
B (A())
|A|
//rescan finds B(...), in context of A,
//does replacement; argument A() is expanded first;
//since we assume that this is part of B replacement,
//A is disabled, thus A in the arg gets painted: A'()
Yes, it is a nested replacement and the name A is seen
while the higher-level A is in the process of being
replaced, so that inner occurrence of A is painted blue.
Post by s***@amu.edu.pl
C(A()) //final result
Seems right.
Glad to agree. It was a surprise to me though. I acquired this
understanding when you mentioned "procedural nesting".

Originally I was under influence of Paul's interpretation
(I read the long discussion of 2003 between Paul Mensonides,
Jun Woong and Ivan Kosarev, google: "nested macro expansion"),
and my initial interpretation was in terms of "geographical"
context and nesting:
B (A())
|A|
//rescan finds "B(", "B" is in "context A" (this is where Paul
//would disagree), expand argument, "A" falls outside of
//"context A", thus expand, substitute and replace "B(..)".
C(B) //final result

This is actually the point where I saw value in "geographical"
interpretation of context, because I don't see any reason
why second "A" should be "forcibly" drawn into the same context
as "B" is in. For me it is not intuitive. Why would it be good
for me?
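
Since the two readings give different results for this example, it can be checked on a given compiler by stringizing the expansion. A rough sketch, where `STR`, `XSTR` and `probe_result` are names invented for the purpose; the result is expected to be either `C(A())` (argument drawn into the context of A) or `C(B)` (argument outside it):

```c
#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)  /* macro-replaces its argument, then stringizes it */

#define A() B
#define B(x) C(x)

/* Whatever this preprocessor makes of A()(A()), captured as a string. */
static const char probe_text[] = XSTR(A()(A()));

#undef A
#undef B

/* Copy the captured expansion into dst with spaces removed.
   dst must have room for at least sizeof probe_text bytes. */
char *probe_result(char *dst)
{
    const char *s = probe_text;
    char *d = dst;
    for (; *s != '\0'; ++s)
        if (*s != ' ')
            *d++ = *s;
    *d = '\0';
    return dst;
}
```

Which of the two strings comes out is implementation-specific, so the sketch reports the text rather than asserting one answer.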

++++

To sum up: one text, three different interpretations.
The issue is for fn-like macros, whether macro-name pulls its
argument list into its context (your interpretation); or whether
arguments pull macro-name out of its context (Paul);
or whether it's something in-between (initially me).

I have reviewed Dave Prosser's explanations of "the intent of
the specification", it'll yet take me a few days to understand
the algorithm he gives properly, but both the example and
the algorithm seem to strongly support Paul's interpretation.
The algorithm has a notion of sticky contexts (or "hide sets"),
which would translate directly to "geographical" context *and* nesting.
See the last but one `return' and (HS&HS')|{T} argument to subst() -
seems like the closing ")" determines the context of the macro.
I wish I knew exactly what subst() was doing.

But that obviously (at least for me) is not what the Standard says.

Can you (or Paul) give me some pointers to the *complete* algorithm
(at least what subst() does), as D.Prosser gave it, please.

++++
Post by Douglas A. Gwyn
Post by s***@amu.edu.pl
"[...] each argument's preprocessing tokens are completely macro
replaced as if they formed the rest of the preprocessing file;"
"as if they formed the rest of the file" is only in sense of
available tokens; arguments may be replaced in expansion of
a macro during a rescan, which means that some macros may be
disabled and "painting" rules apply - which is not quite
the same "as if they formed ...".
You dropped the part right after the semicolon: "no other
preprocessing tokens are available.", which was provided
to reinforce what the meaning of the sentence was. No
single (portion of a) sentence in the standard can be
properly interpreted out of context.
Yes, but what I meant is that "as if they formed [... until the end]"
suggests that this expansion (strictly "geographically" recursive here)
is done outside of any context (as would probably "the rest of the file"
be). The Standard might make a forward reference, that this expansion
might result from nested (recursive) call and be subject to additional
context rules (macro disabling and painting).
But this is minor and not worth continuing anymore.

++++

Yet another issue, this time less connected to things discussed before.

6.10.3.4#2 says at the end:
"These nonreplaced macro name preprocessing tokens are no
longer available for further replacement even if they are later
(re)examined in contexts in which that macro name preprocessing
token would otherwise have been replaced."

What does the "(re)examined" remark mean here?
My understanding was:

#define M exp
#define A(x) A
A(a)(M)
//expand A(a)
A (M)
|A| //rescan: found "A", have macro def A(..), *examine*
//token stream further, found "A(M)", check context,
//"A" found in "context A", painted blue, "A(M)" not
//expanded
A(M) //final result

But I feel I must be wrong, although that remark seems to
sanction this behaviour.
Prosser's algorithm is definitely against, as after first expansion
"A" would be pushed behind and a new expansion start from "(".
--
Stan Tobias
Neil Booth
2004-02-29 21:32:32 UTC
Post by Douglas A. Gwyn
I'm not sure what bug your preprocessor has. During the
expansion of B in B(Barg) there is an expansion of A(Barg)
that should not further recurse on B since B is still
"being replaced" (most readily implemented as a flag in
the symbol table). The nested B resulting from expanding
A(Barg) should be flagged ("painted blue", another flag in
the symbol table) and never replaced. The result of
expanding B(Barg) should thus be "expB+expA()+B(Barg)".
If you have the source code for the preprocessor you could
try "instrumenting" it (adding carefully placed printouts)
to watch how it is making its decisions while expanding
your test code.
You should ask Dave Prosser to send you the original pseudo-
algorithm that the C89 committee agreed to. That
algorithm does not correspond to your claim above, but to
what Paul explains in his reply. The basic idea was to
allow as much expansion as possible provided you were definitely
making progress (i.e. avoiding infinite recursion). In
particular, once you start reading extra tokens from the source
file you are no longer nested in what otherwise might be
considered a nested macro.

Dave's pseudo-code is what GCC >= 3.1 as I coded implements
(though the implementation does not correspond directly to
the description), and what most production compilers attempt
to implement with varying degrees of success.

Neil.
Paul Mensonides
2004-03-01 05:11:22 UTC
Post by Douglas A. Gwyn
Post by Paul Mensonides
FYI, here is a link which discusses the original algorithm and intent from which
http://gcc.gnu.org/ml/gcc-prs/2001-q1/msg00495.html
Note that the algorithm is not identical to the text of the standard. Rather
the text of the standard is simplified by breaking the procedural hierarchy mold
by removing the recursive model. Nevertheless, it is illustrative of the
original intent of the text of the standard.
Actually I was one of the people who worked on the wording
for the standard, and I remember Dave's proposed model.
What I don't remember is the committee agreeing to the
intersection-of-hidesets approach. (Note that despite
what you said in your other note, it is explicitly a
recursive model.)
Maybe you should read what I wrote when I included the link. The standard does
not follow the pseudo-algorithm directly, but it closely replicates the results.
The standard text removes the recursive model in favor of a simpler iterative model.

Regards,
Paul Mensonides
Paul Mensonides
2004-03-01 05:28:59 UTC
Post by s***@amu.edu.pl
Originally I was under influence of Paul's interpretation
(I read the long discussion of 2003 between Paul Mensonides,
Jun Woong and Ivan Kosarev, google: "nested macro expansion"),
and my initial interpretation was in terms of "geographical"
B (A())
|A|
//rescan finds "B(", "B" is in "context A" (this is where Paul
//would disagree), expand argument, "A" falls outside of
//"context A", thus expand, substitute and replace "B(..)".
C(B) //final result
This is actually the point where I saw value in "geographical"
interpretation of context, because I don't see any reason
why second "A" should be "forcibly" drawn into the same context
as "B" is in. For me it is not intuitive. Why would it be good
for me?
I'm not sure what you mean, but considering a partially nested invocation of a
function-like macro nested is definitely *not* a good thing. It
hyper-dramatically increases the number of macros required to implement
preprocessor metaprogramming constructs (i.e. constructs that are used to
generate (among many other things) repetitious program code). It creates a
"vertical" dependency that is much worse than the "vertical" dependency between
a macro and itself. Consider:

#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__)
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

#define IIF(bit) PRIMITIVE_CAT(IIF_, bit)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t

#define EAT(...)

IIF(1)(CAT, EAT)(1, 2) // 12
IIF(0)(CAT, EAT)(1, 2) // [nothing]

This is a very small example that illustrates what I mean by the
increase in the number of macros. If partial nesting was considered nested, IIF
could not be implemented this way, it would have to have its own private version
of PRIMITIVE_CAT. That, in and of itself is not too bad, but it gets worse.
IIF itself becomes non-useful because anything that uses IIF would have to have
its own private version of IIF which in turn would need its own version of
PRIMITIVE_CAT. This gets worse and worse as things get more and more complex.
In effect, it makes it virtually impossible to make reusable components.
Consider a simplistic implementation of a WHILE loop:

#define WHILE(pred, op, ...) \
WHILE_1(pred, op, __VA_ARGS__) \
/**/
#define WHILE_S(pred, op, ...) __VA_ARGS__
#define WHILE_C(n, bit, ...) \
IIF(bit)( \
WHILE_ ## n, __VA_ARGS__ EAT \
) \
/**/
#define WHILE_1(pred, op, ...) \
IIF(pred(2, __VA_ARGS__))( \
WHILE_2, __VA_ARGS__ EAT \
)(pred, op, op(2, __VA_ARGS__)) \
/**/
#define WHILE_2(pred, op, ...) \
IIF(pred(3, __VA_ARGS__))( \
WHILE_3, __VA_ARGS__ EAT \
)(pred, op, op(3, __VA_ARGS__)) \
/**/
// etc.

The number of macros required to allow for N iterations of the WHILE loop is N
plus some constant. But consider what you get if each iteration requires its
own IIF which, in turn, requires its own PRIMITIVE_CAT. That is already 3*N
macros for the WHILE loop steps plus 3*N PRIMITIVE_CAT macros plus 3*3*N macros
used to implement N IIF macros plus some constant. Now consider what happens if
you allow the result of the predicate to be any number instead of just 0 or 1:

#define SPLIT(i, ...) PRIMITIVE_CAT(SPLIT_, i)(__VA_ARGS__)
#define SPLIT_0(a, ...) a
#define SPLIT_1(a, ...) __VA_ARGS__

#define IS_NULLARY(...) \
SPLIT(0, CAT(IS_NULLARY_R_, IS_NULLARY_C __VA_ARGS__)) \
/**/
#define IS_NULLARY_C() 1
#define IS_NULLARY_R_1 1,
#define IS_NULLARY_R_IS_NULLARY_C 0,

#define COMPL(bit) PRIMITIVE_CAT(COMPL_, bit)
#define COMPL_0 1
#define COMPL_1 0

#define BOOL(x) \
COMPL(IS_NULLARY( \
PRIMITIVE_CAT(BOOL_, x) \
)) \
/**/
#define BOOL_0 ()

#define WHILE_1(pred, op, ...) \
IIF(BOOL(pred(2, __VA_ARGS__)))( \
WHILE_2, __VA_ARGS__ EAT \
)(pred, op, op(2, __VA_ARGS__)) \
/**/
// etc.

Imagine what kind of macro bloat occurs when this situation occurs, and so on
and so forth. Things get worse and worse very quickly.
Post by s***@amu.edu.pl
++++
To sum up: one text, three different interpretations.
The issue is for fn-like macros, whether macro-name pulls its
argument list into its context (your interpretation); or whether
arguments pull macro-name out of its context (Paul);
or whether it's something in-between (initially me).
I have reviewed Dave Prosser's explanations of "the intent of
the specification", it'll yet take me a few days to understand
the algorithm he gives properly, but both the example and
the algorithm seem to strongly support Paul's interpretation.
The algorithm has a notion of sticky contexts (or "hide sets"),
which would translate directly to "geographical" context *and* nesting.
This is the one area where the standard specification deviates from the
algorithm. (I'll explain below.)
Post by s***@amu.edu.pl
Can you (or Paul) give me some pointers to the *complete* algorithm
(at least what subst() does), as D.Prosser gave it, please.
I'm not sure where you can get the rest of it. Doug?
Post by s***@amu.edu.pl
Yet another issue, this time less connected to things discussed before.
"These nonreplaced macro name preprocessing tokens are no
longer available for further replacement even if they are later
(re)examined in contexts in which that macro name preprocessing
token would otherwise have been replaced."
What does the "(re)examined" remark mean here?
Okay. :) The model described in the algorithm was the intent of the committee
at the time. However, the model of intent was more complex than it needed to
be, but it was exacting. Therefore, the text of the standard does not follow
the algorithm in favor of a simpler method that produces the same results as the
algorithm (but is described differently). The one exception to that is this case.
Several things happened in the translation. First, the algorithm was flattened
into an iterative model (except for recursive argument subprocessing) which
simplifies the algorithm considerably. Second, the notion of "blue paint" was
introduced in order to prevent the necessity of having every identifier token
carry around a conceivably expensive hideset. In the algorithm, each identifier
preprocessing token (actually all tokens, but identifiers are the only ones that
matter) carries with it a hideset. The specification in the standard removes
that hideset from each identifier and instead specifies such context as a range
that extends from the first to the last preprocessing token in a replacement
list. Then, if that context referring to a specific macro name exists when the
corresponding macro name is scanned, it simply wouldn't be replaced. However,
in order to make this model approximate the algorithm, each of those
non-replaced macro names has to be permanently flagged (i.e. painted blue)--just
as they would have been implicitly if they carried a hideset. Now comes the
subtle difference between what the standard specifies and the algorithm. When a
macro has been replaced and scanning resumes at the beginning of the replacement
list, a context exists that disables the macro name. However, this context only
exists during the first (re)scan of the tokens that made up the replacement
list. So, after it has been rescanned (proceeding directly into the following
tokens), the context no longer exists, but any tokens that were painted are
still painted. The (re)examination that is referred to above arises
when something is passed as an argument:

#define A(x) x

A(pp-tokens)

Here, pp-tokens gets scanned once as the macro is invoked and once again when A
expands to x and gets rescanned. That rescanning is reexamination of whatever
pp-tokens became after the scan as an argument, which can cause more replacement
(but this time with a different set of disabling-contexts). This is where the
blue paint comes in that prevents an identifier from expanding in this new
context:

#define A(x) x

#define B() B

A(B()) // B' not B

In this example, B() is completely replaced and rescanned before it is
substituted for x in A's replacement list. However, it expands directly to
itself which causes the B identifier to be painted. At this point, rescanning
of B's replacement list is complete (because "no other tokens are involved")
yielding only B' (but not the B-disabling context which originally caused the
blue paint). I.e. it does not carry the context (i.e. hideset); it instead
carries the blue paint. Now A's replacement list gets rescanned, but the only
active context is the A-disabling context--B would be available here, but not
the specific B token that was previously painted. So far, the results in the
example are the same as in the algorithm, but the difference can be exploited by
something like this (and this is one of the areas where preprocessor conformance
starts to heavily decline--in the manipulation of details like this):

#define EMPTY()
#define DEFER(id) id EMPTY()

#define A() DEFER(B)()
#define B() DEFER(A)()

#define SCAN(x) x

A() // B()
SCAN(A()) // A()
SCAN(SCAN(A())) // B()
SCAN(SCAN(SCAN(A()))) // A()

This result is what it should be by the specification, but is different than the
results yielded by the algorithm. As long as more scans are applied
(parametrically), the tokens will continue to expand back and forth between the
two results because an A token never appears during the rescan of A's
replacement list and similarly with B. Note that this is not infinite
recursion/looping/whatever, more scans must be manually applied for this to take
place. The lack of context retention (unlike the algorithm), illustrated here
by manipulation, is why permanent blue paint exists--to closely but not entirely
approximate hidesets. Make sense?
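
Both effects described above can be observed by stringizing the results. This is a sketch under the assumption that the implementation behaves as described in this post; `STR`, `XSTR`, `SELF`, `same_tokens` and the two check functions are names invented here, and `SELF` plays the role of the `B()` that expands to itself in the earlier example.

```c
#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)  /* macro-replaces its argument, then stringizes it */

#define EMPTY()
#define DEFER(id) id EMPTY()
#define A() DEFER(B)()
#define B() DEFER(A)()
#define SCAN(x) x
#define SELF() SELF   /* expands directly to its own name */

static const char scan0[] = XSTR(A());
static const char scan1[] = XSTR(SCAN(A()));
static const char scan2[] = XSTR(SCAN(SCAN(A())));
static const char self1[] = XSTR(SCAN(SELF())());

#undef A
#undef B

/* Nonzero if text equals expected once spaces are ignored. */
int same_tokens(const char *text, const char *expected)
{
    for (;;) {
        while (*text == ' ') ++text;
        while (*expected == ' ') ++expected;
        if (*text != *expected) return 0;
        if (*text == '\0') return 1;
        ++text;
        ++expected;
    }
}

/* Each extra scan flips the result between B() and A(). */
int scans_alternate(void)
{
    return same_tokens(scan0, "B()") && same_tokens(scan1, "A()")
        && same_tokens(scan2, "B()");
}

/* The painted SELF is not replaced even when "()" follows it later. */
int paint_is_permanent(void)
{
    return same_tokens(self1, "SELF()");
}
```

If the paint were not permanent, `SELF` followed by `()` on the rescan of SCAN's replacement list would be replaced once more and the stringized text would be just `SELF`.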
Post by s***@amu.edu.pl
#define M exp
#define A(x) A
A(a)(M)
//expand A(a)
A (M)
|A| //rescan: found "A", have macro def A(..), *examine*
//token stream further, found "A(M)", check context,
//"A" found in "context A", painted blue, "A(M)" not
//expanded
A(M) //final result
Yes, the result should be A(exp) because the (M) is not considered an argument
list because the secondary A is not considered to be a macro. No attempted
invocation is required to cause blue paint on an identifier. The identifier
must merely exist inside the context. Note that this is the simple case that we
all should agree on. The A token is seen within the disabling context of A, and
that is all that is required to paint it.
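
A quick check of this simple case, again by stringizing the expansion (the helper names `STR`, `XSTR` and `result_is` are made up for the sketch):

```c
#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)  /* macro-replaces its argument, then stringizes it */

#define M exp
#define A(x) A

/* A(a) leaves a painted A, so "(M)" is not an argument list and M is
   replaced as an ordinary object-like macro. */
static const char result_text[] = XSTR(A(a)(M));

#undef M
#undef A

/* Nonzero if the captured expansion equals expected, ignoring spaces. */
int result_is(const char *expected)
{
    const char *text = result_text;
    for (;;) {
        while (*text == ' ') ++text;
        while (*expected == ' ') ++expected;
        if (*text != *expected) return 0;
        if (*text == '\0') return 1;
        ++text;
        ++expected;
    }
}
```

`result_is("A(exp)")` should hold here, and `result_is("A(M)")` should not, since the parenthesized tokens are still scanned for macros in their own right.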

Regards,
Paul Mensonides
Paul Mensonides
2004-03-01 05:29:00 UTC
Post by s***@amu.edu.pl
Post by Paul Mensonides
The wording of the C standard makes it clear that the rest of the preprocessing
tokens in the file are *not* included. "If the name of the macro being replaced
is found during this scan of the replacement list (not including the rest of the
file's preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced it is not replaced."
This is a little OT here, because the discussion is about what
"nested" means here. But this excerpt is vital here, so I'll rewrite it
6.10.3.4
#1 [...]
Then, the resulting preprocessing token sequence is rescanned,
along with all subsequent preprocessing tokens of the source file
//this inclusion of the rest of the file makes sense only
//for fn-like macros, either to determine whether fn-like
//"macro-name" forms a macro (with subsequent lparen), or to complete
//fn-like macro argument list.
More specifically, the inclusion of the rest of the file makes sense only for
fn-like macro invocations that span the end of the replacement list of some
other macro (function-like or object-like).

Note also that this text is the literal definition of non-recursion. It is
resumption of scanning, not a recursive scan that continues until the end of the
file.
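
A minimal illustration of an invocation that spans the end of a replacement list, with no recursion involved (`add`, `OP` and `spanning_invocation` are hypothetical names):

```c
#define add(a, b) ((a) + (b))
#define OP add   /* the replacement list ends with the name of a function-like macro */

int spanning_invocation(void)
{
    /* "add" comes from OP's replacement list; "(2, 3)" comes from the
       tokens that follow it in the source file. */
    return OP(2, 3);
}
```

Because no function named `add` exists, this only compiles if the rescan of OP's replacement list is allowed to continue into the following tokens.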
Post by s***@amu.edu.pl
, for more macro names to replace.
//actually "macros" are replaced, not "macro names", this is
//probably a human language deficiency, but also maybe direct
//reference to next point, see further
#2
If the name of the macro being replaced
//note: "name of the macro", not "macro" or "macro invocation";
//the "macro" is replaced, not "macro name"
Actually, "macro invocations" are replaced. A "macro" is a reference to the
entity in a symbol table that is produced by a macro definition, e.g.

#define MACRO() // ...

A "macro name" is just the name of the macro, i.e. MACRO here. The standard
does not use these terms consistently, but it is nevertheless clear where it
counts.
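
The difference between a macro name and a macro invocation is easy to show for function-like macros, where the name alone (not followed by a left parenthesis) is not an invocation. A small sketch with invented names:

```c
/* A real function. The name is parenthesized so the definition would stay
   valid even if the macro below were already visible. */
int (succ)(int n) { return n + 1; }

#define succ(n) ((n) + 100)   /* function-like macro with the same name */

int via_invocation(void) { return succ(1); }    /* name followed by "(": macro invocation */
int via_name_only(void)  { return (succ)(1); }  /* name followed by ")": plain identifier */
```

`via_invocation()` goes through the macro and `via_name_only()` calls the function, although both contain the macro name.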
Post by s***@amu.edu.pl
is found during this scan
//ie. "rescan"
Yes.
Post by s***@amu.edu.pl
of the replacement list
//which resulted in previous replacement before "rescan"
(not including the rest of the source file's preprocessing tokens),
//this is redundant, it only adds emphasis to "the replacement list",
//ie. this and no further
It is explanatory. At this point, the preprocessing tokens that made up the
macro invocation have already been replaced by the replacement list of the
macro. Scanning (a.k.a. rescanning) resumes at the beginning of the
preprocessing tokens that made up the replacement list from which they came.
The above text is a clarification that the macro name (if found) will no longer
be disabled (a.k.a. painted blue) if found after that point.
Post by s***@amu.edu.pl
it is not replaced.
Further, if any nested
//"nested" means replacements that occur during this rescan,
//cf. #1
replacements encounter the name of the macro
//again: "name of the macro", not "macro invocation"
being replaced, it
//"it" surely refers to "macro", not "name of the macro"
is not replaced. These nonreplaced [...]
Nowhere in this paragraph "Rescanning..." is the word "invocation"
even mentioned. Rescanning and invocation (verb) are two separate
procedures.
Yes, but scanning causes replacement (if macro invocations are found) which in
turn causes (re)scanning which causes more replacements and so on for the rest
of the file. What this means is that rescanning is just resumption of scanning.
The only difference is that another disabling context exists that stretches from
the point where scanning resumes until the last preprocessing token that came
from the replacement list. This is iterative, not recursive.
Post by s***@amu.edu.pl
Post by Paul Mensonides
Post by Douglas A. Gwyn
Post by Paul Mensonides
The viewpoint that macro expansion forms an invocation hierarchy is faulty which
leads to interpretations like this one.
The specification is explicitly recursive.
No, it is explicitly non-recursive (see below).
The "macro expansion" algorithm may be not necessarily recursive,
but literally the Standard _is_ recursive: rescanning and parameter
replacements make direct references to expansion (earlier in the
Standard), which possibly contains nested param expansions and is
followed by rescan, which ... etc.
No, a macro expansion algorithm may be recursive, but the standard is literally
iterative. It would be recursive (i.e. form a functional hierarchy) if and only
if the replacement list of a macro was fully macro-replaced before it replaced
the invocation of the macro--but that is not what happens. The standard
*literally* says that replacement happens *first*, followed by argument
substitution, followed by stringizing and token-pasting, and ultimately followed
by the resumption of scanning.
Post by s***@amu.edu.pl
Sometimes macro expansions are independent (parameter) and sometimes
they are performed in context of subsequent tokens (in file, or
nested parameter expansions within parameter expansions).
They are always performed in the context of subsequent tokens. With arguments,
there just aren't any more after the end of the argument. For example:

#define A() B
#define B() 1

#define SCAN(x) x

SCAN( A()() ) // 1
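
A compilable variant of the example above, as a minimal sketch. STR, XSTR and check_scan_example are helper names assumed here for illustration (they are not in the original example); XSTR only serves to make the result of argument expansion visible as a string. It contrasts an argument that contains the trailing "()" with one that ends right after A().

```c
#include <assert.h>
#include <string.h>

#define A() B
#define B() 1

/* Hypothetical helpers (not from the post): XSTR fully macro-expands its
   argument as an isolated token sequence, then STR stringizes the result. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* Argument is "A()()": A() yields B, and the "()" that follows inside the
   same argument completes an invocation of B. */
static const char with_parens[] = XSTR(A()());

/* Argument is "A()": A() yields B, but the argument ends there, so B is
   not followed by "(" and is left unexpanded. */
static const char without_parens[] = XSTR(A());

/* Checks both results at run time. */
void check_scan_example(void)
{
    assert(strcmp(with_parens, "1") == 0);
    assert(strcmp(without_parens, "B") == 0);
}
```

The second case is the point about arguments: the scan is still performed in the context of subsequent tokens, there are just none left inside the argument.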
Post by s***@amu.edu.pl
Post by Paul Mensonides
Post by Douglas A. Gwyn
Post by Paul Mensonides
expandB+ A (Barg)
| |^| |
| |_|______|
|__________| |
| A invocation (#1)
B context
expandB+ expandA()+B(Barg)
|^ |
|_________________|
|
A context
No, you dropped a context there. The original expansion
of B has not yet concluded, because part of the process
requires expansion of a nested macro A, which has not
quite concluded...
No, I didn't. The context no longer exists because the invocation is not nested
This is the crux of the problem: you say "invocation is not nested",
which obviously isn't, but the Standard doesn't ever mention "invocation"
during "rescan", but merely "nested replacement".
Yes, but replacement is the result of invocation (and only invocation). The
standard makes this part (at least) clear. As you say, this part is only
relevant to function-like macro invocations, because only they can span the end
of the replacement list. If A in the above was an object-like macro instead of
a function-like macro, the token A by itself would constitute the entire
invocation which would indeed be nested.
Post by s***@amu.edu.pl
Here "macro name" A falls within the replacement procedure of B,
next we determine that A forms an invocation, thus we expand it
(ie. macro, but maybe this is why "macro name" is used in #1 - see above),
this replacement takes place within the context of B expansion,
because token A originated from there.
No. The token A does not constitute a macro invocation. Only A immediately
followed by an ( constitutes the beginning of a macro invocation (as defined
previously). That is the minimum that is required to say that it is an
invocation.
Post by s***@amu.edu.pl
Post by Paul Mensonides
tok tok 1 tok tok
||| ^ |||
|||___|||
|| | ||
|| C context
||_____||
| | |
| B context
|_______|
|
A context
tok tok 1 tok tok
^
Fully agreed.
If so, how can you say it is recursive? That behavior is iterative.
Post by s***@amu.edu.pl
Post by Paul Mensonides
Post by Douglas A. Gwyn
Unfortunately the term is used
colloquially with no further elucidation; however, there
is no clue that it should be thought of as geographic
nesting, and in its immediate context the only already
defined nesting process it could be referring to is the
logical recursive expansion.
I think what Douglas meant by "geographic nesting" is "syntactical
nesting" (as in C syntax), which is not taking place in macro expansion;
and by "procedurally nested" (earlier) he means 'replacement procedure'.
I think we all agree here.
No, we definitely do not. I agree with the terminology, and I know what the
difference is between geographic/syntactic nesting and procedural nesting. The
standard model is *absolutely* not procedural nesting. It is an iterative loop
that begins with the replacement of a macro invocation with the preprocessing
tokens of the macro's replacement list, followed by argument substitution,
followed by stringizing/token-pasting, and finally followed by the resumption of
scanning. In the following example,

#define A() B()
#define B() 1

A()

The invocation of B() does not "return" to A() which in turn returns to the
top-level. That is procedural nesting.

Instead, A() immediately returns the tokens "B()" to the top-level and scanning
resumes, which causes the expansion of B() which in turn immediately returns "1"
to the top-level and scanning resumes (which is a no-op on "1"). That is
iterative, not procedural, and is *by definition of the standard* regardless of
the familiar notion of procedural nesting and how much one wishes for that to be
the case.
Post by s***@amu.edu.pl
When I started the editor, my intention was to agree with you, and to back
up your argument I wanted to bring that excerpt where it says "nested
invocation", but I didn't find it. If the Standard said in 6.10.3.4#2
"if any nested replacements encounter the invocation of the macro
being replaced", I would definitely say you were right, because
in the above example "A invocation" is not nested; but it doesn't.
I never said or implied that at all. All that is required is for a single
identifier preprocessing token that names a macro to exist in a context that
causes painting. If that happens, the single identifier preprocessing token
gets permanently painted. No "attempted" invocation is necessary to cause this
painting. Nevertheless, a disabling context *is* required. Hence, in the
following example:

#define A() A

A()() // A'()

The token A does get painted, even though no nested invocation of A is
attempted. At the same time,

#define A() B
#define B() A

A()() // A

The token A does not get painted, because it is the result of an invocation that
is not nested within the context that disables A.
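
The two examples above can be tried on a real preprocessor with a sketch along these lines. STR, XSTR, strip_spaces, the variable names and the extra "probe" case are assumptions added for illustration, not part of the original post. Stringizing cannot show paint directly, so the probe appends one more "()" to see whether the final A is still replaceable; for that case the result may differ between implementations, so both outcomes are accepted.

```c
#include <assert.h>
#include <string.h>

#define STR(x)  #x
#define XSTR(x) STR(x)   /* expand the argument first, then stringize */

/* First example: A() is replaced by A, which is found while rescanning
   A's own replacement list, so that token is painted and "A ()" stays. */
#define A() A
static const char self_ref[] = XSTR(A()());
#undef A

/* Second example: A() yields B, and B's invocation takes its "(" from
   outside A's replacement list (a partially nested invocation). */
#define A() B
#define B() A
static const char mutual[] = XSTR(A()());
/* One more "()": the result shows whether the final A was painted. */
static const char probe[] = XSTR(A()()());
#undef A
#undef B

/* Copies src to dst without spaces; the spacing of stringized,
   macro-expanded tokens may vary between preprocessors. */
static void strip_spaces(const char *src, char *dst)
{
    for (; *src != '\0'; src++)
        if (*src != ' ')
            *dst++ = *src;
    *dst = '\0';
}

void check_painting_examples(void)
{
    char buf[sizeof self_ref + sizeof mutual + sizeof probe];

    strip_spaces(self_ref, buf);
    assert(strcmp(buf, "A()") == 0);   /* painted A, never replaced */

    strip_spaces(mutual, buf);
    assert(strcmp(buf, "A") == 0);

    /* "B" if the final A was not painted, "A()" if it was. */
    strip_spaces(probe, buf);
    assert(strcmp(buf, "B") == 0 || strcmp(buf, "A()") == 0);
}
```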
Post by s***@amu.edu.pl
Please, don't use the word "invocation" in context of "rescan",
because the Standard doesn't do so, and that has already confused
me a lot.
The act of a macro replacement is the consequence of a macro invocation
(especially with a function-like macro which is the only applicable kind of
macro in the context that we've been discussing). The standard doesn't use the
terminology consistently, but it is still clear.
Post by s***@amu.edu.pl
IMO the use of "invocation" is only valid in terms of
"macro replacement", ie. to determine whether function-like macro-name
(token) forms a "macro" ("macro invocation").
I have taken notice of the link you gave in the followup to your
article (I'll yet have to read it more closely later). The expansion
of G_0(42) definitely disagrees with my reasoning and, what I think,
what the Standard says. Okay, maybe the intention was different,
by Dave Prosser's explanations; then maybe the Standard has a flaw
there and needs correcting. Only what do we do until then?
I personally think that the standard is clear (or, at least, clear enough). It
could certainly be spelled out better. Note that Annex J of C99 (which
is non-normative) contains a reference to this issue that says that it is
unspecified. This was the result of an old DR. The resulting text is either a
failure to adhere to (or remember) the original intent of the standard text, which
was derived from that pseudo-algorithm, or a disregard of it for whatever
reason. There is a similar C++ DR (DR #268) that is the result of the
discussion with Dave Prosser. It has a conclusion that agrees with me.

Regards,
Paul Mensonides
Paul Mensonides
2004-03-01 05:29:01 UTC
Post by Douglas A. Gwyn
Post by Paul Mensonides
The wording of the C standard makes it clear that the rest of the preprocessing
tokens in the file are *not* included. "If the name of the macro being replaced
is found during this scan of the replacement list (not including the rest of the
file's preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced it is not replaced."
You left out the preceding paragraph, which sets the
"... Then, the resulting preprocessing token
sequence is rescanned, along with all subsequent
preprocessing tokens of the source file, for
more macro names to replace."
So, by your logic, every macro expansion is a contained process that recurses
down the entire file. That is completely flawed reasoning. The above paragraph
shows how and where the *resumption* of scanning occurs. The following
paragraph defines the concept of identifier painting with the explicit caveat
that the rest of the preprocessing tokens are not part of that painting process.
Post by Douglas A. Gwyn
Then the first sentence of the part you quoted is not
applicable to the inner "B" in our example, because it
does not exist yet during "this" scan (the scan of the
immediate consequence of the parameter substitution
and #,## processing, looking for macro names).
However, the second sentence ("Furthermore, ...") does
apply to the name "B" that results from the nested "A"
replacement (which recursively involves its own scan
for further macro names).
The scan of the "immediate consequence" is normal scanning that proceeds
directly into the tokens that follow the outer invocation. The tokens that came
from the replacement list of that invocation constitute the explicit boundaries
for identifier painting. The second sentence is still under the context
established by the first sentence. The only question is whether a partial
invocation constitutes a nested invocation. Further, as I say again, your model
is flawed. The macro expansion process is not, by definition, like a function
call that ultimately returns the rescanned tokens of the replacement list. An
invocation is replaced by the replacement list before rescanning ever occurs
(which is explicit and unarguable). As such, there is no procedural nesting,
which ultimately mimics the intent of the recursive algorithm from which the
standard text was derived.
Post by Douglas A. Gwyn
Post by Paul Mensonides
No, I didn't. The context no longer exists because the invocation is not nested
(see below). You are blatantly ignoring what the standard dictates because of a
typical view that macro expansion is like a procedural hierarchy. Yes, scanning
is not finished with the preprocessing tokens of B's replacement list, but is
finished with the *replacement* of the invocation of B.
No, it is not. The subclauses of 6.10.3 all are part
of the process of macro replacement described by clause
6.10.3. That includes 6.10.3.4 (rescanning and further
replacement).
This is what the standard literally says. This is *unarguable* because there is
no other literal way to interpret it.

[quote]

A preprocessing directive of the form

# define identifier replacement-list new-line

defines an object-like macro that causes each subsequent instance of the macro
name to be replaced by the replacement list of preprocessing tokens that
constitute the remainder of the directive.

[end quote]

It then further goes on to say:

[quote]

Each subsequent instance of the function-like macro name followed by a ( as the
next preprocessing token introduces the sequence of preprocessing tokens that is
replaced by the replacement list in the definition (an invocation of the macro).

[end quote]

This is explicit. An invocation is replaced by the replacement list in the
definition of the macro, not by the rescanned replacement list (not even by the
argument-substituted, stringized, and token-pasted replacement list).

Further, your logic that 6.10.3 describes a single macro expansion from start to
finish is incorrect. 6.10.3 describes macros. That includes the definition of
macros (which has nothing to do with the macro expansion process), the scope of
macro definitions, the invocation of macros, the semantics of macros, and the
undefinition of macros (which also has nothing to do with the macro expansion
process).
Post by Douglas A. Gwyn
There are ways for implementations to take shortcuts,
as it is well known that tail recursion can be replaced
by iteration. But the specification does not refer to
any such shortcuts, and if an implementation loses a
context due to being in too big a hurry to move on, it
is simply a defective implementation.
This is based on an incorrect assumption that a procedural model exists--which
it doesn't. You have to prove that "replaced by the replacement list in the
definition" and similar does not mean what it literally says. Instead, you
assume that it is a procedural model and try to bend the standard text to that
assumption. As such, an implementation can implement macro expansion
recursively, but then it has to mimic the behavior described in the standard.
If it does not, then it is a defective implementation.
Post by Douglas A. Gwyn
If we were to take your view to its logical extreme,
then there would be no possibility of an in-process
macro name appearing within a nested expansion (past
the first level). That is contradicted by the standard
specifying special consequences for cases where that
*does* occur.
No, it isn't. It doesn't appear procedurally nested, but it does appear
geographically or syntactically nested. If we were to take your view to its
logical extreme, then each macro expansion would denote a recursion into all the
rest of the preprocessing tokens of a file, each time going deeper and deeper
into a procedural hierarchy that ultimately returns only when the end of the
file is reached. That is ridiculous.
Post by Douglas A. Gwyn
However, the standard specifies when they are examined and/or
replaced and when they are not examined and/or replaced.
When I say colloquially that tokens are "fetched", I mean
merely that they are available for examination and possible
replacement.
Okay. The only reason that I mention it is that you've made references to a
"replacement buffer" which (though is a possible implementation strategy) does
not exist in the conceptual model defined by the standard.
Post by Douglas A. Gwyn
Post by Paul Mensonides
.. That is not what the standard specifies. It specifies replacement first,
then rescanning, not rescanning followed by replacement.
I am not at all confused about the sequencing.
No, you're not confused. You're flat out wrong.
Post by Douglas A. Gwyn
The
process of macro replacement, starting just after
an identifier has been recognized as a defined macro
name during the appropriate phase of processing,
involves temporarily flagging that global identifier
as "in the process of being replaced", locating the
tokens for the macro arguments and fully macro-replacing
each of *them* [recursive subprocessing], substituting
^^^^^^^^^^^^^^^^^^^^^^^
Yes.
Post by Douglas A. Gwyn
each fully-expanded argument for the corresponding
parameter in the definition for the current macro,
concurrent with #,## processing, then rescanning the
result of the previous operation, looking for
identifiers corresponding to defined macros, and for
    if (global identifier is "in the process of
        being expanded") then apply permanent blue
        paint to that identifier pp token (not to
        the global identifier, as I might have
        mistakenly indicated in a previous posting).
    if (identifier token has ever had blue paint
        applied) then leave it intact;
    else (identifier is a defined macro name,
        and has not been painted blue) so
        begin a macro replacement process for
        that identifier, and if it is a
        function-like macro, it is allowed to
        access remaining pp tokens (i.e. those
        not involved in any nesting replacement
        process from expansion of macros that
        may still be in process) when fetching
        its arguments.
*After* each such name (we're now back to the original
replacement buffer) has been fully macro-replaced, the
current macro replacement process (6.10.3) is complete
and the "in the process of being replaced" flag is
removed from the global identifier. The context now
pops back to whatever scanning was being done (top
level or some nested macro replacement).
That is absolutely *not* what the standard specifies. That is what your
flawed conceptual model specifies. This is what the standard actually
specifies:

The process of macro replacement starts just after an identifier has been
recognized as a defined *object-like* macro name or just after a ( following an
identifier that has been recognized as a defined *function-like* macro name. If
it is a function-like macro invocation, the arguments are identified and each of
them is recursively subprocessed *if* the corresponding parameter appears in
the replacement list of the macro without being an operand of the stringizing or
token-pasting operators. The replacement list of the macro replaces the
invocation of the macro. Actual parameters (or fully macro expanded actual
parameters depending on the # and ## operators) are substituted for formal
parameters in the sequence of preprocessing tokens that came from the
replacement list. Stringizing and token-pasting occur. Scanning resumes at
the first preprocessing token that came from the replacement list, and a context
is established (that will paint the macro name preprocessing tokens if found)
that endures until just after the last preprocessing token that came from the
replacement list.

This single process proceeds from the beginning of the file to the end of the
file--with the exception of the interpretation of directives (or
"non-directives"), which only has a defined meaning if no contexts currently
exist and no recursive subprocess of an argument is currently active. Note
once again that this process is not a direct translation of the "intent
algorithm". If it was, it would "engender" implementation strategies that would
be horribly inefficient, though it would produce similar results. The point
that is particularly important about the model is that no procedural nesting
occurs. Therefore, "nested" as used in the standard text can only mean nesting
within a context. The only question is whether an invocation that is partially
nested is considered nested. Even though that is unclear in a literal sense,
examination of the original intent shows that partial nesting should not be
considered nested. Granted, "original intent" is non-normative :), but it is a
reasonable thing to mimic in an implementation when the standard contains a
non-normative note in Annex J that says it is unspecified and when a C++ DR says
that it should be that way as well. Further, examination of the underlying
purpose of painting (which is to prevent the preprocessor implementation from
having to deal with infinite macro expansion which is very difficult to detect
because of the iterative model) yields the logical conclusion that only an
invocation that is completely nested could cause infinite expansion. This,
coupled with a great deal of existing code and the semantics of virtually every
existing C and C++ preprocessor in common use, results in only a single viable
way of interpreting an invocation that is partially nested.
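
The process described above can be sketched as a toy model. This is a rough illustration under stated assumptions, not a real preprocessor: every macro is function-like with zero parameters (so there is no argument substitution, # or ##), tokens are space-separated words, and a context that ends inside an invocation (the partially nested case) is simply dropped, which is the interpretation argued for here. All names (struct macro, struct context, expand, check_toy_model) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXTOK 64

/* Toy macro: zero parameters, so an invocation is the tokens NAME ( ). */
struct macro { const char *name; const char *body[4]; int len; };

/* Disabling context: paints `name` up to (not including) index `end`. */
struct context { const char *name; int end; };

static const struct macro *lookup(const struct macro *tab, int ntab, const char *t)
{
    for (int k = 0; k < ntab; k++)
        if (strcmp(tab[k].name, t) == 0) return &tab[k];
    return NULL;
}

/* Expands the space-separated tokens of `input` into `out`.
   Painted identifiers are written with a trailing apostrophe. */
void expand(const struct macro *tab, int ntab, const char *input,
            char *out, size_t outsz)
{
    char buf[256];
    const char *tok[MAXTOK];
    int painted[MAXTOK] = {0};
    struct context ctx[MAXTOK];
    int n = 0, nctx = 0, i = 0, k, keep;

    snprintf(buf, sizeof buf, "%s", input);
    for (char *p = strtok(buf, " "); p != NULL && n < MAXTOK; p = strtok(NULL, " "))
        tok[n++] = p;

    while (i < n) {
        /* A context ends just after the last token of its replacement list. */
        for (k = keep = 0; k < nctx; k++)
            if (ctx[k].end > i) ctx[keep++] = ctx[k];
        nctx = keep;

        const struct macro *m = lookup(tab, ntab, tok[i]);
        if (m == NULL) { i++; continue; }

        for (k = 0; k < nctx; k++)      /* name found inside its own context */
            if (strcmp(ctx[k].name, tok[i]) == 0) painted[i] = 1;

        if (painted[i] || i + 2 >= n ||
            strcmp(tok[i + 1], "(") != 0 || strcmp(tok[i + 2], ")") != 0) {
            i++;                        /* painted, or not an invocation */
            continue;
        }
        if (n - 3 + m->len > MAXTOK || nctx == MAXTOK) break;  /* toy limits */

        /* Assumption: contexts ending inside the invocation are dropped;
           contexts that enclose the whole invocation are resized. */
        for (k = keep = 0; k < nctx; k++)
            if (ctx[k].end >= i + 3) {
                ctx[keep] = ctx[k];
                ctx[keep++].end += m->len - 3;
            }
        nctx = keep;

        /* The replacement list replaces the invocation in place ... */
        memmove(&tok[i + m->len], &tok[i + 3], (size_t)(n - i - 3) * sizeof tok[0]);
        memmove(&painted[i + m->len], &painted[i + 3],
                (size_t)(n - i - 3) * sizeof painted[0]);
        for (k = 0; k < m->len; k++) { tok[i + k] = m->body[k]; painted[i + k] = 0; }
        n += m->len - 3;

        /* ... and scanning resumes at its first token under a new context. */
        ctx[nctx].name = m->name;
        ctx[nctx++].end = i + m->len;
    }

    size_t used = 0;
    if (outsz > 0) out[0] = '\0';
    for (i = 0; i < n; i++) {
        int w = snprintf(out + used, outsz - used, "%s%s%s",
                         i ? " " : "", tok[i], painted[i] ? "'" : "");
        if (w < 0 || (size_t)w >= outsz - used) break;
        used += (size_t)w;
    }
}

void check_toy_model(void)
{
    const struct macro self[]   = { {"A", {"A"}, 1} };
    const struct macro mutual[] = { {"A", {"B"}, 1}, {"B", {"A"}, 1} };
    const struct macro chain[]  = { {"A", {"B", "(", ")"}, 3}, {"B", {"1"}, 1} };
    char out[64];

    expand(self, 1, "A ( ) ( )", out, sizeof out);
    assert(strcmp(out, "A' ( )") == 0);     /* painted, not replaced */
    expand(mutual, 2, "A ( ) ( )", out, sizeof out);
    assert(strcmp(out, "A") == 0);          /* partial nesting: not painted */
    expand(chain, 2, "A ( )", out, sizeof out);
    assert(strcmp(out, "1") == 0);
}
```

Note that expand() never calls itself: there is one loop over one token stream, and the only per-macro state is the list of contexts. A model that treats a partially nested invocation as nested would keep, rather than drop, the contexts that end inside the invocation.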
Post by Douglas A. Gwyn
Post by Paul Mensonides
... it as if every invocation is physically inlined into the stream
of preprocessing tokens where the invocation existed (which is either the
top-level stream or an argument being processed as a separate stream).
That is merely what "substitution" consists of, and
has no deeper meaning.
"Substitution" occurs with arguments, replacement occurs for a macro invocation.
That replacement consists of the exact preprocessing tokens of the replacement
list as explicitly specified by the standard. There is no deeper meaning
involved, that is simply what the standard specifies.
Post by Douglas A. Gwyn
Post by Paul Mensonides
No it doesn't. The invocation of B at this point is nested within the
rescanning of A's replacement list, but not B's.
That makes no sense if "nesting" has its logical
(procedural) meaning rather than a geographical
interpretation.
Yes, it makes no sense if "nesting" has a procedural meaning, but it doesn't. I
disagree that "logical" meaning implies procedural. Both procedural nesting and
syntactic nesting are perfectly logical forms of nesting.
Post by Douglas A. Gwyn
Repeating what I said above, that
would mean that there would be no possibility of an
in-process macro name appearing within a nested expansion
past the first level, which is contradicted by the
standard discussing exactly that case.
The standard never refers to it in those terms at all. The standard discusses only
the nesting within a context.
Post by Douglas A. Gwyn
Post by Paul Mensonides
Post by Douglas A. Gwyn
Note that the process described in 6.10.3.4 is one
*component* of the process 6.10.3, not something that
happens after 6.10.3 is complete.
(re)scanning -> replacement -> substitution -> #/## ->
^                                                    |
|____________________________________________________|
That is contrary to all similar constructions in the
standard, such as what constitutes the referred-to
operand in a subexpression,
Yes, macro expansion is quite different than a procedural hierarchy. The
assumption that it is because it seems "normal" is flawed. Just the fact that
partial nesting can occur shows that the model is very foreign to the normal
semantics of a procedural hierarchy.
Post by Douglas A. Gwyn
and is clearly contradicted
by the explicitly recursive wording of the specification,
The standard does not have any explicitly recursive wording regarding macro
expansion whatsoever. Instead, it literally and explicitly dictates an
iterative model (except for argument processing which is correctly recursively
subprocessed as you say). In fact, it is incredibly clear in this regard. The
*only* thing that isn't clear is what happens when an invocation is partially
nested. If procedural nesting occurred, "partial nesting" would be impossible.
(Also, if procedural nesting occurred, there would probably be an
implementation-defined nesting limit, which there isn't.) What you keep
referring to are the general notions of "recursive wording" and that 6.10.3 defines a
single complete process for any given macro replacement. Neither of those
notions is present in the text of the standard. 6.10.3 refers to many things
besides the replacement process; the fact that 6.10.3.1 through 6.10.3.4 are
nested within 6.10.3 does not imply that they are subprocesses of some macro
expansion that began at 6.10.3 (and it especially doesn't imply that they are
subprocesses of macro definition or sibling processes of macro undefinition).
Post by Douglas A. Gwyn
and the explicit reference to more than one level of
nesting. (Your model involves only one level.)
The model involves only one level of *procedural* nesting (except for arguments
which can cause any number of levels), but any number of levels of *contextual*
nesting.
Post by Douglas A. Gwyn
Note
also that macro replacement within each argument to a
macro is necessarily performed as if 6.10.3 is invoked
as a subroutine.
Not 6.10.3 as a whole (which makes no sense). Rather, only the parts dealing
with the macro replacement process. It is as if it was a separate file (i.e.
stream of preprocessing tokens), but possibly with some contexts already
established.
Post by Douglas A. Gwyn
Post by Paul Mensonides
6.10.3/9 - "...that causes each subsequent instance of the macro name to be
***replaced by the replacement list of preprocessing tokens that constitute the
remainder of the directive***."
6.10.3/10 - "...that is ***replaced by the replacement list*** in the
definition..."
Yes, that explains the eventual purpose of the "body"
of a macro *definition*. The actual replacement
process does not begin at that point, but upon
subsequent recognition of a defined identifier during
the top-level pass across the input to translation
phase 4, and also upon recognition of defined
identifiers during collection of macro arguments, and
also during recognition of defined identifiers during
the rescan phase of *each* macro expansion process,
except when the identifier pp token has had blue paint
applied.
No, those two locations are the only locations that describe the semantics of
replacing an instance of a macro invocation. It does not say that it does that
*anywhere* else. I agree that this is not a good way of describing the process,
but nevertheless, that is the way it is.
Post by Douglas A. Gwyn
Post by Paul Mensonides
A macro invocation is replaced by the preprocessing tokens of the replacement
list, *not* by the rescanned (i.e. macro-expanded) preprocessing tokens of the
replacement list. Conceptually, this step happens even before argument
substitution.
No, not at all. The complete replacement can involve
much more than what was contained explicitly within
the body of the definition. That is why we spell out
the process in detail.
"Completely replaced" is a term used only when referring to argument
subprocessing, not a single macro expansion. The standard uses "completely
replaced" referring to all the preprocessing tokens that make up an argument.
It does not say that all of the macro invocations within that sequence of
preprocessing tokens are completely replaced--only that the entire sequence is
completely replaced. (It does say that all the macro invocations are
"expanded", however.)

The replacement itself is specified in only two places (those that I quoted).
Those are not forward references to macro expansion, they define the initial
step of macro expansion as scanning proceeds.
Post by Douglas A. Gwyn
Post by Paul Mensonides
... There is no procedural nesting, there is only
physical/geographical nesting designed to prevent macro expansion from looping
forever.
Actually, "looping forever" is a logical/procedural
notion, not a geographic one.
No, recursing forever is a procedural notion. Looping forever is an iterative
notion. And, once again, "logical" is not a synonym of "procedural" even when
a computer science context is applied.
Post by Douglas A. Gwyn
Post by Paul Mensonides
Post by Douglas A. Gwyn
... Further, the specification makes it very clear
that blue paint is permanent, so there is no way for the
newly created "B" to ever trigger macro replacement.
Absolutely, blue paint is permanent--
But since the inner "B" was not *geographically* nested,
by your interpretation you'd have to say that it wasn't
painted blue.
The second (i.e. "inner") B invocation (at point #2) is completely nested within
the A-disabling context. It therefore expands within that context which
subsequently causes the A identifier preprocessing token to be permanently
painted.
Post by Douglas A. Gwyn
I still think this whole matter boils down to whether
"nested" means procedurally or geographically.
I think that it boils down to whether partially nested is considered nested in
this context. Either view amounts to the same thing though.
Post by Douglas A. Gwyn
I know
that we recently discussed this in a WG14 meeting, but
I seem to recall that it was during an informal session
and that hope was expressed that nobody would file a DR
asking for a clarification. If as you have said there
is a "ton" of existing software that *relies* on that
detail, then it might even be unwise to ask for an
official ruling, which is likely to be as I have
explained, forcing implementations that have done it
wrong to make a choice between conformance or making
existing bogus code continue to do what the programmer
expected.
What you say is "wrong" is the original intent as specified by the (now
infamous) "intent algorithm". I realize that you probably don't think that C++
should matter to C in this area, but C++ already has a DR for this exact thing
with a probable resolution that it should adhere to the original intent as
defined by the algorithm in this area though it needs to be agreed on by both
committees. Some examples of the existing code to which I refer are C++
libraries in Boost whose use is widespread, the Boost preprocessor library
(which is both a C and C++ library) whose use outside of Boost exists in many
places (I even heard that someone was porting Loki to C with it.), Andrei
Alexandrescu's and John Torjo's SMART_ASSERT article/implementation (available
at CUJ), and Chaos (which is a significantly more advanced version of Boost
preprocessor that supports C99 constructs fully (such as variadic macros and
placemarkers)). The moral of the story is that the preprocessor is used for
code generation in many places, and its use for that purpose is growing. An
interpretation other than the original intent regarding partial invocation
causes a horribly exponential increase in the number of macros required to
implement a general purpose component in such a framework. Such a specification
would be pointless, and likely cause a divergence between the C and C++
preprocessors which would be unfortunate--especially considering that the C++
EWG has actively agreed that compatibility with the C preprocessor wherever
possible should happen.
Post by Douglas A. Gwyn
(More likely, another option would be added
to the compiler, and mentioned in the conformance
section of the documentation.)
It already is in Annex J of C99.
Post by Douglas A. Gwyn
It would be good to advise programmers not to rely on
your model (the geographic nesting one) nor on my model
(the procedural nesting one) since implementations may
differ on this score.
And according to the (non-normative) note in Annex J, the partial invocation
case can go either way. However, the behavior that I support is by far the most
prevalent in existing preprocessors and matches the result of the original
intent and does not defy the underlying point of painting in the first place.

Regards,
Paul Mensonides
