Discussion:
double cpp expansion?
s***@amu.edu.pl
2004-02-27 15:31:55 UTC
#define A(x) expandA()+B(x)
#define B expandB+A
B(Barg)
A(Aarg)

my preprocessors give:

expandB+ expandA()+ expandB+A ( Barg )
expandA()+ expandB+A ( Aarg )

In the first case the `B' macro is expanded twice, unlike
the second case, where each macro is expanded once.

With slightly simplified definitions:
#define A() B()
#define B A
my solution is:

B ( ) //replace "B"

A ( ) //initial rescan leaves "A" as is
|B|
// "(" found after "A" - replace "A" in
// context of "B" - nested expansion

B ( ) // start nested rescan
| A |
| B |

B'( ) // rescan finds B in "B" context - paints token,
| A | // nothing else changes
| B |

B() // result

Could you please check my expansion and tell me
what error I am making, if any? Thank you.
--
S.Tobias
Douglas A. Gwyn
2004-02-27 21:28:16 UTC
I'm not sure what bug your preprocessor has. During the
expansion of B in B(Barg) there is an expansion of A(Barg)
that should not further recurse on B since B is still
"being replaced" (most readily implemented as a flag in
the symbol table). The nested B resulting from expanding
A(Barg) should be flagged ("painted blue", another flag in
the symbol table) and never replaced. The result of
expanding B(Barg) should thus be "expB+expA()+B(Barg)".
If you have the source code for the preprocessor you could
try "instrumenting" it (adding carefully placed printouts)
to watch how it is making its decisions while expanding
your test code.
s***@amu.edu.pl
2004-02-28 01:05:46 UTC
Post by Douglas A. Gwyn
the symbol table) and never replaced. The result of
expanding B(Barg) should thus be "expB+expA()+B(Barg)".
Thank you a lot! You have reassured me.

Compilers/preprocessors I have tested were:
latest gcc, latest como, msvc7.1, Digital unix cc (probably old)
and latest mcpp - all of them expand to expB+expA()+expB+A(Barg)
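
One way to repeat this observation on another compiler, without reading raw preprocessor output, is to stringize the expansion and inspect it at run time. This is only a sketch: STR, XSTR, strip_spaces, expansion_of_A and expansion_of_B are hypothetical helper names that do not come from the thread, and it assumes that expansion inside a macro argument follows the same rules as expansion at file level.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define A(x) expandA()+B(x)
#define B expandB+A

/* Two-level stringizing: XSTR macro-expands its argument, STR quotes it. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Copy src to dst without spaces; spacing in stringized text may differ
   between preprocessors, so it is removed before comparing. */
void strip_spaces(const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    assert(cap > 0);
    for (; *src != '\0' && n + 1 < cap; src++)
        if (*src != ' ')
            dst[n++] = *src;
    dst[n] = '\0';
}

/* The two test cases from the original post, as string literals. */
const char *expansion_of_B(void) { return XSTR(B(Barg)); }
const char *expansion_of_A(void) { return XSTR(A(Aarg)); }
```

After strip_spaces, expansion_of_A() should read expandA()+expandB+A(Aarg) under either reading discussed in this thread. For expansion_of_B(), the preprocessors listed above would give expandB+expandA()+expandB+A(Barg), while the other reading would give expandB+expandA()+B(Barg).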

Did the standard change, is the wording ambiguous, or have no (few?) implementations
done it the right way? Is there at least one exemplary implementation?
Post by Douglas A. Gwyn
If you have the source code for the preprocessor you could
try "instrumenting" it (adding carefully placed printouts)
I've looked into gcc's cpp code, sounds like a little too much
for me just now. But mcpp (announced these days) has an interesting
feature #pragma __debug_cpp __expand.

I think I'll start sending bug reports.
Thanks again!
--
S.Tobias
Paul Mensonides
2004-02-28 03:21:20 UTC
Post by Douglas A. Gwyn
I'm not sure what bug your preprocessor has. During the
expansion of B in B(Barg) there is an expansion of A(Barg)
that should not further recurse on B since B is still
"being replaced" (most readily implemented as a flag in
the symbol table).
This is incorrect. When a macro invocation spans the end of a replacement list
it is not necessarily considered nested. The traditional approach is to
consider it *not* nested, and a ton of existing code requires that behavior.
For this reason and the wording of the standard, virtually every major
preprocessor implementation has this behavior--despite the non-normative note in
the appendix that says it is unspecified whether it is nested or not.

The viewpoint that macro expansion forms an invocation hierarchy is faulty which
leads to interpretations like this one. Macro expansion is conceptually
(regardless of a particular implementation strategy) an "in place", iterative
operation. As such, the expansion proceeds as follows (ignoring argument
expansion because there is none involved):

#define A(x) expandA()+B(x)
#define B expandB+A

B(Barg)

  B  (Barg)
| ^ |
|___|
  |
  B invocation

 expandB+A (Barg)
|^        |
|_________|
     |
     B context

 expandB+ A (Barg)
|        |^|      |
|        |_|______|
|__________|  |
     |        A invocation (#1)
     B context

 expandB+ expandA()+B(Barg)
         |^                |
         |_________________|
                  |
                  A context

 expandB+ expandA()+  B  (Barg)
         |          | ^ |      |
         |          |___|      |
         |            |        |
         |            B invocation (#2)
         |_____________________|
                    |
                    A context

 expandB+ expandA()+ expandB+A (Barg)
         |          |^        |      |
         |          |_________|      |
         |               |           |
         |               B context   |
         |___________________________|
                       |
                       A context

 expandB+ expandA()+ expandB+A' (Barg)
         |          |        ^ |      | (#3)
         |          |__________|      |
         |               |            |
         |               B context    |
         |____________________________|
                       |
                       A context

Note in particular the points #1, #2, and #3. The invocation of A at point #1
is not considered nested within the disabling context established by the
replacement list of B. At point #2, B's invocation *is* nested within the
context established by the replacement list of A. Hence, when the A
preprocessing token is found at point #3, the A-disabling context is still
active, and the token is painted (i.e. the apostrophe). The second example
proceeds as follows:

// #define A(x) expandA()+B(x)
// #define B expandB+A

A(Aarg)

 A(Aarg)
|^      |
|_______|
    |
    A expansion

 expandA()+B(Aarg)
|^                |
|_________________|
         |
         A context

 expandA()+  B  (Aarg)
|          | ^ |      |
|          |___|      |
|            |        |
|            B invocation
|_____________________|
           |
           A context

 expandA()+ expandB+A (Aarg)
|          |^        |      |
|          |_________|      |
|               |           |
|               B context   |
|___________________________|
              |
              A context

 expandA()+ expandB+A' (Aarg)
|          |        ^ |      |
|          |__________|      |
|               |            |
|               B context    |
|____________________________|
              |
              A context

Regards,
Paul Mensonides
Douglas A. Gwyn
2004-02-28 07:25:02 UTC
Post by Paul Mensonides
This is incorrect. When a macro invocation spans the end of a replacement list
it is not necessarily considered nested.
The wording in the C standard makes it pretty clear that
the inner replacement occurs *during* the process of the
original replacement (thus setting the condition for blue
paint), and that it is only whether a macro *name* is
seen within a replacement buffer that determines the
onset of a nested replacement (which can involve argument
tokens beyond the span of the higher-level macro-plus-
arguments).
Post by Paul Mensonides
a ton of existing code requires that behavior.
Really? I find that surprising.
Post by Paul Mensonides
The viewpoint that macro expansion forms an invocation hierarchy is faulty which
leads to interpretations like this one.
The specification is explicitly recursive.
Post by Paul Mensonides
 expandB+ A (Barg)
|        |^|      |
|        |_|______|
|__________|  |
     |        A invocation (#1)
     B context
 expandB+ expandA()+B(Barg)
         |^                |
         |_________________|
                  |
                  A context
No, you dropped a context there. The original expansion
of B has not yet concluded, because part of the process
requires expansion of a nested macro A, which has not
quite concluded...
Post by Paul Mensonides
 expandB+ expandA()+  B  (Barg)
         |          | ^ |      |
         |          |___|      |
         |            |        |
         |            B invocation (#2)
         |_____________________|
                    |
                    A context
...until the A-replacement (sub)buffer is examined for
further nested macros to replace, *during* which
examination an occurrence of B (or of A) shall not be
replaced. The specification practically tracks this
very example step by step, so it is easy to apply.

Note that the process described in 6.10.3.4 is one
*component* of the process 6.10.3, not something that
happens after 6.10.3 is complete. In fact the reference
to nested macros encountering the name being replaced
makes sense only with that understanding. I think what
may be throwing you is the use of the term "nested",
which you might think means geographically nested instead
of procedurally nested. Unfortunately the term is used
colloquially with no further elucidation; however, there
is no clue that it should be thought of as geographic
nesting, and in its immediate context the only already
defined nesting process it could be referring to is the
logical recursive expansion.

The only intentional ambiguity in this area is the one
referred to in the response to DR #017 (Question 9),
which refers to the situation only after *all* expansion
is complete, and a different situation than in this
example. Further, the specification makes it very clear
that blue paint is permanent, so there is no way for the
newly created "B" to ever trigger macro replacement.
Paul Mensonides
2004-02-28 09:05:36 UTC
Post by Douglas A. Gwyn
Post by Paul Mensonides
This is incorrect. When a macro invocation spans the end of a replacement list
it is not necessarily considered nested.
The wording in the C standard makes it pretty clear that
the inner replacement occurs *during* the process of the
original replacement (thus setting the condition for blue
paint),
The wording of the C standard makes it clear that the rest of the preprocessing
tokens in the file are *not* included. "If the name of the macro being replaced
is found during this scan of the replacement list (not including the rest of the
file's preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced it is not replaced."
Post by Douglas A. Gwyn
Post by Paul Mensonides
a ton of existing code requires that behavior.
Really? I find that surprising.
Really. The majority of it is in C++, but it exists in C as well.
Post by Douglas A. Gwyn
Post by Paul Mensonides
The viewpoint that macro expansion forms an invocation hierarchy is faulty which
leads to interpretations like this one.
The specification is explicitly recursive.
No, it is explicitly non-recursive (see below).
Post by Douglas A. Gwyn
Post by Paul Mensonides
 expandB+ A (Barg)
|        |^|      |
|        |_|______|
|__________|  |
     |        A invocation (#1)
     B context
 expandB+ expandA()+B(Barg)
         |^                |
         |_________________|
                  |
                  A context
No, you dropped a context there. The original expansion
of B has not yet concluded, because part of the process
requires expansion of a nested macro A, which has not
quite concluded...
No, I didn't. The context no longer exists because the invocation is not nested
(see below). You are blatantly ignoring what the standard dictates because of a
typical view that macro expansion is like a procedural hierarchy. Yes, scanning
is not finished with the preprocessing tokens of B's replacement list, but is
finished with the *replacement* of the invocation of B. At this point, nothing
exists except contexts that produced the preprocessing tokens. Specifically,
there is no context that implicitly exists because of logical flow implied by
functional hierarchy. In the conceptual model defined by the standard, there is
no "token buffer". There is simply a stream of preprocessing tokens that makes
up the entire source file. Some of those preprocessing tokens get replaced by
replacement lists (followed by argument substitution and operator evaluation)
and then scanning resumes at the beginning of the replacement list with some
conceptual delimiter at the end of the tokens that made up the replacement list
that marks the point where the macro name becomes available again (i.e. will no
longer get painted). To illustrate...

#define A() B()
#define B() C()
#define C() 1

tok tok A() tok tok

...does not get processed as:

tok tok A() tok tok
|
B()
|
C()
|
1

Rather, it gets processed as:

tok tok A() tok tok
        ^
tok tok B() tok tok
       |^  |
       |___|
         |
         A context

tok tok C() tok tok
      ||^  ||
      ||___||
      |  |  |
      |  B context
      |_____|
         |
         A context

tok tok 1 tok tok
    ||| ^ |||
    |||___|||
    ||  |  ||
    ||  C context
    ||_____||
    |   |   |
    |   B context
    |_______|
        |
        A context

tok tok 1 tok tok
          ^

I'll try to be more clear. The preprocessing tokens before and (more
importantly) after any given invocation are always right there with the
invocation. They don't have to be "gotten" from some outer token stream when
needed--they are _already_ there in one steady stream of preprocessing tokens
that makes up all the preceding and trailing tokens to the end of the file.
(The only exception to this is argument expansion which occurs as an independent
stream of preprocessing tokens that acts exactly the same as the scheme above
except that the stream that makes up the argument's preprocessing tokens acts as
if it was the entire file.) In the above example, the relevant stream of tokens
is "tok tok A() tok tok". The A() invocation is replaced by A's replacement
list, yielding "tok tok B() tok tok" and so on. This is quite different than
the call-and-return model that constructs a functional hierarchy. In that
model, the invocation of A would get expanded all the way down to 1 which would
then recursively unroll, ultimately replacing the invocation tokens "A()" with
"1". That is not what the standard specifies. It specifies replacement first,
then rescanning, not rescanning followed by replacement. To put it in more core
language terms, it is as if every invocation is physically inlined into the stream
of preprocessing tokens where the invocation existed (which is either the
top-level stream or an argument being processed as a separate stream).
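
The A/B/C chain above can be checked directly in C. The sketch below also adds one case that is not in the post: a hypothetical object-like macro PICK whose replacement is the name of a function-like macro NEXT, so the invocation of NEXT takes its argument list from the tokens that follow PICK in the stream. The function names chain_value, pick_value and check_stream are illustrative assumptions.

```c
#include <assert.h>

#define A() B()
#define B() C()
#define C() 1

/* Hypothetical extra pair: PICK is replaced by NEXT, and the rescan then
   finds "(21)" immediately after it in the token stream. */
#define PICK NEXT
#define NEXT(x) ((x) * 2)

/* A() -> B() -> C() -> 1, each step replaced in place and then rescanned. */
int chain_value(void)
{
    return A();
}

/* PICK(21) -> NEXT(21) -> ((21) * 2) */
int pick_value(void)
{
    return PICK(21);
}

/* Illustrative self-check; call it from main() if desired. */
void check_stream(void)
{
    assert(chain_value() == 1);
    assert(pick_value() == 42);
}
```

Neither case involves a macro name reappearing during its own replacement, so both readings argued in this thread agree on the result.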
Post by Douglas A. Gwyn
Post by Paul Mensonides
 expandB+ expandA()+  B  (Barg)
         |          | ^ |      |
         |          |___|      |
         |            |        |
         |            B invocation (#2)
         |_____________________|
                    |
                    A context
...until the A-replacement (sub)buffer is examined for
further nested macros to replace, *during* which
examination an occurrence of B (or of A) shall not be
replaced. The specification practically tracks this
very example step by step, so it is easy to apply.
No it doesn't. The invocation of B at this point is nested within the
rescanning of A's replacement list, but not B's.
Post by Douglas A. Gwyn
Note that the process described in 6.10.3.4 is one
*component* of the process 6.10.3, not something that
happens after 6.10.3 is complete.
Actually, it is the resumption of the iterative process:

(re)scanning -> replacement -> substitution -> #/## ->
^                                                    |
|____________________________________________________|
Post by Douglas A. Gwyn
In fact the reference
to nested macros encountering the name being replaced
makes sense only with that understanding. I think what
may be throwing you is the use of the term "nested",
which you might think means geographically nested instead
of procedurally nested.
This is the exact model defined by the standard: invocation -> replacement ->
argument substitution -> token-pasting/stringizing -> rescanning. This is a
critical point: replacement happens in a distinct step prior to rescanning
(emphasis mine):

6.10.3/9 - "...that causes each subsequent instance of the macro name to be
***replaced by the replacement list of preprocessing tokens that constitute the
remainder of the directive***."

6.10.3/10 - "...that is ***replaced by the replacement list*** in the
definition..."

A macro invocation is replaced by the preprocessing tokens of the replacement
list, *not* by the rescanned (i.e. macro-expanded) preprocessing tokens of the
replacement list. Conceptually, this step happens even before argument
substitution. It is in-place iteration over the sequence of preprocessing
tokens that makes up a file. There is no procedural nesting, there is only
physical/geographical nesting designed to prevent macro expansion from looping
forever. The only occurrence of recursion in this model is that arguments to
macros are completely macro replaced separately. I.e. this can be considered
recursively processed:

#define ID(x) x

ID(ID(ID(1)))

but this is iteratively processed:

#define A(x) B(x)
#define B(x) C(x)
#define C(x) x

A(1)
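
Both forms can be compiled side by side as a quick sanity check. This is a minimal sketch; the function names are illustrative and not from the thread. It confirms only that the two forms produce the same value, not how a given preprocessor arrives at it internally.

```c
#include <assert.h>

/* Arguments are fully macro-replaced on their own before substitution,
   so the innermost ID(1) is handled first. */
#define ID(x) x

/* Each replacement list is rescanned where it lands in the stream. */
#define A(x) B(x)
#define B(x) C(x)
#define C(x) x

int nested_argument_value(void)
{
    return ID(ID(ID(1)));
}

int chained_value(void)
{
    return A(1);
}

/* Illustrative self-check; call it from main() if desired. */
void check_both(void)
{
    assert(nested_argument_value() == 1);
    assert(chained_value() == 1);
}
```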
Post by Douglas A. Gwyn
Unfortunately the term is used
colloquially with no further elucidation; however, there
is no clue that it should be thought of as geographic
nesting, and in its immediate context the only already
defined nesting process it could be referring to is the
logical recursive expansion.
The immediately preceding sentence says "not including the rest of the source
file's preprocessing tokens". That gives plenty of immediate context. "Logical
recursive expansion" is an assumption based on typical programming models, not
on the definition of macro expansion in the standard. That assumption
introduces a different conceptual model that happens to produce the same results
in all cases *except* when the invocation of a function-like macro spans the end
of a replacement list. At that point, the faulty, but familiar, conceptual
model derived from the assumption breaks, and you have to look at the actual
conceptual model defined by the standard--which is the antithesis of recursive
nesting.
Post by Douglas A. Gwyn
The only intentional ambiguity in this area is the one
referred to in the response to DR #017 (Question 9),
which refers to the situation only after *all* expansion
is complete, and a different situation than in this
example. Further, the specification makes it very clear
that blue paint is permanent, so there is no way for the
newly created "B" to ever trigger macro replacement.
Absolutely, blue paint is permanent--except of course if you create a new
preprocessing token via token-pasting (the paint is not transferred because the
token no longer exists):

#define CAT(a, b) PRIMITIVE_CAT(a, b)
#define PRIMITIVE_CAT(a, b) a ## b

#define A() A
#define AA A

CAT(A(), A()) // A (i.e. not painted)
CAT(A(), A())() // A' (i.e. painted)

;)
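
The effect can be observed by stringizing the results. The sketch below reuses the definitions above and adds hypothetical helpers (STR, XSTR, strip_spaces and the three accessor functions are assumptions, not part of the post). It also adds one extra case, A() (), to show a painted A that is left alone even when parentheses follow it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CAT(a, b) PRIMITIVE_CAT(a, b)
#define PRIMITIVE_CAT(a, b) a ## b

#define A() A
#define AA A

/* Two-level stringizing: XSTR macro-expands its argument, STR quotes it. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Copy src to dst without spaces, so spacing differences do not matter. */
void strip_spaces(const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    assert(cap > 0);
    for (; *src != '\0' && n + 1 < cap; src++)
        if (*src != ' ')
            dst[n++] = *src;
    dst[n] = '\0';
}

/* A() leaves a painted A; the trailing () must not invoke it again. */
const char *painted_kept(void) { return XSTR(A() ()); }

/* Two painted A tokens are pasted into the new token AA, which expands to
   a fresh, unpainted A with nothing after it. */
const char *pasted_plain(void) { return XSTR(CAT(A(), A())); }

/* Same, but the fresh A is followed by (), so A() is invoked once more. */
const char *pasted_called(void) { return XSTR(CAT(A(), A())()); }
```

With spaces stripped, painted_kept() should read A() because the parentheses survive, while pasted_plain() and pasted_called() should both read A.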

Regards,
Paul Mensonides
Paul Mensonides
2004-02-28 09:28:00 UTC
FYI, here is a link which discusses the original algorithm and intent from which
the standard text was derived:
http://gcc.gnu.org/ml/gcc-prs/2001-q1/msg00495.html

Note that the algorithm is not identical to the text of the standard. Rather
the text of the standard is simplified by breaking the procedural hierarchy mold
by removing the recursive model. Nevertheless, it is illustrative of the
original intent of the text of the standard.
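
As a rough illustration of the hide-set idea, here is a toy model written for this thread's two macros only. It is a sketch under stated assumptions, not the algorithm from the link: the macro table is hard-coded, argument pre-expansion is skipped because the argument is a single non-macro token, and the `inherit` flag is a hypothetical switch that approximates the "nested" reading by keeping the macro name's hide set instead of intersecting it with that of the closing parenthesis.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { HIDE_A = 1, HIDE_B = 2, MAX_TOKS = 32 };

typedef struct {
    const char *text;  /* spelling of the token */
    unsigned hide;     /* bit set of macro names disabled for this token */
} Tok;

static int tok_is(const Tok *t, const char *s)
{
    return strcmp(t->text, s) == 0;
}

/* Replace `drop` tokens starting at ts[pos] with the n tokens in repl. */
static void splice(Tok *ts, int *len, int pos, int drop, const Tok *repl, int n)
{
    assert(*len - drop + n <= MAX_TOKS);
    memmove(ts + pos + n, ts + pos + drop,
            (size_t)(*len - pos - drop) * sizeof *ts);
    memcpy(ts + pos, repl, (size_t)n * sizeof *ts);
    *len += n - drop;
}

/* Models:  #define A(x) expandA()+B(x)   and   #define B expandB+A  */
static void expand(Tok *ts, int *len, int inherit)
{
    int i = 0;
    while (i < *len) {
        Tok t = ts[i];
        if (tok_is(&t, "B") && !(t.hide & HIDE_B)) {
            unsigned hs = t.hide | HIDE_B;
            Tok repl[] = { {"expandB", hs}, {"+", hs}, {"A", hs} };
            splice(ts, len, i, 1, repl, 3);  /* rescan resumes at ts[i] */
        } else if (tok_is(&t, "A") && !(t.hide & HIDE_A) && i + 3 < *len
                   && tok_is(&ts[i + 1], "(") && tok_is(&ts[i + 3], ")")) {
            /* Hide set of the result: intersection of the name's and the
               closing parenthesis's sets, or just the name's if inherit. */
            unsigned base = inherit ? t.hide : (t.hide & ts[i + 3].hide);
            unsigned hs = base | HIDE_A;
            Tok repl[] = { {"expandA", hs}, {"(", hs}, {")", hs}, {"+", hs},
                           {"B", hs}, {"(", hs},
                           {ts[i + 2].text, ts[i + 2].hide | hs}, {")", hs} };
            splice(ts, len, i, 4, repl, 8);
        } else {
            i++;  /* ordinary or disabled token: leave it and move on */
        }
    }
}

/* Expand "name ( arg )" and write the tokens, unspaced, into out. */
const char *expand_call(const char *name, const char *arg, int inherit,
                        char *out, size_t cap)
{
    Tok ts[MAX_TOKS] = { {name, 0}, {"(", 0}, {arg, 0}, {")", 0} };
    int len = 4;
    assert(cap > 0);
    expand(ts, &len, inherit);
    out[0] = '\0';
    for (int i = 0; i < len; i++)
        strncat(out, ts[i].text, cap - strlen(out) - 1);
    return out;
}
```

With inherit set to 0, expand_call("B", "Barg", ...) yields expandB+expandA()+expandB+A(Barg), the result reported for the tested preprocessors. With inherit set to 1 it yields expandB+expandA()+B(Barg), the result expected under the nested reading. A(Aarg) comes out the same either way.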

Regards,
Paul Mensonides
Douglas A. Gwyn
2004-02-28 17:21:55 UTC
Post by Paul Mensonides
FYI, here is a link which discusses the original algorithm and intent from which
http://gcc.gnu.org/ml/gcc-prs/2001-q1/msg00495.html
Note that the algorithm is not identical to the text of the standard. Rather
the text of the standard is simplified by breaking the procedural hierarchy mold
by removing the recursive model. Nevertheless, it is illustrative of the
original intent of the text of the standard.
Actually I was one of the people who worked on the wording
for the standard, and I remember Dave's proposed model.
What I don't remember is the committee agreeing to the
intersection-of-hidesets approach. (Note that despite
what you said in your other note, it is explicitly a
rescursive model.) What we agreed to is the text that
you see printed in the standard. There is a tiny amount
of unspecified behavior (referred to in the previously
sited DR), but that doesn't apply to the current example,
and even less so to the one in the link you provided.
Douglas A. Gwyn
2004-02-28 17:34:34 UTC
Apologies for the typos,
which I don't think obscured my point(s).
Paul Mensonides
2004-03-01 05:13:19 UTC
Post by Douglas A. Gwyn
Post by Paul Mensonides
FYI, here is a link which discusses the original algorithm and intent from which
http://gcc.gnu.org/ml/gcc-prs/2001-q1/msg00495.html
Note that the algorithm is not identical to the text of the standard. Rather
the text of the standard is simplified by breaking the procedural hierarchy mold
by removing the recursive model. Nevertheless, it is illustrative of the
original intent of the text of the standard.
Actually I was one of the people who worked on the wording
for the standard, and I remember Dave's proposed model.
What I don't remember is the committee agreeing to the
intersection-of-hidesets approach. (Note that despite
what you said in your other note, it is explicitly a
rescursive model.)
Maybe you should read what I wrote when I included the link. The standard does
not follow the pseudo-algorithm directly, but it closely replicates the results.
The standard text removes the recursive model in favor of a simpler iterative model.

Regards,
Paul Mensonides
s***@amu.edu.pl
2004-02-28 13:13:10 UTC
Let's first make the meaning of certain words clear, to make sure
we understand each other, for the Standard uses them without
formal definition.
"macro name" = identifier after "#define" directive (6.10.3#9 and #10)
"macro invocation" = 6.10.3#10 uses it twice; the first use is ambiguous, the second
use strictly determines that this is a *noun* meaning "sequence of preprocessing
tokens" (not a verb meaning "process of collecting tokens"). I'll
extend its use to object-like macros (the sequence consists of one
token, i.e. the macro name).
"macro" = I can't find any definition for the _word_; I assume this is
an alias of "macro invocation" ("macro-name" for obj-like macros; and
"macro-name ( ... )" for fn-like macros, otherwise "macro-name othertoken"
is not a "macro", and is not considered for expansion)
"macro replacement" = verb, denoting the process of replacing one sequence
of tokens by another
Post by Paul Mensonides
Post by Douglas A. Gwyn
Post by Paul Mensonides
This is incorrect. When a macro invocation spans the end of a replacement list
it is not necessarily considered nested.
The wording in the C standard makes it pretty clear that
the inner replacement occurs *during* the process of the
original replacement (thus setting the condition for blue
paint),
The wording of the C standard makes it clear that the rest of the preprocessing
tokens in the file are *not* included. "If the name of the macro being replaced
is found during this scan of the replacement list (not including the rest of the
file's preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced it is not replaced."
This is a little OT here, because the discussion is about what
"nested" means here. But this excerpt is vital here, so I'll rewrite it
here with my comments:

6.10.3.4
#1 [...]
Then, the resulting preprocessing token sequence is rescanned,
along with all subsequent preprocessing tokens of the source file
//this inclusion of the rest of the file makes sense only
//for fn-like macros, either to determine whether fn-like
//"macro-name" forms a macro (with subsequent lparen), or to complete
//fn-like macro argument list.
, for more macro names to replace.
//actually "macros" are replaced, not "macro names", this is
//probably a human language deficiency, but also maybe direct
//reference to next point, see further
#2
If the name of the macro being replaced
//note: "name of the macro", not "macro" or "macro invocation";
//the "macro" is replaced, not "macro name"
is found during this scan
//ie. "rescan"
of the replacement list
//which resulted in previous replacement before "rescan"
(not including the rest of the source file's preprocessing tokens),
//this is redundant, it only adds emphasis to "the replacement list",
//ie. this and no further
it is not replaced.
Further, if any nested
//"nested" means replacements that occur during this rescan,
//cf. #1
replacements encounter the name of the macro
//again: "name of the macro", not "macro invocation"
being replaced, it
//"it" surely refers to "macro", not "name of the macro"
is not replaced. These nonreplaced [...]

Nowhere in this paragraph "Rescanning..." is the word "invocation"
even mentioned. Rescanning and invocation (verb) are two separate
procedures.
Post by Paul Mensonides
Post by Douglas A. Gwyn
Post by Paul Mensonides
The viewpoint that macro expansion forms an invocation hierarchy is faulty which
leads to interpretations like this one.
The specification is explicitly recursive.
No, it is explicitly non-recursive (see below).
The "macro expansion" algorithm may not necessarily be recursive,
but literally the Standard _is_ recursive: rescanning and parameter
replacements make direct references to expansion (earlier in the
Standard), which possibly contains nested param expansions and is
followed by rescan, which ... etc.
Sometimes macro expansions are independent (parameter) and sometimes
they are performed in context of subsequent tokens (in file, or
nested parameter expansions within parameter expansions).
Post by Paul Mensonides
Post by Douglas A. Gwyn
Post by Paul Mensonides
 expandB+ A (Barg)
|        |^|      |
|        |_|______|
|__________|  |
     |        A invocation (#1)
     B context
 expandB+ expandA()+B(Barg)
         |^                |
         |_________________|
                  |
                  A context
No, you dropped a context there. The original expansion
of B has not yet concluded, because part of the process
requires expansion of a nested macro A, which has not
quite concluded...
No, I didn't. The context no longer exists because the invocation is not nested
This is the crux of the problem: you say "invocation is not nested",
which obviously isn't, but the Standard doesn't ever mention "invocation"
during "rescan", but merely "nested replacement".

Here "macro name" A falls within the replacement procedure of B,
next we determine that A forms an invocation, thus we expand it
(ie. macro, but maybe this is why "macro name" is used in #1 - see above),
this replacement takes place within the context of B expansion,
because token A originated from there.
Post by Paul Mensonides
(see below). You are blatantly ignoring what the standard dictates because of a
typical view that macro expansion is like a procedural hierarchy. Yes, scanning
I guess no one here denies it, see further
Post by Paul Mensonides
tok tok 1 tok tok
    ||| ^ |||
    |||___|||
    ||  |  ||
    ||  C context
    ||_____||
    |   |   |
    |   B context
    |_______|
        |
        A context
tok tok 1 tok tok
          ^
Fully agreed.
Post by Paul Mensonides
Post by Douglas A. Gwyn
Unfortunately the term is used
colloquially with no further elucidation; however, there
is no clue that it should be thought of as geographic
nesting, and in its immediate context the only already
defined nesting process it could be referring to is the
logical recursive expansion.
I think what Douglas meant by "geographic nesting" is "syntactical
nesting" (as in C syntax), which is not taking place in macro expansion;
and by "procedurally nested" (earlier) he means 'replacement procedure'.
I think we all agree here.

Paul:
When I started the editor, my intention was to agree with you, and to back
up your argument I wanted to bring that excerpt where it says "nested
invocation", but I didn't find it. If the Standard said in 6.10.3.4#2
"if any nested replacements encounter the invocation of the macro
being replaced", I would definitely say you were right, because
in the above example "A invocation" is not nested; but it doesn't.
Please, don't use the word "invocation" in context of "rescan",
because the Standard doesn't do so, and that has already confused
me a lot. IMO the use of "invocation" is only valid in terms of
"macro replacement", ie. to determine whether function-like macro-name
(token) forms a "macro" ("macro invocation").

I have taken notice of the link you gave in the followup to your
article (I'll have to read it more closely later). The expansion
of G_0(42) definitely disagrees with my reasoning and with what I think
the Standard says. Okay, maybe the intention was different,
judging by Dave Prosser's explanations; then maybe the Standard has a flaw
there and needs correcting. Only what do we do until then?
--
Stan Tobias
Douglas A. Gwyn
2004-02-28 17:27:15 UTC
Post by s***@amu.edu.pl
Only what do we do until then?
My suggestion is to not use macro definitions
that are meant to snarf additional PP tokens
from what follows *if* a potential recursion
could occur. I suppose the most likely instance
would be using an *object-like* macro to replace
the *name* of a function, except that the name
is really that of a function-like macro, whose
arguments might accidentally contain calls to
the first "function", wherein lies the recursion.
My answer to that is, don't do it. If you want
to use a macro to replace a function, replace
the argument list as well (by using a function-
like macro instead of an object-like macro).
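
A minimal sketch of that advice, using hypothetical names (INC, INC_IMPL and the two functions are illustrative, not from the thread). The wrapper is function-like, so it consumes its own argument list and never has to pick up tokens that follow its replacement list.

```c
#include <assert.h>

/* Hypothetical function-like macro standing in for the real "function". */
#define INC_IMPL(x) ((x) + 1)

/* Advice applied: wrap it with a function-like macro, arguments included. */
#define INC(x) INC_IMPL(x)

/* The form advised against, shown only as a comment:
       #define INC INC_IMPL
   With it, INC(INC(1)) makes the invocation of INC_IMPL span the end of
   INC's replacement list while another INC sits inside the arguments,
   which is the case disputed in this thread. */

/* The inner INC(1) is fully replaced as an argument before substitution. */
int inc_twice(void)
{
    return INC(INC(1));
}

/* Illustrative self-check; call it from main() if desired. */
void check_inc(void)
{
    assert(inc_twice() == 3);
}
```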
Paul Mensonides
2004-03-01 05:13:20 UTC
Post by s***@amu.edu.pl
Post by Paul Mensonides
The wording of the C standard makes it clear that the rest of the preprocessing
tokens in the file are *not* included. "If the name of the macro being replaced
is found during this scan of the replacement list (not including the rest of the
file's preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced it is not replaced."
This is a little OT here, because the discussion is about what
"nested" means here. But this excerpt is vital here, so I'll rewrite it
6.10.3.4
#1 [...]
Then, the resulting preprocessing token sequence is rescanned,
along with all subsequent preprocessing tokens of the source file
//this inclusion of the rest of the file makes sense only
//for fn-like macros, either to determine whether fn-like
//"macro-name" forms a macro (with subsequent lparen), or to complete
//fn-like macro argument list.
More specifically, the inclusion of the rest of the file makes sense only for
fn-like macro invocations that span the end of the replacement list of some
other macro (function-like or object-like).

Note also that this text is the literal definition of non-recursion. It is
resumption of scanning, not a recursive scan that continues until the end of the
file.
Post by s***@amu.edu.pl
, for more macro names to replace.
//actually "macros" are replaced, not "macro names", this is
//probably a human language deficiency, but also maybe direct
//reference to next point, see further
#2
If the name of the macro being replaced
//note: "name of the macro", not "macro" or "macro invocation";
//the "macro" is replaced, not "macro name"
Actually, "macro invocations" are replaced. A "macro" is a reference to the
entity in a symbol table that is produced by a macro definition, e.g.

#define MACRO() // ...

A "macro name" is just the name of the macro, e.g. MACRO here. The standard
does not use these terms consistently, but it is nevertheless clear where it
counts.
Post by s***@amu.edu.pl
is found during this scan
//ie. "rescan"
Yes.
Post by s***@amu.edu.pl
of the replacement list
//which resulted in previous replacement before "rescan"
(not including the rest of the source file's preprocessing tokens),
//this is redundant, it only adds emphasis to "the replacement list",
//ie. this and no further
It is explanatory. At this point, the preprocessing tokens that made up the
macro invocation have already been replaced by the replacement list of the
macro. Scanning (a.k.a. rescanning) resumes at the beginning of the
preprocessing tokens that made up the replacement list from which they came.
The above text is a clarification that the macro name (if found) will no longer
be disabled (a.k.a. painted blue) if found after that point.
Post by s***@amu.edu.pl
it is not replaced.
Further, if any nested
//"nested" means replacements that occur during this rescan,
//cf. #1
replacements encounter the name of the macro
//again: "name of the macro", not "macro invocation"
being replaced, it
//"it" surely refers to "macro", not "name of the macro"
is not replaced. These nonreplaced [...]
Nowhere in this paragraph "Rescanning..." is the word "invocation"
even mentioned. Rescanning and invocation (verb) are two separate
procedures.
Yes, but scanning causes replacement (if macro invocations are found) which in
turn causes (re)scanning which causes more replacements and so on for the rest
of the file. What this means is that rescanning is just resumption of scanning.
The only difference is that another disabling context exists that stretches from
the point where scanning resumes until the last preprocessing token that came
from the replacement list. This is iterative, not recursive.
Post by s***@amu.edu.pl
Post by Paul Mensonides
Post by Douglas A. Gwyn
Post by Paul Mensonides
The viewpoint that macro expansion forms an invocation hierarchy is faulty which
leads to interpretations like this one.
The specification is explicitly recursive.
No, it is explicitly non-recursive (see below).
The "macro expansion" algorithm may not necessarily be recursive,
but literally the Standard _is_ recursive: rescanning and parameter
replacements make direct references to expansion (earlier in the
Standard), which possibly contains nested param expansions and is
followed by rescan, which ... etc.
No, a macro expansion algorithm may be recursive, but the standard is literally
iterative. It would be recursive (i.e. form a functional hierarchy) if and only
if the replacement list of a macro was fully macro-replaced before it replaced
the invocation of the macro--but that is not what happens. The standard
*literally* says that replacement happens *first*, followed by argument
substitution, followed by stringizing and token-pasting, and ultimately followed
by the resumption of scanning.
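One part of this ordering can be observed directly: the # operator works on the argument's original spelling, while an ordinary parameter receives the fully macro-replaced argument before the result is rescanned. The sketch below is only an illustration. The names ONE, RAW, COOKED and ordering_holds are invented for it and do not come from the thread.

```c
#include <assert.h>
#include <string.h>

#define ONE 1
#define RAW(x) #x          /* operand of #: the argument is NOT macro-replaced */
#define COOKED(x) RAW(x)   /* plain parameter: the argument is macro-replaced first */

/* Returns 1 when both observations hold, 0 otherwise. */
int ordering_holds(void)
{
    /* RAW(ONE)    -> "ONE"  (stringizing happens before any rescan)       */
    /* COOKED(ONE) -> RAW(1) -> "1"  (substitution, then rescan finds RAW) */
    return strcmp(RAW(ONE), "ONE") == 0 && strcmp(COOKED(ONE), "1") == 0;
}

/* Aborts if the preprocessor orders the steps differently. */
void check_ordering(void)
{
    assert(ordering_holds());
}
```

Calling check_ordering() from a small main is enough to try it with any compiler.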
Post by s***@amu.edu.pl
Sometimes macro expansions are independent (parameter) and sometimes
they are performed in context of subsequent tokens (in file, or
nested parameter expansions within parameter expansions).
They are always performed in the context of subsequent tokens. With arguments,
there just aren't any more after the end of the argument. I.e.

#define A() B
#define B() 1

#define SCAN(x) x

SCAN( A()() ) // 1
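The SCAN example can be turned into a compilable check. The sketch below also adds a second case, SCAN(A())(), which is not in the original post: there the argument ends right after A(), so B has no following tokens inside the argument and is only completed after substitution, when the rescan continues into the rest of the source. The function names are made up for illustration.

```c
#include <assert.h>

#define A() B
#define B() 1
#define SCAN(x) x

/* Both parenthesis pairs are inside the argument:
   A() -> B, then B () -> 1 while the argument itself is being scanned. */
int scan_inside_argument(void)
{
    return SCAN( A()() );
}

/* Only A() is inside the argument, so the argument expands to a lone B.
   After substitution, B is rescanned together with the "()" that follows
   the SCAN invocation in the source, giving 1. */
int scan_after_argument(void)
{
    return SCAN(A())();
}

void check_scan(void)
{
    assert(scan_inside_argument() == 1);
    assert(scan_after_argument() == 1);
}
```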
Post by s***@amu.edu.pl
Post by Paul Mensonides
Post by Douglas A. Gwyn
Post by Paul Mensonides
expandB+ A (Barg)
| |^| |
| |_|______|
|__________| |
| A invocation (#1)
B context
expandB+ expandA()+B(Barg)
|^ |
|_________________|
|
A context
No, you dropped a context there. The original expansion
of B has not yet concluded, because part of the process
requires expansion of a nested macro A, which has not
quite concluded...
No, I didn't. The context no longer exists because the invocation is not nested
This is the crux of the problem: you say "invocation is not nested",
which obviously isn't, but the Standard doesn't ever mention "invocation"
during "rescan", but merely "nested replacement".
Yes, but replacement is the result of invocation (and only invocation). The
standard makes this part (at least) clear. As you say, this part is only
relevant to function-like macro invocations, because only they can span the end
of the replacement list. If A in the above was an object-like macro instead of
a function-like macro, the token A by itself would constitute the entire
invocation which would indeed be nested.
Post by s***@amu.edu.pl
Here "macro name" A falls within the replacement procedure of B,
next we determine that A forms an invocation, thus we expand it
(ie. macro, but maybe this is why "macro name" is used in #1 - see above),
this replacement takes place within the context of B expansion,
because token A originated from there.
No. The token A does not constitute a macro invocation. Only A immediately
followed by an ( constitutes the beginning of a macro invocation (as defined
previously). That is the minimum that is required to say that it is an
invocation.
Post by s***@amu.edu.pl
Post by Paul Mensonides
tok tok 1 tok tok
||| ^ |||
|||___|||
|| | ||
|| C context
||_____||
| | |
| B context
|_______|
|
A context
tok tok 1 tok tok
^
Fully agreed.
If so, how can you say it is recursive? That behavior is iterative.
Post by s***@amu.edu.pl
Post by Paul Mensonides
Post by Douglas A. Gwyn
Unfortunately the term is used
colloquially with no further elucidation; however, there
is no clue that it should be thought of as geographic
nesting, and in its immediate context the only already
defined nesting process it could be referring to is the
logical recursive expansion.
I think what Douglas meant by "geographic nesting" is "syntactical
nesting" (as in C syntax), which is not taking place in macro expansion;
and by "procedurally nested" (earlier) he means 'replacement procedure'.
I think we all agree here.
No, we definitely do not. I agree with the terminology, and I know what the
difference is between geographic/syntactic nesting and procedural nesting. The
standard model is *absolutely* not procedural nesting. It is an iterative loop
that begins with the replacement of a macro invocation with the preprocessing
tokens of the macro's replacement list, followed by argument substitution,
followed by stringizing/token-pasting, and finally followed by the resumption of
scanning. In the following example,

#define A() B()
#define B() 1

A()

The invocation of B() does not "return" to A() which in turn returns to the
top-level. That is procedural nesting.

Instead, A() immediately returns the tokens "B()" to the top-level and scanning
resumes, which causes the expansion of B() which in turn immediately returns "1"
to the top-level and scanning resumes (which is a no-op on "1"). That is
iterative, not procedural, and is *by definition of the standard* regardless of
the familiar notion of procedural nesting and how much one wishes for that to be
the case.
Post by s***@amu.edu.pl
When I started the editor, my intention was to agree with you, and to back
up your argument I wanted to bring that excerpt where it says "nested
invocation", but I didn't find it. If the Standard said in 6.10.3.4#2
"if any nested replacements encounter the invocation of the macro
being replaced", I would definitely say you were right, because
in the above example "A invocation" is not nested; but it doesn't.
I never said or implied that at all. All that is required is for a single
identifier preprocessing token that names a macro to exist in a context that
causes painting. If that happens, the single identifier preprocessing token
gets permanently painted. No "attempted" invocation is necessary to cause this
painting. Nevertheless, a disabling context *is* required. Hence, in the
following example:

#define A() A

A()() // A'()

The token A does get painted, even though no nested invocation of A is
attempted. At the same time,

#define A() B
#define B() A

A()() // A

The token A does not get painted, because it is the result of an invocation that
is not nested within the context that disables A.
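The first of these two cases can be checked with a compiler, since nobody in the thread disputes it. The sketch below assumes a helper function that is also named A and is defined before the macro. That function and the value 7 are invented for the illustration. The second case (A through B back to A) is deliberately left out, because it is exactly the behaviour the thread disagrees about and implementations may differ.

```c
#include <assert.h>

/* Defined before the macro exists, so here A is an ordinary identifier. */
int A(void)
{
    return 7;
}

#define A() A

/* A()() : the first A() is replaced by A. That A is found while rescanning
   A's own replacement list, so it is painted and never replaced again.
   What remains is "A ()", a call to the function above.
   If the token were not painted, "A ()" would expand once more to a bare
   "A" and this return statement would not be a valid int expression. */
int painted_call(void)
{
    return A()();
}

void check_painting(void)
{
    assert(painted_call() == 7);
}
```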
Post by s***@amu.edu.pl
Please, don't use the word "invocation" in context of "rescan",
because the Standard doesn't do so, and that has already confused
me a lot.
The act of a macro replacement is the consequence of a macro invocation
(especially with a function-like macro which is the only applicable kind of
macro in the context that we've been discussing). The standard doesn't use the
terminology consistently, but it is still clear.
Post by s***@amu.edu.pl
IMO the use of "invocation" is only valid in terms of
"macro replacement", ie. to determine whether function-like macro-name
(token) forms a "macro" ("macro invocation").
I have taken notice of the link you gave in the followup to your
article (I'll yet have to read it more closely later). The expansion
of G_0(42) definitely disagrees with my reasoning and, what I think,
what the Standard says. Okay, maybe the intention was different,
by Dave Prosser's explanations; then maybe the Standard has a flaw
there and needs correcting. Only what do we do until then?
I personally think that the standard is clear (or, at least, clear enough). It
could certainly be spelled out better. Note that in the Annex J of C99 (which
is non-normative) there is a reference to this issue that says that it is
unspecified. This was a result of an old DR. The resulting text is either a
failure to adhere (or remember) the original intent of the standard text which
was derived from that pseudo-algorithm or a disregard of it for whatever
reasons. There is a similar C++ DR (DR #268) that is the result of the
discussion with Dave Prosser. It has a conclusion that agrees with me.

Regards,
Paul Mensonides
Douglas A. Gwyn
2004-03-01 06:20:37 UTC
Post by Paul Mensonides
No, a macro expansion algorithm may be recursive, but the standard is literally
iterative. It would be recursive (i.e. form a functional hierarchy) if and only
if the replacement list of a macro was fully macro-replaced before it replaced
the invocation of the macro--but that is not what happens. The standard
*literally* says that replacement happens *first*, followed by argument
substitution, followed by stringizing and token-pasting, and ultimately followed
by the resumption of scanning.
Each *argument* is fully macro expanded before substitution,
which is as recursive as anything ever gets. Then during
macro expansion (6.10.3) there is the possibility of
triggering further, nested macro expansions (within 6.10.3.4).
During those expansions, context is inherited and is made use
of, recursively.
Post by Paul Mensonides
standard model is *absolutely* not procedural nesting. It is an iterative loop
that begins with the replacement of a macro invocation with the preprocessing
tokens of the macro's replacement list, followed by argument substitution,
followed by stringizing/token-pasting, and finally followed by the resumption of
scanning.
No. The rescanning is not a resumption of the top-level scan.
If it were, then logically there would be no meaning in 6.10.3.4
paragraph 2. That paragraph makes sense only in the limited
context of the ongoing macro replacement process, not upon a
pop back up to the top-level scan.
Paul Mensonides
2004-03-04 14:30:59 UTC
Post by Douglas A. Gwyn
Post by Paul Mensonides
No, a macro expansion algorithm may be recursive, but the standard is literally
iterative. It would be recursive (i.e. form a functional hierarchy) if and only
if the replacement list of a macro was fully macro-replaced before it replaced
the invocation of the macro--but that is not what happens. The standard
*literally* says that replacement happens *first*, followed by argument
substitution, followed by stringizing and token-pasting, and ultimately followed
by the resumption of scanning.
Each *argument* is fully macro expanded before substitution,
which is as recursive as anything ever gets.
A recursive scan, not a recursive macro replacement. That recursive scan need
not even include macro replacement at all, and it can include many that aren't
nested in any possible way. The recursion is that of the entire scanning
process that replaces macros, not a recursion into a macro expansion. Only the
second forms a procedural hierarchy of macro invocation. The first is merely a
(possible and likely) recursive invocation of the algorithm that scans
preprocessing tokens for macros to replace.
Post by Douglas A. Gwyn
Then during
macro expansion (6.10.3) there is the possibility of
triggering further, nested macro expansions (within 6.10.3.4).
During those expansions, context is inherited and is made use
of, recursively.
In a sense, yes, but nesting of context does not denote recursion any more than
a loop with a changing state does. Context that was created by previous action
does not imply recursion.
Post by Douglas A. Gwyn
Post by Paul Mensonides
standard model is *absolutely* not procedural nesting. It is an iterative loop
that begins with the replacement of a macro invocation with the preprocessing
tokens of the macro's replacement list, followed by argument substitution,
followed by stringizing/token-pasting, and finally followed by the resumption of
scanning.
No. The rescanning is not a resumption of the top-level scan.
The standard explicitly says that the rescan includes all of the rest of the
preprocessing tokens of the file. It does *not* say that they are "available"
if needed to finish some macro invocation. It is definitely resumption, which
is why the second paragraph explicitly has to remove the rest of the source
file's preprocessing tokens in the creation of disabling contexts.
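A small, uncontroversial case shows the rescan reaching into the tokens that follow the replacement list. In the sketch below (ADD, OP and the function name are assumptions made for the example), the replacement list of OP is the single token ADD, and the argument list that completes the ADD invocation comes from the source text after OP.

```c
#include <assert.h>

#define ADD(a, b) ((a) + (b))
#define OP ADD   /* object-like: the replacement list is just the token ADD */

/* OP -> ADD. The rescan continues into the following "(1, 2)", which
   completes a function-like invocation that spans the end of OP's
   replacement list: ADD(1, 2) -> ((1) + (2)). */
int spanning_invocation(void)
{
    return OP(1, 2);
}

void check_spanning(void)
{
    assert(spanning_invocation() == 3);
}
```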
Post by Douglas A. Gwyn
If it were, then logically there would be no meaning in 6.10.3.4
paragraph 2. That paragraph makes sense only in the limited
context of the ongoing macro replacement process, not upon a
pop back up to the top-level scan.
Only if you consider macro invocation to form a procedural hierarchy, which it
does not. The nesting is not procedural, it is contextual, and as such the
paragraph makes perfect sense.

Regards,
Paul Mensonides
s***@amu.edu.pl
2004-03-01 18:27:16 UTC
Post by Paul Mensonides
Post by Douglas A. Gwyn
6.10.3.4
#1 [...]
Then, the resulting preprocessing token sequence is rescanned,
along with all subsequent preprocessing tokens of the source file
//this inclusion of the rest of the file makes sense only
//for fn-like macros, either to determine whether fn-like
//"macro-name" forms a macro (with subsequent lparen), or to complete
//fn-like macro argument list.
More specifically, the inclusion of the rest of the file makes sense only for
fn-like macro invocations that span the end of the replacement list of some
other macro (function-like or object-like).
ACK
Post by Paul Mensonides
Note also that this text is the literal definition of non-recursion. It is
resumption of scanning, not a recursive scan that continues until the end of the
file.
I disagree here, see below.
Post by Paul Mensonides
Post by Douglas A. Gwyn
, for more macro names to replace.
//actually "macros" are replaced, not "macro names", this is
//probably a human language deficiency, but also maybe direct
//reference to next point, see further
#2
If the name of the macro being replaced
//note: "name of the macro", not "macro" or "macro invocation";
//the "macro" is replaced, not "macro name"
Actually, "macro invocations" are replaced. A "macro" is a reference to the
entity in a symbol table that is produced by a macro definition, e.g.
#define MACRO() // ...
A "macro name" is just the name of the macro, e.g. MACRO here. The standard
does not use these terms consistently, but it is nevertheless clear where it
counts.
I do use "macro" in sense of "macro invocation", I have defined that
earlier in my previous post, for the needs of the discussion.
Post by Paul Mensonides
Post by Douglas A. Gwyn
is found during this scan
//ie. "rescan"
Note here: "this scan"; see below.
Post by Paul Mensonides
Yes.
Post by Douglas A. Gwyn
of the replacement list
//which resulted in previous replacement before "rescan"
(not including the rest of the source file's preprocessing tokens),
//this is redundant, it only adds emphasis to "the replacement list",
//ie. this and no further
It is explanatory. At this point, the preprocessing tokens that made up the
macro invocation have already been replaced by the replacement list of the
macro. Scanning (a.k.a. rescanning) resumes at the beginning of the
Disagree to "a.k.a"; see below.
Post by Paul Mensonides
preprocessing tokens that made up the replacement list from which they came.
The above text is a clarification that the macro name (if found) will no longer
be disabled (a.k.a. painted blue) if found after that point.
Post by Douglas A. Gwyn
it is not replaced.
Further, if any nested
//"nested" means replacements that occur during this rescan,
//cf. #1
replacements encounter the name of the macro
//again: "name of the macro", not "macro invocation"
being replaced, it
//"it" surely refers to "macro", not "name of the macro"
is not replaced. These nonreplaced [...]
Nowhere in this paragraph "Rescanning..." is the word "invocation"
even mentioned. Rescanning and invocation (verb) are two separate
procedures.
Yes, but scanning causes replacement (if macro invocations are found) which in
turn causes (re)scanning which causes more replacements and so on for the rest
of the file. What this means is that rescanning is just resumption of scanning.
The only difference is that another disabling context exists that stretches from
the point where scanning resumes until the last preprocessing token that came
from the replacement list. This is iterative, not recursive.
Post by Douglas A. Gwyn
The "macro expansion" algorithm may be not necessarily recursive,
but literally the Standard _is_ recursive: rescanning and parameter
replacements make direct references to expansion (earlier in the
Standard), which possibly contains nested param expansions and is
followed by rescan, which ... etc.
No, a macro expansion algorithm may be recursive, but the standard is literally
iterative. It would be recursive (i.e. form a functional hierarchy) if and only
if the replacement list of a macro was fully macro-replaced before it replaced
the invocation of the macro--but that is not what happens. The standard
ACK. In that sense it is not _fully_ recursive.
Post by Paul Mensonides
*literally* says that replacement happens *first*, followed by argument
substitution, followed by stringizing and token-pasting, and ultimately followed
by the resumption of scanning.
Below I'm going to explain two things: 1. why I think "rescan" is
different than "scan", and 2. that the procedure is (not strictly)
"recursive".

1. The standard doesn't formally introduce the term or any notion of
"rescan", but I have reasons to believe it is different than "scan".
(In context of the Standard I understand this term as "rescan",
not re-"scan").
First, 6.10.3.4#2 uses words "during this scan", which suggests
there is an "other scan" - in context of that chapter it means
the previous scanning phase.
Second, the first mention and description of replacement process
starts in 6.10.3 #9 and #10 (with the macro definitions).
Third, as a sign of good will I might agree, that 6.10.3 is only
an introduction, and the actual algorithm is described by
sections 6.10.3.?, but then the Standard would tell somewhere
before 6.10.3.1 that it starts at 6.10.3.4; we surely don't
start preprocessing a file from argument substitution :) .

2. The "Rescan" chapter never suggests that anything is
"resumed" (or "continued", or anything alike). Since "rescan"
and "scan" are not the same, by saying that "scan resumes",
it would mean to leave the whole replaced list and continue
reading the file, which would make that chapter invalid - utter
nonsense in context of the whole section.
The macro expansion within argument substitution is strictly
recursive (arguments are expanded in isolation) and this
is described as "are completely macro replaced" (6.10.3.1#1);
what reason would I have to believe that "for more macro names
to replace" (6.10.3.4#1) - without any further comments - is
something different?

I think that the described _procedure_ of scan/rescan is recursive
(it need not be implemented as such), at least in sense of
keeping a stack of nested contexts (disabled macros). The
token processing is iterative (I'd say, token stream is a global
entity on which all recursive scan levels work - except args, which
constitute separate token streams).
Post by Paul Mensonides
Post by Douglas A. Gwyn
Post by Paul Mensonides
Post by Douglas A. Gwyn
Post by Paul Mensonides
expandB+ A (Barg)
| |^| |
| |_|______|
|__________| |
| A invocation (#1)
B context
expandB+ expandA()+B(Barg)
|^ |
|_________________|
|
A context
No, you dropped a context there. The original expansion
of B has not yet concluded, because part of the process
requires expansion of a nested macro A, which has not
quite concluded...
No, I didn't. The context no longer exists because the invocation is not
nested
Post by Douglas A. Gwyn
This is the crux of the problem: you say "invocation is not nested",
which obviously isn't, but the Standard doesn't ever mention "invocation"
during "rescan", but merely "nested replacement".
Yes, but replacement is the result of invocation (and only invocation). The
standard makes this part (at least) clear. As you say, this part is only
relevant to function-like macro invocations, because only they can span the end
of the replacement list. If A in the above was an object-like macro instead of
a function-like macro, the token A by itself would constitute the entire
invocation which would indeed be nested.
Post by Douglas A. Gwyn
Here "macro name" A falls within the replacement procedure of B,
next we determine that A forms an invocation, thus we expand it
(ie. macro, but maybe this is why "macro name" is used in #1 - see above),
this replacement takes place within the context of B expansion,
because token A originated from there.
No. The token A does not constitute a macro invocation. Only A immediately
followed by an ( constitutes the beginning of a macro invocation (as defined
previously). That is the minimum that is required to say that it is an
invocation.
But the standard does not mention "invocation", but
merely: "[...] for more macro names to replace" (6.10.3.4#1).
Macro names are not replaced, it wouldn't make sense, but it
suggests that mere occurrence of the "macro name" within
some context makes its expansion in that context.

For the second thing, function-like *macro-name* "[...] followed by a (
as the next preprocessing token introduces [...]" (6.10.3#10) macro
invocation, not: "(" that is preceded by fn-like macro name, nor:
")" that has backwards matching "(" preceded by fn-like macro name.
(The last one would probably be closest to D.Prosser's algorithm
understanding, but it wouldn't agree with iterative processing.)
Nor both tokens, nor all (in this sense "invocation" becomes
more an idea, rather than a strictly "physical" sequence of tokens).

During scanning, when I find fn-macro-name, I read forward (analyze)
one token to check for "("; if I find one, I read forward until
the closing ")" to establish the argument list and invocation
(sequence) span; the main pointer always *stays* at "macro-name".
Then the whole invocation is substituted, but since main pointer
lies in a context, the whole replacement takes place therein.
In the example above A(...) will be expanded nested in "context B"
(provided the macro-name is outside any "context A"; otherwise
it gets painted blue and we proceed on to next token, ie "(").
The Standard is worded in terms of macro-names, and never mentions
that the whole invocation (sequence) must lie within a context
for it to apply.
The argument list *belongs* to the fn-macro-name,
for it was emphasised in the definition of the invocation,
and because it fits in how "Rescan" section describes it
("[...] macro names to replace").
Post by Paul Mensonides
Post by Douglas A. Gwyn
Post by Paul Mensonides
tok tok 1 tok tok
||| ^ |||
|||___|||
|| | ||
|| C context
||_____||
| | |
| B context
|_______|
|
A context
tok tok 1 tok tok
^
Fully agreed.
If so, how can you say it is recursive? That behavior is iterative.
Because I think this shows a hierarchy of nested scans/rescans.
Only the reading and processing is iterative.
I explained everything above.
Post by Paul Mensonides
Post by Douglas A. Gwyn
When I started the editor, my intention was to agree with you, and to back
up your argument I wanted to bring that excerpt where it says "nested
invocation", but I didn't find it. If the Standard said in 6.10.3.4#2
"if any nested replacements encounter the invocation of the macro
being replaced", I would definitely say you were right, because
in the above example "A invocation" is not nested; but it doesn't.
I never said or implied that at all. All that is required is for a single
Sorry..., my mind was tired then, and I realised later I was writing nonsense.

What I wanted to say is: If the Standard said in 6.10.3.4#1
"[...] for more macro invocations to replace." (this of course
refers to "the resulting token sequence"), I would agree etc...

Sorry again for introducing unnecessary "noise".
Post by Paul Mensonides
I personally think that the standard is clear (or, at least, clear enough). It
Well, the link you've given before (with D.Prosser's algo) suggests
I'm not the only one to have this "alternative understanding".
Since other respectable people interpret the Standard different
ways, then maybe it is still not "clear enough".
Post by Paul Mensonides
could certainly be spelled out better. Note that in the Annex J of C99 (which
is non-normative) there is a reference to this issue that says that it is
Thanks. Actually I'm using n2794.txt downloaded recently and
I found this in Annex K.


Look, by now I'm certain that you're most probably "right", in sense
how David Prosser described "almost all implementations".
For me this is a dry (and extremely tiring) discussion,
because I know I represent the "wrong" side, because the intentions
were different.
All I'm doing is I'm acting as a lawyer and trying to establish
what the Standard actually says, regardless of any intentions.
I hope I can be "trusted", because I'm a fresh reader.
I'm trying to show, that maybe the Standard does not properly
reflect the authors' intentions.
The only value I see in this discussion is that in order to correct,
first you have to know what is wrong.
--
Best regards,
Stan Tobias
Douglas A. Gwyn
2004-03-02 04:14:30 UTC
Post by s***@amu.edu.pl
Since other respectable people interpret the Standard different
ways, then maybe it is still not "clear enough".
I think that after all the discussion, including the one I
alluded to that took place at the last Kona WG14 meeting,
the actual wording in the standard has about the same
degree of imperfect fit to either model, and it is
reasonable to interpret it either way. The practical
difference is slight, involving only one construct which
is dubious (both in its correct handling and in whether
its use should be encouraged in the first place). My
recommendation is to consider that implementations could
go either way, and therefore to avoid using such a
construct. An alternative would be to deem the matter so
important to resolve as to file a DR in order to get an
official ruling. But there is the danger that the ruling
might nail down the behavior in the opposite way from what
you would prefer.
Francis Glassborow
2004-03-02 10:10:46 UTC
Post by Douglas A. Gwyn
Post by s***@amu.edu.pl
Since other respectable people interpret the Standard different
ways, then maybe it is still not "clear enough".
I think that after all the discussion, including the one I
alluded to that took place at the last Kona WG14 meeting,
the actual wording in the standard has about the same
degree of imperfect fit to either model, and it is
reasonable to interpret it either way. The practical
difference is slight, involving only one construct which
is dubious (both in its correct handling and in whether
its use should be encouraged in the first place). My
recommendation is to consider that implementations could
go either way, and therefore to avoid using such a
construct. An alternative would be to deem the matter so
important to resolve as to file a DR in order to get an
official ruling. But there is the danger that the ruling
might nail down the behavior in the opposite way from what
you would prefer.
Without attempting to pre-judge the issue, I think your statement above
makes it clear that you think that the Standard is ambiguous. We have
several options.

1) Make it unspecified -- does not seem very desirable
2) Make it implementation defined -- ditto
3) Remove the ambiguity
4) Decide it is not ambiguous :-)

I do not think that do nothing is a legitimate response after deciding
that something is ambiguous.
--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects
l***@ugsplm.com
2004-03-02 17:13:00 UTC
Post by Francis Glassborow
I do not think that do nothing is a legitimate response after deciding
that something is ambiguous.
If the standard is ambiguous, then the behavior is unspecified:
implementations can behave either way and strictly conforming programs
cannot depend on a particular behavior. While it might be nice to
explicitly note the lack of specification, I don't see it as a
necessity.

-Larry Jones

I'll be a hulking, surly teen-ager before you know it!! -- Calvin
Douglas A. Gwyn
2004-03-02 19:08:01 UTC
Post by l***@ugsplm.com
implementations can behave either way and strictly conforming programs
cannot depend on a particular behavior. While it might be nice to
explicitly note the lack of specification, I don't see it as a
necessity.
Indeed in response to an earlier (C90) DR about macro processing,
we stated that a particular detail was intentionally ambiguous.
The problem with choosing to make a particular interpretation
the unambiguously "right" one is that it impacts programs and
implementations that have hitherto been depending on the other
interpretation. We can of course decide to do that, but the
benefit of doing so needs to be shown to outweigh the potential
damage.
Paul Mensonides
2004-03-04 14:30:58 UTC
Post by s***@amu.edu.pl
Post by Paul Mensonides
Note also that this text is the literal definition of non-recursion. It is
resumption of scanning, not a recursive scan that continues until the end of the
file.
I disagree here, see below.
Okay, but then you disagree with any form of literal interpretation of this
sentence:

"Then, the resulting preprocessing token sequence is rescanned, along with all
subsequent preprocessing tokens of the source file, for more macro names to
replace."
Post by s***@amu.edu.pl
Post by Paul Mensonides
Post by s***@amu.edu.pl
, for more macro names to replace.
//actually "macros" are replaced, not "macro names", this is
//probably a human language deficiency, but also maybe direct
//reference to next point, see further
#2
If the name of the macro being replaced
//note: "name of the macro", not "macro" or "macro invocation";
//the "macro" is replaced, not "macro name"
Actually, "macro invocations" are replaced. A "macro" is a reference to the
entity in a symbol table that is produced by a macro definition, e.g.
#define MACRO() // ...
A "macro name" is just the name of the macro, e.g. MACRO here. The standard
does not use these terms consistently, but it is nevertheless clear where it
counts.
I do use "macro" in sense of "macro invocation", I have defined that
earlier in my previous post, for the needs of the discussion.
Your definition is unacceptable. The standard clearly says:

A preprocessing directive of the form

# define identifier replacement-list new-line

defines an object-like macro that causes...
^^^^^

-and-

A preprocessing directive of the form

# define identifier /* etc. */

defines a function-like macro with
^^^^^

I don't mean to be rude, but it is not up to you to redefine terminology that is
contrary to what everyone already uses "for the purposes of this discussion."
You are right that the standard does not use all the terminology consistently,
but nevertheless it is understandable with regard to meanings such as this. A
"macro" is an abstract "entity" that causes "macro invocations" to be replaced.
The meaning of "macro invocations" is unequivocal for function-like macros, and
is logically extended to object-like macros (even though the standard does not
use that terminology). A "macro name" is a symbol that identifies a macro--i.e.
an identifier. For an object-like macro, an instance of the macro name alone
constitutes a macro invocation. For a function-like macro, an instance of the
macro name followed by a left parenthesis is the minimum required to call it an
invocation and a full invocation requires a right parenthesis as well as the
intervening arguments (if any). A macro invocation in turn causes "macro
replacement", also known as "macro expansion".
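The difference between a macro name and a macro invocation is easy to see in compiled code. In the sketch below (F, OBJ and the function name are invented for the example), the identifier F without a following left parenthesis is left alone and can even name a variable, while F() and the bare object-like name OBJ are both invocations.

```c
#include <assert.h>

#define F() 1      /* function-like macro */
#define OBJ 41     /* object-like macro */

int name_versus_invocation(void)
{
    /* "F" followed by "=" is only the macro NAME, not an invocation, so it
       stays an ordinary identifier. "OBJ" alone is a complete invocation. */
    int F = OBJ;

    /* First F: the variable (next token is "+", not "(").
       F(): name followed by "(" and ")", a full invocation, replaced by 1. */
    return F + F();
}

void check_terminology(void)
{
    assert(name_versus_invocation() == 42);
}
```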

I'm pretty sure that nobody is debating this terminology here but you. The
standard is imperfect, but if you don't take things with a "grain of salt," then
the paragraph about rescanning cannot include any nested replacements except
that of macro names which refer to object-like macros, since there is no other
semantic action called "replacement" specified for anything else.
Post by s***@amu.edu.pl
Post by Paul Mensonides
Post by s***@amu.edu.pl
is found during this scan
//ie. "rescan"
Note here: "this scan"; see below.
Yes, I know. This is related to your other post about (re)examination, so
I'll address it there.
Post by s***@amu.edu.pl
Post by Paul Mensonides
Yes.
Post by s***@amu.edu.pl
of the replacement list
//which resulted in previous replacement before "rescan"
(not including the rest of the source file's preprocessing tokens),
//this is redundant, it only adds emphasis to "the replacement list",
//ie. this and no further
It is explanatory. At this point, the preprocessing tokens that made up the
macro invocation have already been replaced by the replacement list of the
macro. Scanning (a.k.a. rescanning) resumes at the beginning of the
Disagree to "a.k.a"; see below.
The standard quite literally calls it "rescanning" and then immediately turns
around and calls it "scan." The usage of "rescan" is the normal English usage.
It simply means "scan again". It is called rescanning because the preprocessing
tokens of the replacement list were already conceptually scanned several times
(during the definition of the macro, during argument substitution, during
evaluation of the # and ## operators, and during the removal of placemarkers).
As such, "rescan" is used as a normal word, not as a term that is semantically
distinct from "scan".
Post by s***@amu.edu.pl
Post by Paul Mensonides
No, a macro expansion algorithm may be recursive, but the standard is literally
iterative. It would be recursive (i.e. form a functional hierarchy) if and only
if the replacement list of a macro was fully macro-replaced before it replaced
the invocation of the macro--but that is not what happens. The standard
ACK. In that sense it is not _fully_ recursive.
Macro replacement is not a recursive process at all. The process of scanning
for macro invocations to replace is recursive only with actual parameters to
function-like macros--that recursive scanning may not even include macro
replacement at all,

#define A(x) x

A(+) // recursive scan on +, which is a no-op

and it might include more than one macro replacement in a single recursion of
this scanning process:

#define B() 1

A(B() B())

This is a single recursion of "the act of scanning for macro invocations to
replace."
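The two cases above can be checked directly. The following is only a minimal sketch: the STR and STR_ helpers and the function names are assumptions added for illustration and do not come from the original post. STR forces its argument to be fully macro-replaced before STR_ stringizes it, so the strings show what the scan of each argument produced.

```c
#include <assert.h>
#include <string.h>

#define A(x) x
#define B() 1

/* Hypothetical helpers: STR expands its argument first, STR_ stringizes. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* The argument "+" contains no macro invocation, so scanning it is a no-op. */
static const char *scan_noop(void) { return STR(A(+)); }

/* A single scan of the argument "B() B()" performs two macro replacements. */
static const char *scan_two(void) { return STR(A(B() B())); }

static void self_check(void)
{
    assert(strcmp(scan_noop(), "+") == 0);
    assert(strcmp(scan_two(), "1 1") == 0);
}
```

Calling self_check() from any function is enough to exercise both cases.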
Post by s***@amu.edu.pl
Post by Paul Mensonides
*literally* says that replacement happens *first*, followed by argument
substitution, followed by stringizing and token-pasting, and ultimately followed
by the resumption of scanning.
Below I'm going to explain two things: 1. why I think "rescan" is
different than "scan", and 2. that the procedure is (not strictly)
"recursive".
1. The standard doesn't formally introduce the term or any notion of
"rescan", but I have reasons to believe it is different than "scan".
(In context of the Standard I understand this term as "rescan",
not re-"scan").
First, 6.10.3.4#2 uses words "during this scan", which suggests
there is an "other scan" - in context of that chapter it means
the previous scanning phase.
No, I explained one such example of this already in the other post about
"(re)examination". (I'll return to that thread after I finish with this one.)
Otherwise, there are all sorts of kinds of scanning involved: scanning for
macro invocations to replace, scanning for references to formal parameters in
the replacement list of a macro, scanning for # and ## operators to evaluate,
and scanning to remove placemarker preprocessing tokens.
Post by s***@amu.edu.pl
Second, the first mention and description of replacement process
starts in 6.10.3 #9 and #10 (with the macro definitions).
Third, as a sign of good will I might agree, that 6.10.3 is only
an introduction, and the actual algorithm is described by
sections 6.10.3.?, but then the Standard would tell somewhere
before 6.10.3.1 that it starts at 6.10.3.4; we surely don't
start preprocessing a file from argument substitution :) .
:)
Post by s***@amu.edu.pl
2. The "Rescan" chapter never suggests that anything is
"resumed" (or "continued", or anything alike).
It definitely does suggest it, though it doesn't say it explicitly (because it
doesn't need to). It says that after actual parameters are substituted for formal
parameters
(etc.), the resulting sequence of preprocessing tokens is rescanned along with
all the other preprocessing tokens of the file. That is one of two things: 1)
resumption of the scanning for macro invocations to replace or 2) recursive
scanning that gets deeper and deeper until the end of the file. "...along with
*all* subsequent preprocessing tokens of the source file..."
Post by s***@amu.edu.pl
Since "rescan"
and "scan" are not the same, by saying that "scan resumes",
it would mean to leave the whole replaced list and continue
reading the file, which would make that chapter invalid - utter
nonsense in context of the whole section.
1) "Scan" and "rescan" are the same, except that the word "rescan" means that
something has already been scanned (which, for the replacement list, it has been
several times).

2) You are looking at the process in entirely the wrong way. The macro
invocation has *already* been replaced by the replacement list of the macro that
was invoked. This happens before rescanning ever occurs. When rescanning does
occur, it continues straight into whatever preprocessing tokens existed after
the invocation. The macro definition is *inlined* before it is "executed". At
this point, the only difference between the preprocessing tokens of the
replacement list (after argument substitution, etc.) and the tokens that follow
is that a disabling context is established from point A to point B. During the
(re)scan from point A to point B, if the macro name is found, it is painted.
(I'll address the need for painting again in the "(re)examined" thread.)
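Two uncontroversial consequences of this description can be shown in a few lines. This is a sketch under stated assumptions: the names counter, MAKE, FINISH and the helper functions are made up for illustration. The first macro shows its own name being painted during the rescan; the second shows the rescan running on into the tokens that followed the invocation.

```c
#include <assert.h>

static int counter = 41;   /* ordinary variable, declared before the macro */

/* The replacement list mentions the macro's own name. During the rescan
   that name is found inside the disabling context, painted, and left
   alone, so it ends up naming the variable above. */
#define counter (counter + 1)

/* MAKE() is replaced by the single token FINISH. The rescan then continues
   into the tokens that followed the invocation and finds "()" there. */
#define MAKE() FINISH
#define FINISH() 2

static int painted(void)   { return counter; }    /* (counter + 1) */
static int continued(void) { return MAKE()(); }   /* FINISH () -> 2 */

static void self_check(void)
{
    assert(painted() == 42);
    assert(continued() == 2);
}
```

Neither case depends on how "nested" is read, because FINISH never produces MAKE again.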
Post by s***@amu.edu.pl
The macro expansion within argument substitution is strictly
recursive (arguments are expanded in isolation) and this
is described as "are completely macro replaced" (6.10.3.1#1);
what reason would I have to believe that "for more macro names
to replace" (6.10.3.4#1) - without any further comments - is
something different?
Because macro expansion within argument substitution is not recursive. The only
thing that is recursive with arguments is the process of scanning for macro
names to replace. Specifically, it is a recursive application of the "function"
that scans the entire set of preprocessing tokens of a file, but with certain
limitations applied (like existing contexts, if any, and the lack of defined
interpretation of directives).
Post by s***@amu.edu.pl
I think that the described _procedure_ of scan/rescan is recursive
(it need not be implemented as such), at least in sense of
keeping a stack of nested contexts (disabled macros). The
token processing is iterative (I'd say, token stream is a global
entity on which all recursive scan levels work - except args, which
constitute separate token streams).
I don't agree with the first part of course, but you're right when you say token
processing is iterative. The token stream is a global (i.e. per source file)
entity on which all scans work, except arguments which constitute separate token
streams which are "initialized" with the tokens from the global stream that made
up the argument.
Post by s***@amu.edu.pl
Post by Paul Mensonides
No. The token A does not constitute a macro invocation. Only A immediately
followed by an ( constitutes the beginning of a macro invocation (as defined
previously). That is the minimum that is required to say that it is an
invocation.
But the standard does not mention "invocation", but
merely: "[...] for more macro names to replace" (6.10.3.4#1).
Macro names are not replaced, it wouldn't make sense, but it
suggests that mere occurrence of the "macro name" within
some context makes its expansion in that context.
The standard is using the terminology in an inconsistent fashion.
Otherwise, it doesn't make any sense at all except for macro names that refer to
object-like macros because macro names that refer to function-like macros aren't
replaced. Only invocations that involve them are replaced. The literal
translation is obviously erroneous in this case. What it certainly doesn't mean
(or suggest) is that a function-like macro invocation pending replacement is
pulled into whatever context the macro name exists in. That is far
more of a literal adjustment than simply replacing "...more macro names to
replace..." with "...more macro invocations to replace...." which is what the
text actually means.
Post by s***@amu.edu.pl
For the second thing, function-like *macro-name* "[...] followed by a (
as the next preprocessing token introduces [...]" (6.10.3#10) macro
")" that has backwards matching "(" preceded by fn-like macro name.
(The last one would probably be closest to D.Prosser's algorithm
understanding, but it wouldn't agree with iterative processing.)
So what? There is only one way to form an invocation of a function-like macro,
and that is with the macro name followed by ( followed by the actual parameters,
if any, followed by ). It is an all-or-nothing scenario. A macro name
referring to a function-like macro does not constitute an invocation by
itself. OTOH, it can be construed as a valid logical deduction to say that a
macro name followed by ( does constitute an invocation (because at that point,
it must finish the invocation) for the purposes of "nested" in 6.10.3.4/2. I.e.
MACRO ( is the *minimum* required to call something an invocation of a
function-like macro.
Post by s***@amu.edu.pl
Neither both tokens, nor all (in this sense "invocation" becomes
more an idea, rather than a strictly "physical" sequence of tokens).
It is a physical sequence of tokens. The question is only at what point (if
any) is a partially nested invocation considered nested with a disabling
context. I assert that...

#define A() MACRO()

A()

MACRO()
|_______|
|
A context

...is definitely nested...

#define B() MACRO

B()()

MACRO ()
|_____|
|
A context

...is definitely *not* nested, and...

#define C() MACRO(

C())

MACRO( )
|______|
|
A context

...is questionable. I.e. it is logically plausible to interpret it either way.
Post by s***@amu.edu.pl
During scanning, when I find fn-macro-name, I read forward (analyze)
one token to check for "("; if I find one, I read forward until
the closing ")" to establish the argument list and invocation
(sequence) span; the main pointer always *stays* at "macro-name".
Then the whole invocation is substituted, but since main pointer
lies in a context, the whole replacement takes place therein.
In the example above A(...) will be expanded nested in "context B"
(provided the macro-name is outside any "context A"; otherwise
it gets painted blue and we proceed on to next token, ie "(").
The Standard is worded in terms of macro-names, and never mentions
that the whole invocation (sequence) must lie within a context
for it to apply.
The standard uses the term "macro name" (and specifically defines it as other
than an invocation) inconsistently. Even so, the meaning is clear. Further,
consider this example:

#define A() B
#define B(x) x

#define MACRO() A()(1)

A()( MACRO() )

Is the invocation of MACRO supposed to be nested according to your supposition
above?

BTW, this actually represents the only difficult part of implementing macro
expansion logic: properly handling argument expansion that is part of an
invocation that is partially nested but where a macro invocation in the argument
is completely non-nested.
Post by s***@amu.edu.pl
The argument list *belongs* to the fn-macro-name,
for it was emphasised in the definition of the invocation,
and because it fits in how "Rescan" section describes it
("[...] macro names to replace").
This line of logic is flawed. The use of the term "macro name" in that section
means "macro invocation". It is an unintentional terminological error.
Post by s***@amu.edu.pl
Post by Paul Mensonides
Post by s***@amu.edu.pl
Post by Paul Mensonides
tok tok 1 tok tok
||| ^ |||
|||___|||
|| | ||
|| C context
||_____||
| | |
| B context
|_______|
|
A context
tok tok 1 tok tok
^
Fully agreed.
If so, how can you say it is recursive? That behavior is iterative.
Because I think this shows a hierarchy of nested scans/rescans.
Only the reading and processing is iterative.
I explained everything above.
There are no "nested scans" (except, as yet another way to view it, with
arguments). The macro replacement process simply replaces an invocation of a
macro with the replacement list in the definition of the macro, performs
argument substitution (etc.), constructs a disabling context, and starts
scanning again. It is that simple.
Post by s***@amu.edu.pl
Post by Paul Mensonides
Post by s***@amu.edu.pl
When I started the editor, my intention was to agree with you, and to back
up your argument I wanted to bring that excerpt where it says "nested
invocation", but I didn't find it. If the Standard said in 6.10.3.4#2
"if any nested replacements encounter the invocation of the macro
being replaced", I would definitely say you were right, because
in the above example "A invocation" is not nested; but it doesn't.
I never said or implied that at all. All that is required is for a single
Sorry..., my mind was tired then, and I realised later I was writing nonsense.
No problem. I was afraid that I wasn't communicating my points well at all and
that I'd have to do it again (for the zillionth time). :)
Post by s***@amu.edu.pl
What I wanted to say is: If the Standard said in 6.10.3.4#1
"[...] for more macro invocations to replace." (this of course
refers to "the resulting token sequence"), I would agree etc...
Sorry again for introducing unnecessary "noise".
Again, no problem. :)
Post by s***@amu.edu.pl
Post by Paul Mensonides
I personally think that the standard is clear (or, at least, clear enough).
It
Post by s***@amu.edu.pl
Well, the link you've given before (with D.Prosser's algo) suggests
I'm not the only one to have this "alternative understanding".
Since other respectable people interpret the Standard different
ways, then maybe it is still not "clear enough".
It certainly isn't as clear as it could be, nor is it laid out with a
particularly logical structure like it could be. However, the alternative
understanding defined by the algorithm is merely the intent that specifies the
result of macro expansion. It is not actually what is defined in the standard.
The algorithm and the standard define two very different conceptual models that
yield the same results in virtually all cases. The core principle of the intent
is to allow as much expansion as possible without the danger of infinite
expansion.
Post by s***@amu.edu.pl
Post by Paul Mensonides
could certainly be spelled out better. Note that in the Annex J of C99 (which
is non-normative) there is a reference to this issue that says that it is
Thanks. Actually I'm using n2794.txt downloaded recently and
I found this in Annex K.
It certainly is "hidden". :)
Post by s***@amu.edu.pl
Look, by now I'm certain that you're most probably "right", in sense
how David Prosser described "almost all implementations".
For me this is a dry (and extremely tiring) discussion,
because I know I represent the "wrong" side, because the intentions
were different.
Okay. This discussion is tiring for me as well and eats up my time, but I
continue because there is an existing lack of understanding of the behavior of
the preprocessor and because I represent many that require the "partially nested
is not nested" behavior. The conceptual model is iterative, but, as Doug says,
whether it is or is not is only relevant because of the interpretation of the
term "nested" as it is used in 6.10.3.4/2. Given an iterative model, the
nesting is contextual, not procedural, and its interpretation must be viewed in
that light because the two are critically different because a complete
invocation that spans the end of a context is not "nested" in that context in
the normal sense of the word. It is "overlapped" with that context. This is
not the first (and not likely to be the last) time that I've had this debate.
Post by s***@amu.edu.pl
All I'm doing is I'm acting as a lawyer and trying to establish
what the Standard actually says, regardless of any intentions.
Fine. Everything that I have said is exactly what the standard says. The
questionable part is *only* whether something that is contextually partially
nested is considered contextually nested in that paragraph. A partial nesting
does not, in my opinion, constitute nesting as the term is defined in the
English language. Nevertheless, I can understand how it *might* be interpreted
that way. For that reason, the text is obviously not clear enough and so
original intent becomes important.
Post by s***@amu.edu.pl
I hope I can be "trusted", because I'm a fresh reader.
I used to view macro expansion with the same procedural model as Doug describes.
Since then, however, I have studied it in meticulous detail because preprocessor
metaprogramming requires the study of those details. As a result, I now
understand the true way that the preprocessor works (in detail), despite the
fact that it is alien to most other forms of abstraction in programming
languages.

Regards,
Paul Mensonides
Douglas A. Gwyn
2004-02-28 17:10:49 UTC
Post by Paul Mensonides
Post by Douglas A. Gwyn
The wording in the C standard makes it pretty clear that
the inner replacement occurs *during* the process of the
original replacement (thus setting the condition for blue
paint),
The wording of the C standard makes it clear that the rest of the preprocessing
tokens in the file are *not* included. "If the name of the macro being replaced
is found during this scan of the replacement list (not including the rest of the
file's preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced it is not replaced."
You left out the preceding paragraph, which sets the
context:
"... Then, the resulting preprocessing token
sequence is rescanned, along with all subsequent
preprocessing tokens of the source file, for
more macro names to replace."

Then the first sentence of the part you quoted is not
applicable to the inner "B" in our example, because it
does not exist yet during "this" scan (the scan of the
immediate consequence of the parameter substitution
and #,## processing, looking for macro names).
However, the second sentence ("Furthermore, ...") does
apply to the name "B" that results from the nested "A"
replacement (which recursively involves its own scan
for further macro names).
Post by Paul Mensonides
No, I didn't. The context no longer exists because the invocation is not nested
(see below). You are blatantly ignoring what the standard dictates because of a
typical view that macro expansion is like a procedural hierarchy. Yes, scanning
is not finished with the preprocessing tokens of B's replacement list, but is
finished with the *replacement* of the invocation of B.
No, it is not. The subclauses of 6.10.3 all are part
of the process of macro replacement described by clause
6.10.3. That includes 6.10.3.4 (rescanning and further
replacement).

There are ways for implementations to take shortcuts,
as it is well known that tail recursion can be replaced
by iteration. But the specification does not refer to
any such shortcuts, and if an implementation loses a
context due to being in too big a hurry to move on, it
is simply a defective implementation.

If we were to take your view to its logical extreme,
then there would be no possibility of an in-process
macro name appearing within a nested expansion (past
the first level). That is contradicted by the standard
specifying special consequences for cases where that
*does* occur.
Post by Paul Mensonides
I'll try to be more clear. The preprocessing tokens before and (more
importantly) after any given invocation are always right there with the
invocation. They don't have to be "gotten" from some outer token stream when
needed--they are _already_ there in one steady stream of preprocessing tokens
that makes up all the preceding and trailing tokens to the end of the file.
However, the standard specifies when they are examined and/or
replaced and when they are not examined and/or replaced.
When I say colloquially that tokens are "fetched", I mean
merely that they are available for examination and possible
replacement.
Post by Paul Mensonides
.. That is not what the standard specifies. It specifies replacement first,
then rescanning, not rescanning followed by replacement.
I am not at all confused about the sequencing. The
process of macro replacement, starting just after
an identifier has been recognized as a defined macro
name during the appropriate phase of processing,
involves temporarily flagging that global identifier
as "in the process of being replaced", locating the
tokens for the macro arguments and fully macro-replacing
each of *them* [recursive subprocessing], substituting
each fully-expanded argument for the corresponding
parameter in the definition for the current macro,
concurrent with #,## processing, then rescanning the
result of the previous operation, looking for
identifiers corresponding to defined macros, and for
each such name applying the following processing:
if (global identifier is "in the process of
being expanded") then apply permanent blue
paint to that identifier pp token (not to
the global identifier, as I might have
mistakenly indicated in a previous posting).
if (identifier token has ever had blue paint
applied) then leave it intact;
else (identifier is a defined macro name,
and has not been painted blue) so
begin a macro replacement process for
that identifier, and if it is a
function-like macro, it is allowed to
access remaining pp tokens (i.e. those
not involved in any nesting replacement
process from expansion of macros that
may still be in process) when fetching
its arguments.
*After* each such name (we're now back to the original
replacement buffer) has been fully macro-replaced, the
current macro replacement process (6.10.3) is complete
and the "in the process of being replaced" flag is
removed from the global identifier. The context now
pops back to whatever scanning was being done (top
level or some nested macro replacement).
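The flag handling described above can be sketched for the simplest case. The following toy model is an assumption-laden illustration, not the poster's code and not how any real preprocessor is written: it handles object-like macros only (where the procedural and geographic readings cannot differ), uses a hypothetical two-entry macro table, and treats every token as a plain string. A name met while its "in process" flag is set is emitted unchanged and never looked at again, which plays the role of permanent blue paint.

```c
#include <assert.h>
#include <string.h>

#define MAX_BODY 4

/* Hypothetical toy macro table: object-like macros only. */
struct macro {
    const char *name;
    const char *body[MAX_BODY];   /* replacement list, NULL-terminated */
    int in_process;               /* the "being replaced" flag */
};

static struct macro table[] = {
    { "A", { "B", "+", "A", NULL }, 0 },   /* #define A B + A */
    { "B", { "A", NULL }, 0 },             /* #define B A     */
};

static struct macro *lookup(const char *tok)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].name, tok) == 0)
            return &table[i];
    return NULL;
}

/* Append one token to out, separated from the previous one by a space. */
static void emit(char *out, size_t cap, const char *tok)
{
    if (out[0] != '\0')
        strncat(out, " ", cap - strlen(out) - 1);
    strncat(out, tok, cap - strlen(out) - 1);
}

static void expand(const char *tok, char *out, size_t cap)
{
    struct macro *m = lookup(tok);

    if (m == NULL || m->in_process) {
        /* Not a macro, or a macro name seen while that macro is being
           replaced: emit it as-is. Output is never rescanned, so the
           token stays unexpanded for good. */
        emit(out, cap, tok);
        return;
    }
    m->in_process = 1;                       /* set the flag */
    for (int i = 0; i < MAX_BODY && m->body[i] != NULL; i++)
        expand(m->body[i], out, cap);        /* nested replacement */
    m->in_process = 0;                       /* replacement complete */
}

static void self_check(void)
{
    char out[64] = "";
    expand("A", out, sizeof out);
    assert(strcmp(out, "A + A") == 0);
}
```

With the real directives shown in the table comments, a conforming preprocessor turns a lone A into A + A as well. The model deliberately leaves out function-like macros, which is exactly where the two readings in this thread diverge.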
Post by Paul Mensonides
... it is as if every invocation is physically inlined into the stream
of preprocessing tokens where the invocation existed (which is either the
top-level stream or an argument being processed as a separate stream).
That is merely what "substitution" consists of, and
has no deeper meaning.
Post by Paul Mensonides
No it doesn't. The invocation of B at this point is nested within the
rescanning of A's replacement list, but not B's.
That makes no sense if "nesting" has its logical
(procedural) meaning rather than a geographical
interpretation. Repeating what I said above, that
would mean that there would be no possibility of an
in-process macro name appearing within a nested expansion
past the first level, which is contradicted by the
standard discussing exactly that case.
Post by Paul Mensonides
Post by Douglas A. Gwyn
Note that the process described in 6.10.3.4 is one
*component* of the process 6.10.3, not something that
happens after 6.10.3 is complete.
(re)scanning -> replacement -> substitution -> #/## ->
^ |
|____________________________________________________|
That is contrary to all similar constructions in the
standard, such as what constitutes the referred-to
operand in a subexpression, and is clearly contradicted
by the explicitly recursive wording of the specification,
and the explicit reference to more than one level of
nesting. (Your model involves only one level.) Note
also that macro replacement within each argument to a
macro is necessarily performed as if 6.10.3 is invoked
as a subroutine.
Post by Paul Mensonides
6.10.3/9 - "...that causes each subsequent instance of the macro name to be
***replaced by the replacement list of preprocessing tokens that constitute the
remainder of the directive***."
6.10.3/10 - "...that is ***replaced by the replacement list*** in the
definition..."
Yes, that explains the eventual purpose of the "body"
of a macro *definition*. The actual replacement
process does not begin at that point, but upon
subsequent recognition of a defined identifier during
the top-level pass across the input to translation
phase 4, and also upon recognition of defined
identifiers during collection of macro arguments, and
also during recognition of defined identifiers during
the rescan phase of *each* macro expansion process,
except when the identifier pp token has had blue paint
applied.
Post by Paul Mensonides
A macro invocation is replaced by the preprocessing tokens of the replacement
list, *not* by the rescanned (i.e. macro-expanded) preprocessing tokens of the
replacement list. Conceptually, this step happens even before argument
substitution.
No, not at all. The complete replacement can involve
much more than what was contained explicitly within
the body of the definition. That is why we spell out
the process in detail.
Post by Paul Mensonides
... There is no procedural nesting, there is only
physical/geographical nesting designed to prevent macro expansion from looping
forever.
Actually, "looping forever" is a logical/procedural
notion, not a geographic one.
Post by Paul Mensonides
Post by Douglas A. Gwyn
... Further, the specification makes it very clear
that blue paint is permanent, so there is no way for the
newly created "B" to ever trigger macro replacement.
Absolutely, blue paint is permanent--
But since the inner "B" was not *geographically* nested,
by your interpretation you'd have to say that it wasn't
painted blue.

I still think this whole matter boils down to whether
"nested" means procedurally or geographically. I know
that we recently discussed this in a WG14 meeting, but
I seem to recall that it was during an informal session
and that hope was expressed that nobody would file a DR
asking for a clarification. If as you have said there
is a "ton" of existing software that *relies* on that
detail, then it might even be unwise to ask for an
official ruling, which is likely to be as I have
explained, forcing implementations that have done it
wrong to make a choice between conformance or making
existing bogus code continue to do what the programmer
expected. (More likely, another option would be added
to the compiler, and mentioned in the conformance
section of the documentation.)

It would be good to advise programmers not to rely on
your model (the geographic nesting one) nor on my model
(the procedural nesting one) since implementations may
differ on this score.
s***@amu.edu.pl
2004-02-28 22:57:06 UTC
Post by Douglas A. Gwyn
I still think this whole matter boils down to whether
"nested" means procedurally or geographically. I know
So as I understand, this issue is not definitely solved.
Although I think that the Standard talks about "procedural nesting",
most preprocessors/compilers seem to think differently.
There's nothing to rely on then.

+++++++++++

Now a different, but related problem:

Is argument expansion (not argument substitution) part of fn-like
macro expansion, or does it precede the actual expansion?
6.10.3.1#1 is rather vague on it:
"Before being substituted, each argument's preprocessing
tokens are completely macro replaced [...]"

If, as you say, "nesting" is procedural (to which I agree), and
argument substitution is part of macro replacement (which IMO is
a sane assumption), then it would mean, that the arguments are
expanded in same context as the macro.

#define A() B
#define B(x) C(x)

A()(A())
//found A(), replace
B (A())
|A|
//rescan finds B(...), in context of A,
//does replacement; argument A() is expanded first;
//since we assume that this is part of B replacement,
//A is disabled, thus A in the arg gets painted: A'()
C(A'())
| B |
| A | //further rescans don't change anything

C(A()) //final result

Am I correct? If I am, then this would mean that for fn-like macros
the context is extended in "geographical" sense on all its arguments.

----

IMHO there is a minor glitch in the Standard in wording:
"[...] each argument's preprocessing tokens are completely macro
replaced as if they formed the rest of the preprocessing file;"
"as if they formed the rest of the file" is only in sense of
available tokens; arguments may be replaced in expansion of
a macro during a rescan, which means that some macros may be
disabled and "painting" rules apply - which is not quite
the same "as if they formed ...".
--
Stan Tobias
Douglas A. Gwyn
2004-02-29 06:17:41 UTC
Post by s***@amu.edu.pl
Is argument expansion (not argument substitution) part of fn-like
macro expansion, or does it precede the actual expansion?
"Before being substituted, each argument's preprocessing
tokens are completely macro replaced [...]"
Where the [...] limits the pp token stream context to just
that bounded by the original argument.
Post by s***@amu.edu.pl
If, as you say, "nesting" is procedural (to which I agree), and
argument substitution is part of macro replacement (which IMO is
a sane assumption), then it would mean, that the arguments are
expanded in same context as the macro.
Not really; it is as if
phase 5 input := result of phase 4(translation unit);
phase 4(token stream):
    ...
    if identifier has macro def,
        macro replace;
    ...
macro replace:
    locate arguments;
    for each argument,
        argument := phase 4(argument);
    ... rest of current macro replacement
(with a few details omitted so as not to obscure the important
point)
Post by s***@amu.edu.pl
#define A() B
#define B(x) C(x)
A()(A())
//found A(), replace
B (A())
|A|
//rescan finds B(...), in context of A,
//does replacement; argument A() is expanded first;
//since we assume that this is part of B replacement,
//A is disabled, thus A in the arg gets painted: A'()
Yes, it is a nested replacement and the name A is seen
while the higher-level A is in the process of being
replaced, so that inner occurrence of A is painted blue.
Post by s***@amu.edu.pl
C(A()) //final result
Seems right.
Post by s***@amu.edu.pl
Am I correct? If I am, then this would mean that for fn-like macros
the context is extended in "geographical" sense on all its arguments.
The only geographical restriction is that during argument
expansion, no additional pp-tokens can be fetched from
farther along the file. That was never an issue for this
example.
Post by s***@amu.edu.pl
"[...] each argument's preprocessing tokens are completely macro
replaced as if they formed the rest of the preprocessing file;"
"as if they formed the rest of the file" is only in sense of
available tokens; arguments may be replaced in expansion of
a macro during a rescan, which means that some macros may be
disabled and "painting" rules apply - which is not quite
the same "as if they formed ...".
You dropped the part right after the semicolon: "no other
preprocessing tokens are available.", which was provided
to reinforce what the meaning of the sentence was. No
single (portion of a) sentence in the standard can be
properly interpreted out of context.
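The "no other preprocessing tokens are available" clause can be observed with a small sketch. The macros ONE, ID, STR, STR_ and the function names below are assumptions introduced for illustration. ONE is function-like, so it is only replaced when a "(" follows it; whether that "(" is visible depends on whether the scan is confined to an argument.

```c
#include <assert.h>
#include <string.h>

#define ONE() 1
#define ID(x) x

/* Hypothetical helpers: STR expands its argument first, STR_ stringizes. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Inside STR's argument, ID(ONE) is expanded with no other tokens
   available: ONE is never followed by "(", so it stays as it is. */
static const char *isolated(void) { return STR(ID(ONE)); }

/* Here the rescan of ID's result continues into the rest of the file,
   finds "()" after ONE, and replaces ONE (). */
static int with_following_tokens(void) { return ID(ONE)(); }

static void self_check(void)
{
    assert(strcmp(isolated(), "ONE") == 0);
    assert(with_following_tokens() == 1);
}
```

This only illustrates which tokens are available to each scan; it does not touch the question of which macros are disabled while an argument is being expanded.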
s***@amu.edu.pl
2004-02-29 14:37:02 UTC
Post by Douglas A. Gwyn
Post by s***@amu.edu.pl
"Before being substituted, each argument's preprocessing
tokens are completely macro replaced [...]"
Where the [...] limits the pp token stream context to just
that bounded by the original argument.
ACK. This is where macro expansion is strictly recursive
even in "geographical" sense.
Post by Douglas A. Gwyn
Post by s***@amu.edu.pl
If, as you say, "nesting" is procedural (to which I agree), and
argument substitution is part of macro replacement (which IMO is
a sane assumption), then it would mean, that the arguments are
expanded in same context as the macro.
Not really; it is as if
I don't see the difference between me and you; by "context" I
understand set of disabled macro names ("being replaced").
Post by Douglas A. Gwyn
phase 5 input := result of phase 4(translation unit);
...
if identifier has macro def,
macro replace;
Here we "recursively" return to "macro replace", but since we
are in course of replacing current macro, we add it to the
context; I would note this like:

macro replace [current-context + current-macro-name]
Post by Douglas A. Gwyn
...
locate arguments;
for each argument,
argument := phase 4(argument);
Yes, but here `argument' is processed within the context
possibly passed by recursive call; see above.
The first (non-recursive) call gets empty context (file context),
ie. no macro names are disabled.
Post by Douglas A. Gwyn
... rest of current macro replacement
(with a few details omitted so as not to obscure the important
point)
Post by s***@amu.edu.pl
#define A() B
#define B(x) C(x)
A()(A())
//found A(), replace
B (A())
|A|
//rescan finds B(...), in context of A,
//does replacement; argument A() is expanded first;
//since we assume that this is part of B replacement,
//A is disabled, thus A in the arg gets painted: A'()
Yes, it is a nested replacement and the name A is seen
while the higher-level A is in the process of being
replaced, so that inner occurrence of A is painted blue.
Post by s***@amu.edu.pl
C(A()) //final result
Seems right.
Glad to agree. It was a surprise to me though. I acquired this
understanding when you mentioned "procedural nesting".

Originally I was under influence of Paul's interpretation
(I read the long discussion of 2003 between Paul Mensonides,
Jun Woong and Ivan Kosarev, google: "nested macro expansion"),
and my initial interpretation was in terms of "geographical"
context and nesting:
B (A())
|A|
//rescan finds "B(", "B" is in "context A" (this is where Paul
//would disagree), expand argument, "A" falls outside of
//"context A", thus expand, substitute and replace "B(..)".
C(B) //final result

This is actually the point where I saw value in "geographical"
interpretation of context, because I don't see any reason
why second "A" should be "forcibly" drawn into the same context
as "B" is in. For me it is not intuitive. Why would it be good
for me?
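For readers who want to see which reading a particular implementation follows, here is a minimal sketch, assuming a C99 preprocessor. The STR/XSTR helpers, same_tokens, expanded and inner_a_was_expanded are scaffolding invented for this illustration. It only records the outcome of A()(A()); it does not settle which outcome is conforming.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Scaffolding, not part of the example: stringify after expansion. */
#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

/* Compare two spellings, ignoring whitespace between tokens. */
static int same_tokens(const char *p, const char *q)
{
    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        while (isspace((unsigned char)*q)) q++;
        if (*p != *q) return 0;
        if (*p == '\0') return 1;
        p++;
        q++;
    }
}

/* The macros from the example; C is deliberately not a macro. */
#define A() B
#define B(x) C(x)

/* Spelling of whatever A()(A()) expands to on this implementation. */
static const char expanded[] = XSTR(A()(A()));

/* Returns 1 for the "geographical" result C(B) and 0 for the
   procedural result C(A()); anything else trips the assert. */
int inner_a_was_expanded(void)
{
    int geographical = same_tokens(expanded, "C(B)");
    int procedural = same_tokens(expanded, "C(A())");
    assert(geographical || procedural);
    printf("A()(A()) -> %s\n", expanded);
    return geographical;
}
```

Calling inner_a_was_expanded() from main prints the spelling the preprocessor produced, so the two candidate results can be told apart without reading -E output.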

++++

To sum up: one text, three different interpretations.
The issue is for fn-like macros, whether macro-name pulls its
argument list into its context (your interpretation); or whether
arguments pull macro-name out of its context (Paul);
or whether it's something in-between (initially me).

I have reviewed Dave Prosser's explanations of "the intent of
the specification", it'll yet take me a few days to understand
the algorithm he gives properly, but both the example and
the algorithm seem to strongly support Paul's interpretation.
The algorithm has a notion of sticky contexts (or "hide sets"),
which would translate directly to "geographical" context *and* nesting.
See the last but one `return' and (HS&HS')|{T} argument to subst() -
seems like the closing ")" determines the context of the macro.
I wish I knew exactly what subst() was doing.

But that obviously (at least for me) is not what the Standard says.

Can you (or Paul) give me some pointers to the *complete* algorithm
(at least what subst() does), as D.Prosser gave it, please.

++++
Post by Douglas A. Gwyn
Post by s***@amu.edu.pl
"[...] each argument's preprocessing tokens are completely macro
replaced as if they formed the rest of the preprocessing file;"
"as if they formed the rest of the file" is only in sense of
available tokens; arguments may be replaced in expansion of
a macro during a rescan, which means that some macros may be
disabled and "painting" rules apply - which is not quite
the same "as if they formed ...".
You dropped the part right after the semicolon: "no other
preprocessing tokens are available.", which was provided
to reinforce what the meaning of the sentence was. No
single (portion of a) sentence in the standard can be
properly interpreted out of context.
Yes, but what I meant is that "as if they formed [... until the end]"
suggests that this expansion (strictly "geographically" recursive here)
is done outside of any context (as would probably "the rest of the file"
be). The Standard might make a forward reference, that this expansion
might result from nested (recursive) call and be subject to additional
context rules (macro disabling and painting).
But this is minor and not worth continuing anymore.

++++

Yet another issue, this time less connected to things discussed before.

6.10.3.4#2 says at the end:
"These nonreplaced macro name preprocessing tokens are no
longer available for further replacement even if they are later
(re)examined in contexts in which that macro name preprocessing
token would otherwise have been replaced."

What does the "(re)examined" remark mean here?
My understanding was:

#define M exp
#define A(x) A
A(a)(M)
//expand A(a)
A (M)
|A| //rescan: found "A", have macro def A(..), *examine*
//token stream further, found "A(M)", check context,
//"A" found in "context A", painted blue, "A(M)" not
//expanded
A(M) //final result

But I feel I must be wrong, although that remark seems to
sanction this behaviour.
Prosser's algorithm definitely goes against this, as after the first
expansion "A" would be pushed behind and a new expansion would start from "(".
--
Stan Tobias
Paul Mensonides
2004-03-01 05:13:20 UTC
Post by s***@amu.edu.pl
Originally I was under influence of Paul's interpretation
(I read the long discussion of 2003 between Paul Mensonides,
Jun Woong and Ivan Kosarev, google: "nested macro expansion"),
and my initial interpretation was in terms of "geographical"
B (A())
|A|
//rescan finds "B(", "B" is in "context A" (this is where Paul
//would disagree), expand argument, "A" falls outside of
//"context A", thus expand, substitute and replace "B(..)".
C(B) //final result
This is actually the point where I saw value in "geographical"
interpretation of context, because I don't see any reason
why second "A" should be "forcibly" drawn into the same context
as "B" is in. For me it is not intuitive. Why would it be good
for me?
I'm not sure what you mean, but considering a partially nested invocation of a
function-like macro nested is definitely *not* a good thing. It
hyper-dramatically increases the number of macros required to implement
preprocessor metaprogramming constructs (i.e. constructs that are used to
generate (among many other things) repetitious program code). It creates a
"vertical" dependency that is much worse than the "vertical" dependency between
a macro and itself. Consider:

#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__)
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

#define IIF(bit) PRIMITIVE_CAT(IIF_, bit)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t

#define EAT(...)

IIF(1)(CAT, EAT)(1, 2) // 12
IIF(0)(CAT, EAT)(1, 2) // [nothing]
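The two results claimed in the comments can be checked mechanically. This is a minimal sketch, assuming a preprocessor that does not treat the partially nested PRIMITIVE_CAT invocation as nested (GCC and Clang are assumed to behave this way). STR, XSTR, taken, skipped and self_check are names invented for the check.

```c
#include <assert.h>
#include <string.h>

/* Scaffolding: stringify the result of macro expansion. */
#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

/* The macros from the example above, unchanged. */
#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__)
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

#define IIF(bit) PRIMITIVE_CAT(IIF_, bit)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t

#define EAT(...)

/* IIF(1) selects CAT, which is then applied to (1, 2). */
static const char taken[] = XSTR(IIF(1)(CAT, EAT)(1, 2));

/* IIF(0) selects EAT, which swallows (1, 2). */
static const char skipped[] = XSTR(IIF(0)(CAT, EAT)(1, 2));

/* Fails if PRIMITIVE_CAT was left unexpanded inside CAT. */
void self_check(void)
{
    assert(strcmp(taken, "12") == 0);
    assert(skipped[0] == '\0');
}
```

If an implementation painted the second PRIMITIVE_CAT, taken would spell out an unexpanded PRIMITIVE_CAT(1, 2) and the first assertion would fail.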

This is a very small example that illustrates what I mean by the
increase in the number of macros. If partial nesting were considered nested, IIF
could not be implemented this way, it would have to have its own private version
of PRIMITIVE_CAT. That, in and of itself is not too bad, but it gets worse.
IIF itself becomes non-useful because anything that uses IIF would have to have
its own private version of IIF which in turn would need its own version of
PRIMITIVE_CAT. This gets worse and worse as things get more and more complex.
In effect, it makes it virtually impossible to make reusable components.
Consider a simplistic implementation of a WHILE loop:

#define WHILE(pred, op, ...) \
WHILE_1(pred, op, __VA_ARGS__) \
/**/
#define WHILE_S(pred, op, ...) __VA_ARGS__
#define WHILE_C(n, bit, ...) \
IIF(bit)( \
WHILE_ ## n, __VA_ARGS__ EAT \
) \
/**/
#define WHILE_1(pred, op, ...) \
IIF(pred(2, __VA_ARGS__))( \
WHILE_2, __VA_ARGS__ EAT \
)(pred, op, op(2, __VA_ARGS__)) \
/**/
#define WHILE_2(pred, op, ...) \
IIF(pred(3, __VA_ARGS__))( \
WHILE_3, __VA_ARGS__ EAT \
)(pred, op, op(3, __VA_ARGS__)) \
/**/
/**/
// etc.

The number of macros required to allow for N iterations of the WHILE loop is N
plus some constant. But consider what you get if each iteration requires its
own IIF which, in turn, requires its own PRIMITIVE_CAT. That is already 3*N
macros for the WHILE loop steps plus 3*N PRIMITIVE_CAT macros plus 3*3*N macros
used to implement N IIF macros plus some constant. Now consider what happens if
you allow the result of the predicate to be any number instead of just 0 or 1:

#define SPLIT(i, ...) PRIMITIVE_CAT(SPLIT_, i)(__VA_ARGS__)
#define SPLIT_0(a, ...) a
#define SPLIT_1(a, ...) __VA_ARGS__

#define IS_NULLARY(...) \
SPLIT(0, CAT(IS_NULLARY_R_, IS_NULLARY_C __VA_ARGS__)) \
/**/
#define IS_NULLARY_C() 1
#define IS_NULLARY_R_1 1,
#define IS_NULLARY_R_IS_NULLARY_C 0,

#define COMPL(bit) PRIMITIVE_CAT(COMPL_, bit)
#define COMPL_0 1
#define COMPL_1 0

#define BOOL(x) \
COMPL(IS_NULLARY( \
PRIMITIVE_CAT(BOOL_, x) \
)) \
/**/
#define BOOL_0 ()

#define WHILE_1(pred, op, ...) \
IIF(BOOL(pred(2, __VA_ARGS__)))( \
WHILE_2, __VA_ARGS__ EAT \
)(pred, op, op(2, __VA_ARGS__)) \
/**/
// etc.

Imagine what kind of macro bloat occurs when this situation occurs, and so on
and so forth. Things get worse and worse very quickly.
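As a sanity check on the helper macros, the following sketch uses BOOL directly in constant expressions. The macro definitions are copied from the listing above with the line continuations folded; the enumerator names are invented for the illustration, and a C99 preprocessor is assumed.

```c
#include <assert.h>

#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__)
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

#define SPLIT(i, ...) PRIMITIVE_CAT(SPLIT_, i)(__VA_ARGS__)
#define SPLIT_0(a, ...) a
#define SPLIT_1(a, ...) __VA_ARGS__

/* Yields 1 if the argument is exactly "()", otherwise 0. */
#define IS_NULLARY(...) \
    SPLIT(0, CAT(IS_NULLARY_R_, IS_NULLARY_C __VA_ARGS__))
#define IS_NULLARY_C() 1
#define IS_NULLARY_R_1 1,
#define IS_NULLARY_R_IS_NULLARY_C 0,

#define COMPL(bit) PRIMITIVE_CAT(COMPL_, bit)
#define COMPL_0 1
#define COMPL_1 0

/* BOOL_0 is "()", so only 0 is nullary; everything else maps to 1. */
#define BOOL(x) COMPL(IS_NULLARY(PRIMITIVE_CAT(BOOL_, x)))
#define BOOL_0 ()

/* The expansions are plain integer constants, usable in an enum. */
enum { bool_of_0 = BOOL(0), bool_of_1 = BOOL(1), bool_of_7 = BOOL(7) };

void self_check(void)
{
    assert(bool_of_0 == 0);
    assert(bool_of_1 == 1);
    assert(bool_of_7 == 1);
}
```

Note that PRIMITIVE_CAT is reused by SPLIT, COMPL and BOOL here, which is the kind of sharing the paragraph above argues would be lost.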
Post by s***@amu.edu.pl
++++
To sum up: one text, three different interpretations.
The issue is for fn-like macros, whether macro-name pulls its
argument list into its context (your interpretation); or whether
arguments pull macro-name out of its context (Paul);
or whether it's something in-between (initially me).
I have reviewed Dave Prosser's explanations of "the intent of
the specification", it'll yet take me a few days to understand
the algorithm he gives properly, but both the example and
the algorithm seem to strongly support Paul's interpretation.
The algorithm has a notion of sticky contexts (or "hide sets"),
which would translate directly to "geographical" context *and* nesting.
This is the one area where the standard specification deviates from the
algorithm. (I'll explain below.)
Post by s***@amu.edu.pl
Can you (or Paul) give me some pointers to the *complete* algorithm
(at least what subst() does), as D.Prosser gave it, please.
I'm not sure where you can get the rest of it. Doug?
Post by s***@amu.edu.pl
Yet another issue, this time less connected to things discussed before.
"These nonreplaced macro name preprocessing tokens are no
longer available for further replacement even if they are later
(re)examined in contexts in which that macro name preprocessing
token would otherwise have been replaced."
What does the "(re)examined" remark mean here?
Okay. :) The model described in the algorithm was the intent of the committee
at the time. However, the model of intent was more complex than it needed to
be, but it was exacting. Therefore, the text of the standard does not follow
the algorithm in favor of a simpler method that produces the same results as the
algorithm (but is described differently). The one exception to that is this case.
Several things happened in the translation. First, the algorithm was flattened
into an iterative model (except for recursive argument subprocessing) which
simplifies the algorithm considerably. Second, the notion of "blue paint" was
introduced in order to prevent the necessity of having every identifier token
carry around a conceivably expensive hideset. In the algorithm, each identifier
preprocessing token (actually all tokens, but identifiers are the only ones that
matter) carries with it a hideset. The specification in the standard removes
that hideset from each identifier and instead specifies such context as a range
that extends from the first to the last preprocessing token in a replacement
list. Then, if that context referring to a specific macro name exists when the
corresponding macro name is scanned, it simply wouldn't be replaced. However,
in order to make this model approximate the algorithm, each of those
non-replaced macro names has to be permanently flagged (i.e. painted blue)--just
as they would have been implicitly if they carried a hideset. Now comes the
subtle difference between what the standard specifies and the algorithm. When a
macro has been replaced and scanning resumes at the beginning of the replacement
list, a context exists that disables the macro name. However, this context only
exists during the first (re)scan of the tokens that made up the replacement
list. So, after it has been rescanned (proceeding directly into the following
tokens), the context no longer exists, but any tokens that were painted are
still painted. The (re)examination that is referred to above is the result of
the situation in which something is used as an argument:

#define A(x) x

A(pp-tokens)

Here, pp-tokens gets scanned once as the macro is invoked and once again when A
expands to x and gets rescanned. That rescanning is reexamination of whatever
pp-tokens became after the scan as an argument, which can cause more replacement
(but this time with a different set of disabling-contexts). This is where the
blue paint comes in that prevents an identifier from expanding in this new
context:

#define A(x) x

#define B() B

A(B()) // B' not B

In this example, B() is completely replaced and rescanned before it is
substituted for x in A's replacement list. However, it expands directly to
itself which causes the B identifier to be painted. At this point, rescanning
of B's replacement list is complete (because "no other tokens are involved")
yielding only B' (but not the B-disabling context which originally caused the
blue paint). I.e. it does not carry the context (i.e. hideset); it instead
carries the blue paint. Now A's replacement list gets rescanned, but the only
active context is the A-disabling context--B would be available here, but not
the specific B token that was previously painted. So far, the results in the
example are the same as in the algorithm, but the difference can be exploited by
something like this (and this is one of the areas where preprocessor conformance
starts to heavily decline--in the manipulation of details like this):

#define EMPTY()
#define DEFER(id) id EMPTY()

#define A() DEFER(B)()
#define B() DEFER(A)()

#define SCAN(x) x

A() // B()
SCAN(A()) // A()
SCAN(SCAN(A())) // B()
SCAN(SCAN(SCAN(A()))) // A()

This result is what it should be by the specification, but is different than the
results yielded by the algorithm. As long as more scans are applied
(parametrically), the tokens will continue to expand back and forth between the
two results because an A token never appears during the rescan of A's
replacement list and similarly with B. Note that this is not infinite
recursion/looping/whatever, more scans must be manually applied for this to take
place. The lack of context retention (unlike the algorithm), illustrated here
by manipulation, is why permanent blue paint exists--to closely but not entirely
approximate hidesets. Make sense?
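The alternation can be observed by stringifying each line of the example. This is a minimal sketch, assuming an implementation that follows the standard's text rather than the hideset algorithm (GCC and Clang are assumed to). STR, XSTR, same_tokens, scan0 to scan3 and self_check are helper names invented here.

```c
#include <assert.h>
#include <ctype.h>

/* Scaffolding: stringify the result of macro expansion. */
#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

/* Compare two spellings, ignoring whitespace between tokens. */
static int same_tokens(const char *p, const char *q)
{
    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        while (isspace((unsigned char)*q)) q++;
        if (*p != *q) return 0;
        if (*p == '\0') return 1;
        p++;
        q++;
    }
}

/* The macros from the example above, unchanged. */
#define EMPTY()
#define DEFER(id) id EMPTY()

#define A() DEFER(B)()
#define B() DEFER(A)()

#define SCAN(x) x

/* Each extra SCAN applies one more scan to the deferred call. */
static const char scan0[] = XSTR(A());
static const char scan1[] = XSTR(SCAN(A()));
static const char scan2[] = XSTR(SCAN(SCAN(A())));
static const char scan3[] = XSTR(SCAN(SCAN(SCAN(A()))));

void self_check(void)
{
    assert(same_tokens(scan0, "B()"));
    assert(same_tokens(scan1, "A()"));
    assert(same_tokens(scan2, "B()"));
    assert(same_tokens(scan3, "A()"));
}
```

XSTR does not add a scan of its own to the deferred call: its argument is expanded once, exactly as the bare line would be, and the result is then handed to the # operator, which suppresses further expansion.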
Post by s***@amu.edu.pl
#define M exp
#define A(x) A
A(a)(M)
//expand A(a)
A (M)
|A| //rescan: found "A", have macro def A(..), *examine*
//token stream further, found "A(M)", check context,
//"A" found in "context A", painted blue, "A(M)" not
//expanded
A(M) //final result
Yes, the result should be A(exp) because the (M) is not considered an argument
list because the secondary A is not considered to be a macro. No attempted
invocation is required to cause blue paint on an identifier. The identifier
must merely exist inside the context. Note that this is the simple case that we
all should agree on. The A token is seen within the disabling context of A, and
that is all that is required to paint it.

Regards,
Paul Mensonides
s***@amu.edu.pl
2004-03-02 01:53:58 UTC
(changed order of quotes a little)
Post by Paul Mensonides
Okay. :) The model described in the algorithm was the intent of the committee
at the time. However, the model of intent was more complex than it needed to
be, but it was exacting. Therefore, the text of the standard does not follow
[long snip]
#define EMPTY()
#define DEFER(id) id EMPTY()
#define A() DEFER(B)()
#define B() DEFER(A)()
#define SCAN(x) x
A() // B()
SCAN(A()) // A()
SCAN(SCAN(A())) // B()
SCAN(SCAN(SCAN(A()))) // A()
That's a very interesting (and instructive) example.
Post by Paul Mensonides
This result is what it should be by the specification, but is different than the
results yielded by the algorithm. As long as more scans are applied
(parametrically), the tokens will continue to expand back and forth between the
two results because an A token never appears during the rescan of A's
replacement list and similarly with B. Note that this is not infinite
recursion/looping/whatever, more scans must be manually applied for this to take
place. The lack of context retention (unlike the algorithm), illustrated here
by manipulation, is why permanent blue paint exists--to closely but not entirely
approximate hidesets. Make sense?
Understood. I'm *very* grateful for all your explanations.

Is there a direct advantage of blue paint over hidesets? It seemed
very logical to me, that when a new token is "generated" it retains
context (memory) of its "parents", ie. hideset. It looked to me
simpler from the programmer's point of view (ie. the one who uses
macros, not the one that writes the preprocessor).
Post by Paul Mensonides
Post by s***@amu.edu.pl
Yet another issue, this time less connected to things discussed before.
"These nonreplaced macro name preprocessing tokens are no
longer available for further replacement even if they are later
(re)examined in contexts in which that macro name preprocessing
token would otherwise have been replaced."
What does the "(re)examined" remark mean here?
#define M exp
#define A(x) A
A(a)(M)
//expand A(a)
A (M)
|A| //rescan: found "A", have macro def A(..), *examine*
//token stream further, found "A(M)", check context,
//"A" found in "context A", painted blue, "A(M)" not
//expanded
A(M) //final result
For final result I dropped the blue paint, so for the sake of being
precise I'll repeat the result with the blue paint left:
A'(M) //final result

Just now a second interpretation came to my mind, that within
the non-replaced macro, the arguments are analysed and replaced
as arguments (argument substitution, recursively), not as the
rest of the token stream:

A (M)
|A| //rescan: found "A", paint blue

A'(M) //rescan continues, A is nonreplaced macro, only analyse
^ //arguments, do recursive expansion

A'(exp) //continue scan with the rest of the stream
^
This would perhaps explain the meaning of the "(re)examined ..." remark,
but this is just my wild guess, and I don't quite believe it myself.

Third possibility is that we just skip A' and proceed to "(" and
then "M", and expand "M" as an ordinary token, in the same mode
as the rest of the file.
Post by Paul Mensonides
Yes, the result should be A(exp) because the (M) is not considered an argument
list because the secondary A is not considered to be a macro. No attempted
So is it a "Yes", or a "No", because my result and your
answer look somewhat contradictory?

And do I understand you well, that blue macro-names _never_
form an invocation??? Where does the Standard claim so?
If so, then "M" above is expanded as usual token stream,
ie. the third possibility above, is that right?
Post by Paul Mensonides
invocation is required to cause blue paint on an identifier. The identifier
must merely exist inside the context. Note that this is the simple case that we
all should agree on. The A token is seen within the disabling context of A, and
that is all that is required to paint it.
Yes, no doubt about that, but that doesn't answer my question.
I asked about the meaning of the remark "even if they are later
(re)examined in contexts ...". I just can't find any
application for this clause. What exactly is "(re)examined" and
what are its consequences?

P.S.
I haven't yet gone through the first part of your post,
but I'll do it and possibly answer it later.
--
Stan Tobias
Paul Mensonides
2004-03-04 14:31:00 UTC
Post by s***@amu.edu.pl
That's a very interesting (and instructive) example.
:)
Post by s***@amu.edu.pl
Post by Paul Mensonides
This result is what it should be by the specification, but is different than the
results yielded by the algorithm. As long as more scans are applied
(parametrically), the tokens will continue to expand back and forth between the
two results because an A token never appears during the rescan of A's
replacement list and similarly with B. Note that this is not infinite
recursion/looping/whatever, more scans must be manually applied for this to take
place. The lack of context retention (unlike the algorithm), illustrated here
by manipulation, is why permanent blue paint exists--to closely but not entirely
approximate hidesets. Make sense?
Understood. I'm *very* grateful for all your explanations.
I'm not sure that you do understand, because that was (a very thorough) answer
to your question about reexamination. In simplest terms, painting would be
completely unnecessary if it weren't for reexamination. But, reexamination does
happen--every time that an argument is passed to a macro that uses that argument
in its replacement list without it being an operand of # or ##. E.g. a simple
identity macro causes a double examination, possibly with different contexts
involved. I'll illustrate again:

#define ID(x) x

#define B(n) B ## n
#define B0 !
#define B1 B(0)

ID(B(1)) // B'(0)

I'll walk through this step by step so you can see what's going on.

ID(B(1))
| ^ |
|________|
|
ID invocation

The macro name ID is found followed by ( so the preprocessor knows that an
invocation is beginning. As such, it recognizes the argument B(1) and performs
a recursive scan of the tokens B(1)--because the formal parameter x of ID is
used in the replacement list of ID without # and ##. (At this point, no
disabling context exists yet.)

B(1)
| ^ |
|____|
|
B invocation

The macro name B is found followed by ( so the preprocessor knows that an
invocation is beginning. As such, it recognizes the argument 1, but does not
perform a recursive scan because the formal parameter n of B is used only as an
operand to ## in B's replacement list. (Even if it did, the recursive scan of 1
would be a no-op.) The invocation of B(1) is replaced by the replacement list:

B ## n
| ^|
|______|
|
B context

Argument substitution takes place:

B ## 1
| ^|
|______|
|
B context

Token-pasting takes place:

B1
|^ |
|__|
|
B context

(Re)scanning resumes, causing B1 to expand:

B1
||^ ||
||__||
| | |
| B1 invocation
|____|
|
B context

The B1 invocation is replaced by B1's replacement list:

B(0)
||^ ||
||____||
| | |
| B1 context
|______|
|
B context

(Re)scanning resumes, causing the B token to be found with the B disabling
context (and the rest of the scan is a no-op):

B'(0)
||^ ||
||_____||
| | |
| B1 context
|_______|
|
B context

The recursive scan of B(1) in ID(B(1)) is finished resulting in the
preprocessing token sequence B'(0) (specifically, the B and B1 contexts are
gone, only the token sequence is the result). The preprocessor now returns to
the expansion of ID by replacing the ID(B(1)) invocation with ID's replacement
list:

x
|^|
|_|
|
ID context

Parameter substitution takes place:

B'(0)
|^ |
|_____|
|
ID context

(Re)scanning resumes, which does not expand B because it is painted and the rest
of the scan is a no-op, finally yielding:

B'(0)

This last step is the important one. The token sequence B'(0) is being
reexamined during the rescan that follows ID's invocation. If the B token
wasn't painted, B would get expanded here because no B disabling context exists.
Hence the purpose of the blue paint, and of the reexamination that causes
it to matter.
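The outcome of the walk-through can be confirmed with a short sketch, assuming a C99 preprocessor. STR, XSTR, same_tokens, direct, painted and self_check are names invented for the check. B(0) on its own expands to !, whereas the B(0) that comes out of ID(B(1)) stays put because its B carries the paint.

```c
#include <assert.h>
#include <ctype.h>

/* Scaffolding: stringify the result of macro expansion. */
#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

/* Compare two spellings, ignoring whitespace between tokens. */
static int same_tokens(const char *p, const char *q)
{
    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        while (isspace((unsigned char)*q)) q++;
        if (*p != *q) return 0;
        if (*p == '\0') return 1;
        p++;
        q++;
    }
}

/* The macros from the walk-through above, unchanged. */
#define ID(x) x

#define B(n) B ## n
#define B0 !
#define B1 B(0)

/* A fresh B(0): B is not disabled, so it expands via B0 to "!". */
static const char direct[] = XSTR(B(0));

/* B(0) produced inside B's own expansion: B was painted there, so
   the rescan after ID's replacement leaves it alone. */
static const char painted[] = XSTR(ID(B(1)));

void self_check(void)
{
    assert(same_tokens(direct, "!"));
    assert(same_tokens(painted, "B(0)"));
}
```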
Post by s***@amu.edu.pl
Is there a direct advantage of blue paint over hidesets?
From an implementation perspective, yes. It is significantly less
expensive--both in size and in speed.
Post by s***@amu.edu.pl
It seemed
very logical to me, that when a new token is "generated" it retains
context (memory) of its "parents", ie. hideset. It looked to me
simpler from the programmer's point of view (ie. the one who uses
macros, not the one that writes the preprocessor).
Maybe so, but nevertheless that behavior is a critical component of preprocessor
metaprogramming which uses that behavior to generalize recursion. It is very
useful for that purpose, and it is easy to understand once you get used to it.
Even so, for almost all uses of macro expansion in non-metaprogramming code,
this behavior doesn't even come up at all because deferral is not intentionally
used in any normal circumstance.
Post by s***@amu.edu.pl
Post by Paul Mensonides
Post by s***@amu.edu.pl
#define M exp
#define A(x) A
A(a)(M)
//expand A(a)
A (M)
|A| //rescan: found "A", have macro def A(..), *examine*
//token stream further, found "A(M)", check context,
//"A" found in "context A", painted blue, "A(M)" not
//expanded
A(M) //final result
For final result I dropped the blue paint, so for the sake of being
A'(M) //final result
No, it should be A'(exp).
Post by s***@amu.edu.pl
Just now a second interpretation came to my mind, that within
the non-replaced macro, the arguments are analysed and replaced
as arguments (argument substitution, recursively), not as the
When the macro is non-replaced, the preprocessor just moves past the token that
it painted and keeps scanning, so M isn't even an argument at all. It is scanned
precisely as if the example was only:

(M)
Post by s***@amu.edu.pl
A (M)
|A| //rescan: found "A", paint blue
A'(M) //rescan continues, A is nonreplaced macro, only analyse
^ //arguments, do recursive expansion
A'(exp) //continue scan with the rest of the stream
^
This would perhaps explain the meaning of the "(re)examined ..." remark,
but this is just my wild guess, and I don't quite believe it myself.
No, the reexamination remark is illustrated above. Nice try, though. :)
Post by s***@amu.edu.pl
Third possibility is that we just skip A' and proceed to "(" and
then "M", and expand "M" as an ordinary token, in the same mode
as the rest of the file.
Yes, this is the correct one. A' is not invoked, so anything that is after it is
treated as regular preprocessing tokens, not as an argument list.
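A minimal sketch of that third possibility, assuming a C99 preprocessor; STR, XSTR, same_tokens, result and self_check are names invented for the check. The painted A is passed over and M is expanded as an ordinary token.

```c
#include <assert.h>
#include <ctype.h>

/* Scaffolding: stringify the result of macro expansion. */
#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

/* Compare two spellings, ignoring whitespace between tokens. */
static int same_tokens(const char *p, const char *q)
{
    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        while (isspace((unsigned char)*q)) q++;
        if (*p != *q) return 0;
        if (*p == '\0') return 1;
        p++;
        q++;
    }
}

/* The macros from the example under discussion. */
#define M exp
#define A(x) A

/* A(a) yields a painted A; the following (M) is not an argument
   list, so M is replaced like any other token in the stream. */
static const char result[] = XSTR(A(a)(M));

void self_check(void)
{
    assert(same_tokens(result, "A(exp)"));
    assert(!same_tokens(result, "A(M)"));
}
```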
Post by s***@amu.edu.pl
Post by Paul Mensonides
Yes, the result should be A(exp) because the (M) is not considered an argument
list because the secondary A is not considered to be a macro. No attempted
So is it a "Yes", or a "No", because my result and your
answer look somewhat contradictory?
Sorry, I thought that you just accidentally overlooked that step. I was on a
different page because I wasn't seeing what you were trying to get at (because
you thought that it might have something to do with the (re)examination remark).
Post by s***@amu.edu.pl
And do I understand you well, that blue macro-names _never_
form an invocation???
Yes, identifier preprocessing tokens that are painted never take the role of the
macro name in an invocation.
Post by s***@amu.edu.pl
Where does the Standard claim so?
I don't think that it does so explicitly.
Post by s***@amu.edu.pl
If so, then "M" above is expanded as usual token stream,
ie. the third possibility above, is that right?
Yes.
Post by s***@amu.edu.pl
Post by Paul Mensonides
invocation is required to cause blue paint on an identifier. The identifier
must merely exist inside the context. Note that this is the simple case that we
all should agree on. The A token is seen within the disabling context of A, and
that is all that is required to paint it.
Yes, no doubt about that, but that doesn't answer my question.
I asked about the meaning of the remark "even if they are later
(re)examined in contexts ...". I just can't find any
application for this clause. What exactly is "(re)examined" and
what are its consequences?
I answered that both times, the first time was the thorough answer and the
second was the reduced answer. Reexamination occurs when an argument is used in
the replacement list of a function-like macro without being an operand of # or ##.
It gets examined once when the argument is scanned separately and gets
reexamined when the replacement list where the parameter gets used is rescanned.
There is no other kind of reexamination that could cause macro expansion at all,
and without the reexamination caused by argument use, there would be no need to
apply blue paint at all. Simple non-replacement would be enough.
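The distinction between a parameter that is an operand of # and one that is not can be seen in a small sketch (C99 assumed; VALUE, raw, cooked, through_id and self_check are illustrative names). Only the non-operand use causes the argument to be scanned before substitution and then examined again on the rescan.

```c
#include <assert.h>
#include <string.h>

#define STR(x) #x       /* x is an operand of #: never scanned    */
#define ID(x) x         /* x is used plainly: scanned, rescanned  */
#define XSTR(x) STR(x)  /* x is used plainly, then handed to STR  */

#define VALUE 42

/* No scan of the argument, so the name itself is stringified. */
static const char raw[] = STR(VALUE);

/* The argument is expanded first, then stringified. */
static const char cooked[] = XSTR(VALUE);

/* Expanded as an argument and examined again inside ID. */
enum { through_id = ID(VALUE) };

void self_check(void)
{
    assert(strcmp(raw, "VALUE") == 0);
    assert(strcmp(cooked, "42") == 0);
    assert(through_id == 42);
}
```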
Post by s***@amu.edu.pl
P.S.
I haven't yet gone through the first part of your post,
but I'll do it and possibly answer it later.
Okay.

Regards,
Paul Mensonides
Paul Mensonides
2004-03-01 05:13:21 UTC
Post by Douglas A. Gwyn
Post by Paul Mensonides
The wording of the C standard makes it clear that the rest of the preprocessing
tokens in the file are *not* included. "If the name of the macro being replaced
is found during this scan of the replacement list (not including the rest of the
file's preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced it is not replaced."
You left out the preceding paragraph, which sets the
"... Then, the resulting preprocessing token
sequence is rescanned, along with all subsequent
preprocessing tokens of the source file, for
more macro names to replace."
So, by your logic, every macro expansion is a contained process that recurses
down the entire file. That is completely flawed reasoning. The above paragraph
shows how and where the *resumption* of scanning occurs. The following
paragraph defines the concept of identifier painting with the explicit caveat
that the rest of the preprocessing tokens are not part of that painting process.
Post by Douglas A. Gwyn
Then the first sentence of the part you quoted is not
applicable to the inner "B" in our example, because it
does not exist yet during "this" scan (the scan of the
immediate consequence of the parameter substitution
and #,## processing, looking for macro names).
However, the second sentence ("Furthermore, ...") does
apply to the name "B" that results from the nested "A"
replacement (which recursively involves its own scan
for further macro names).
The scan of the "immediate consequence" is normal scanning that proceeds
directly into the tokens that follow the outer invocation. The tokens that came
from the replacement list of that invocation constitute the explicit boundaries
for identifier painting. The second sentence is still under the context
established by the first sentence. The only question is whether a partial
invocation constitutes a nested invocation. Further, as I say again, your model
is flawed. The macro expansion process is not, by definition, like a function
call that ultimately returns the rescanned tokens of the replacement list. An
invocation is replaced by the replacement list before rescanning ever occurs
(which is explicit and unarguable). As such, there is no procedural nesting,
which ultimately mimics the intent of the recursive algorithm from which the
standard text was derived.
Post by Douglas A. Gwyn
Post by Paul Mensonides
No, I didn't. The context no longer exists because the invocation is not nested
(see below). You are blatantly ignoring what the standard dictates because of a
typical view that macro expansion is like a procedural hierarchy. Yes, scanning
is not finished with the preprocessing tokens of B's replacement list, but is
finished with the *replacement* of the invocation of B.
No, it is not. The subclauses of 6.10.3 all are part
of the process of macro replacement described by clause
6.10.3. That includes 6.10.3.4 (rescanning and further
replacement).
This is what the standard literally says. This is *unarguable* because there is
no other literal way to interpret it.

[quote]

A preprocessing directive of the form

# define identifier replacement-list new-line

defines an object-like macro that causes each subsequent instance of the macro
name to be replaced by the replacement list of preprocessing tokens that
constitute the remainder of the directive.

[end quote]

It then further goes on to say:

[quote]

Each subsequent instance of the function-like macro name followed by a ( as the
next preprocessing token introduces the sequence of preprocessing tokens that is
replaced by the replacement list in the definition (an invocation of the macro).

[end quote]

This is explicit. An invocation is replaced by the replacement list in the
definition of the macro, not by the rescanned replacement list (not even by the
argument-substituted, stringized, and token-pasted replacement list).

Further, your logic that 6.10.3 describes a single macro expansion from start to
finish is incorrect. 6.10.3 describes macros. That includes the definition of
macros (which has nothing to do with the macro expansion process), the scope of
macro definitions, the invocation of macros, the semantics of macros, and the
undefinition of macros (which also has nothing to do with the macro expansion
process).
Post by Douglas A. Gwyn
There are ways for implementations to take shortcuts,
as it is well known that tail recursion can be replaced
by iteration. But the specification does not refer to
any such shortcuts, and if an implementation loses a
context due to being in too big a hurry to move on, it
is simply a defective implementation.
This is based on an incorrect assumption that a procedural model exists--which
it doesn't. You have to prove that "replaced by the replacement list in the
definition" and similar does not mean what it literally says. Instead, you
assume that it is procedural model and try to bend the standard text to that
assumption. As such, an implementation can implement macro expansion
recursively, but then it has to mimic the behavior described in the standard.
If it does not, then it is a defective implementation.
Post by Douglas A. Gwyn
If we were to take your view to its logical extreme,
then there would be no possibility of an in-process
macro name appearing within a nested expansion (past
the first level). That is contradicted by the standard
specifying special consequences for cases where that
*does* occur.
No, it isn't. It doesn't appear procedurally nested, but it does appear
geographically or syntactically nested. If we were to take your view to its
logical extreme, then each macro expansion would denote a recursion into all the
rest of the preprocessing tokens of a file, each time going deeper and deeper
into a procedural hierarchy that ultimately returns only when the end of the
file is reached. That is ridiculous.
Post by Douglas A. Gwyn
However, the standard specifies when they are examined and/or
replaced and when they are not examined and/or replaced.
When I say colloquially that tokens are "fetched", I mean
merely that they are available for examination and possible
replacement.
Okay. The only reason that I mention it is that you've made references to a
"replacement buffer" which (though it is a possible implementation strategy) does
not exist in the conceptual model defined by the standard.
Post by Douglas A. Gwyn
Post by Paul Mensonides
.. That is not what the standard specifies. It specifies replacement first,
then rescanning, not rescanning followed by replacement.
I am not at all confused about the sequencing.
No, you're not confused. You're flat out wrong.
Post by Douglas A. Gwyn
The
process of macro replacement, starting just after
an identifier has been recognized as a defined macro
name during the appropriate phase of processing,
involves temporarily flagging that global identifier
as "in the process of being replaced", locating the
tokens for the macro arguments and fully macro-replacing
each of *them* [recursive subprocessing], substituting
^^^^^^^^^^^^^^^^^^^^^^^
Yes.
Post by Douglas A. Gwyn
each fully-expanded argument for the corresponding
parameter in the definition for the current macro,
concurrent with #,## processing, then rescanning the
result of the previous operation, looking for
identifiers corresponding to defined macros, and for
if (global identifier is "in the process of
being expanded") then apply permanent blue
paint to that identifier pp token (not to
the global identifier, as I might have
mistakenly indicated in a previous posting).
if (identifier token has ever had blue paint
applied) then leave it intact;
else (identifier is a defined macro name,
and has not been painted blue) so
begin a macro replacement process for
that identifier, and if it is a
function-like macro, it is allowed to
access remaining pp tokens (i.e. those
not involved in any nesting replacement
process from expansion of macros that
may still be in process) when fetching
its arguments.
*After* each such name (we're now back to the original
replacement buffer) has been fully macro-replaced, the
current macro replacement process (6.10.3) is complete
and the "in the process of being replaced" flag is
removed from the global identifier. The context now
pops back to whatever scanning was being done (top
level or some nested macro replacement).
That is absolutely *not* what the standard specifies. That is what your
flawed conceptual model specifies. This is what the standard actually
specifies:

The process of macro replacement starts just after an identifier has been
recognized as a defined *object-like* macro name or just after a ( following an
identifier that has been recognized as a defined *function-like* macro name. If
it is a function-like macro invocation, the arguments are identified and each of
them is recursively subprocessed *if* the corresponding parameter appears in
the replacement list of the macro without being an operand of the stringizing or
token-pasting operators. The replacement list of the macro replaces the
invocation of the macro. Actual parameters (or fully macro expanded actual
parameters depending on the # and ## operators) are substituted for formal
parameters in the sequence of preprocessing tokens that came from the
replacement list. Stringizing and token-pasting occur. Scanning resumes at
the first preprocessing token that came from the replacement list, and a context
is established (that will paint the macro name preprocessing tokens if found)
that endures until just after the last preprocessing token that came from the
replacement list.
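
As a small sketch of the disabling context just described (the names counter and read_counter are made up for illustration): when a replacement list mentions the macro's own name, that token is found while the context is still in force, so it is painted and left alone.

```c
/* Hypothetical names; illustration only. */
int counter = 5;

/* The replacement list mentions the macro's own name. During the
   rescan, 'counter' is found inside its own disabling context, so the
   token is painted and not replaced again. */
#define counter (counter + 1)

/* Expands to: return (counter + 1);  i.e. the variable plus one,
   rather than an endless (((... + 1) + 1) + 1). */
int read_counter(void) { return counter; }
```

Here read_counter() yields 6.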

This single process proceeds from the beginning of the file to the end of the
file--with the exception of the interpretation of directives (or
"non-directives") which only has a defined meaning if no contexts currently
exist and when no recursive subprocess of an argument is currently active. Note
once again that this process is not a direct translation of the "intent
algorithm". If it was, it would "engender" implementation strategies that would
be horribly inefficient, though it would produce similar results. The point
that is particularly important about the model is that no procedural nesting
occurs. Therefore, "nested" as used in the standard text can only mean nesting
within a context. The only question is whether an invocation that is partially
nested is considered nested. Even though that is unclear in a literal sense,
examination of the original intent shows that partial nesting should not be
considered nested. Granted, "original intent" is non-normative :), but it is a
reasonable thing to mimic in an implementation when the standard contains a
non-normative note in Annex J that says it is unspecified and when a C++ DR says
that it should be that way as well. Further, examination of the underlying
purpose of painting (which is to prevent the preprocessor implementation from
having to deal with infinite macro expansion which is very difficult to detect
because of the iterative model) yields the logical conclusion that only an
invocation that is completely nested could cause infinite expansion. This,
coupled with a great deal of existing code and the semantics of virtually every
existing C and C++ preprocessor in common use, results in only a single viable
way of interpreting an invocation that is partially nested.
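
The partially nested case can be probed directly. The sketch below uses the pair of definitions that later editions of the C standard give as an example of this unspecified case (6.10.3.4 in C11); the STR/STR_ helpers, strip_spaces and partial_nesting_probe are assumptions added here so the result can be inspected as a string. Since either answer is permitted, the code only reports which one the implementation picked:

```c
#include <string.h>

/* Copy s into out (capacity cap >= 1), dropping spaces, so that the
   spacing chosen by the stringizing operator does not matter. */
static void strip_spaces(const char *s, char *out, size_t cap)
{
    size_t n = 0;
    for (; *s != '\0' && n + 1 < cap; s++)
        if (*s != ' ')
            out[n++] = *s;
    out[n] = '\0';
}

#define STR_(x) #x
#define STR(x) STR_(x)   /* expand the argument first, then stringize */

#define f(a) a*g
#define g(a) f(a)

/* Returns 1 if the inner f was replaced (result 2*9*g),
   0 if it was treated as nested and painted (result 2*f(9)),
   and -1 for anything else. */
int partial_nesting_probe(void)
{
    char buf[64];
    strip_spaces(STR(f(2)(9)), buf, sizeof buf);
    if (strcmp(buf, "2*9*g") == 0)
        return 1;
    if (strcmp(buf, "2*f(9)") == 0)
        return 0;
    return -1;
}
```

A result of 1 corresponds to the behavior argued for here (partial nesting is not nesting); a result of 0 corresponds to the procedural reading.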
Post by Douglas A. Gwyn
Post by Paul Mensonides
... it as if every invocation is physically inlined into the stream
of preprocessing tokens where the invocation existed (which is either the
top-level stream or an argument being processed as a separate stream).
That is merely what "substitution" consists of, and
has no deeper meaning.
"Substitution" occurs with arguments, replacement occurs for a macro invocation.
That replacement consists of the exact preprocessing tokens of the replacement
list as explicitly specified by the standard. There is no deeper meaning
involved, that is simply what the standard specifies.
Post by Douglas A. Gwyn
Post by Paul Mensonides
No it doesn't. The invocation of B at this point is nested within the
rescanning of A's replacement list, but not B's.
That makes no sense if "nesting" has its logical
(procedural) meaning rather than a geographical
interpretation.
Yes, it makes no sense if "nesting" has a procedural meaning, but it doesn't. I
disagree that "logical" meaning implies procedural. Both procedural nesting and
syntactic nesting are perfectly logical forms of nesting.
Post by Douglas A. Gwyn
Repeating what I said above, that
would mean that there would be no possibility of an
in-process macro name appearing within a nested expansion
past the first level, which is contradicted by the
standard discussing exactly that case.
The standard never refers to it in those terms at all. The standard discusses only
the nesting within a context.
Post by Douglas A. Gwyn
Post by Paul Mensonides
Post by Douglas A. Gwyn
Note that the process described in 6.10.3.4 is one
*component* of the process 6.10.3, not something that
happens after 6.10.3 is complete.
(re)scanning -> replacement -> substitution -> #/## ->
^                                                    |
|____________________________________________________|
That is contrary to all similar constructions in the
standard, such as what constitutes the referred-to
operand in a subexpression,
Yes, macro expansion is quite different than a procedural hierarchy. The
assumption that it is because it seems "normal" is flawed. Just the fact that
partial nesting can occur shows that the model is very foreign to the normal
semantics of a procedural hierarchy.
Post by Douglas A. Gwyn
and is clearly contradicted
by the explicitly recursive wording of the specification,
The standard does not have any explicitly recursive wording regarding macro
expansion whatsoever. Instead, it literally and explicitly dictates an
iterative model (except for argument processing which is correctly recursively
subprocessed as you say). In fact, it is incredibly clear in this regard. The
*only* thing that isn't clear is what happens when an invocation is partially
nested. If procedural nesting occurred, "partial nesting" is impossible.
(Also, if procedural nesting occurred there would probably be an
implementation-defined nesting limit, which there isn't.) What you keep
referring to are the general notions of "recursive wording" and that 6.10.3
defines a single complete process for any given macro replacement. Neither of
those notions is present in the text of the standard. 6.10.3 refers to many
things besides the replacement process; the fact that 6.10.3.1 through 6.10.3.4 are
nested within 6.10.3 does not imply that they are subprocesses of some macro
expansion that began at 6.10.3 (and it especially doesn't imply that they are
subprocesses of macro definition or sibling processes of macro undefinition).
Post by Douglas A. Gwyn
and the explicit reference to more than one level of
nesting. (Your model involves only one level.)
The model involves only one level of *procedural* nesting (except for arguments
which can cause any number of levels), but any number of levels of *contextual*
nesting.
Post by Douglas A. Gwyn
Note
also that macro replacement within each argument to a
macro is necessarily performed as if 6.10.3 is invoked
as a subroutine.
Not 6.10.3 as a whole (which makes no sense). Rather, only the parts dealing
with the macro replacement process. It is as if it was a separate file (i.e.
stream of preprocessing tokens), but possibly with some contexts already
established.
Post by Douglas A. Gwyn
Post by Paul Mensonides
6.10.3/9 - "...that causes each subsequent instance of the macro name to be
***replaced by the replacement list of preprocessing tokens that constitute the
remainder of the directive***."
6.10.3/10 - "...that is ***replaced by the replacement list*** in the
definition..."
Yes, that explains the eventual purpose of the "body"
of a macro *definition*. The actual replacement
process does not begin at that point, but upon
subsequent recognition of a defined identifier during
the top-level pass across the input to translation
phase 4, and also upon recognition of defined
identifiers during collection of macro arguments, and
also during recognition of defined identifiers during
the rescan phase of *each* macro expansion process,
except when the identifier pp token has had blue paint
applied.
No, those two locations are the only locations that describe the semantics of
replacing an instance of a macro invocation. It does not say that it does that
*anywhere* else. I agree that this is not a good way of describing the process,
but nevertheless, that is the way it is.
Post by Douglas A. Gwyn
Post by Paul Mensonides
A macro invocation is replaced by the preprocessing tokens of the replacement
list, *not* by the rescanned (i.e. macro-expanded) preprocessing tokens of the
replacement list. Conceptually, this step happens even before argument
substitution.
No, not at all. The complete replacement can involve
much more than what was contained explicitly within
the body of the definition. That is why we spell out
the process in detail.
"Completely replaced" is a term used only when referring to argument
subprocessing, not a single macro expansion. The standard uses "completely
replaced" referring to all the preprocessing tokens that make up an argument.
It does not say that all of the macro invocations within that sequence of
preprocessing tokens are completely replaced--only that the entire sequence is
completely replaced. (It does say that all the macro invocations are
"expanded", however.)

The replacement itself is specified in only two places (those that I quoted).
Those are not forward references to macro expansion, they define the initial
step of macro expansion as scanning proceeds.
Post by Douglas A. Gwyn
Post by Paul Mensonides
... There is no procedural nesting, there is only
physical/geographical nesting designed to prevent macro expansion from looping
forever.
Actually, "looping forever" is a logical/procedural
notion, not a geographic one.
No, recursing forever is a procedural notion. Looping forever is an iterative
notion. And, once again, "logical" is not a synonym of "procedural" even when
computer science context is applied.
Post by Douglas A. Gwyn
Post by Paul Mensonides
Post by Douglas A. Gwyn
... Further, the specification makes it very clear
that blue paint is permanent, so there is no way for the
newly created "B" to ever trigger macro replacement.
Absolutely, blue paint is permanent--
But since the inner "B" was not *geographically* nested,
by your interpretation you'd have to say that it wasn't
painted blue.
The second (i.e. "inner") B invocation (at point #2) is completely nested within
the A-disabling context. It therefore expands within that context which
subsequently causes the A identifier preprocessing token to be permanently
painted.
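
The permanence of the paint is easy to observe in a case that is not in dispute. A minimal sketch (total, PASS and read_total are made-up names): a token painted while an argument is being expanded stays painted when it is later rescanned outside the context that painted it.

```c
/* Hypothetical names; illustration only. */
int total = 10;

#define total (total + 1)   /* the inner 'total' is painted on rescan */
#define PASS(x) x           /* hands its expanded argument back */

/* The argument is first fully expanded to (total + 1), with the inner
   'total' painted. When the result is rescanned after substitution into
   PASS's replacement list, no 'total' context exists any more, yet the
   painted token is still not replaced. */
int read_total(void) { return PASS(total); }
```

Here read_total() yields 11; if the paint wore off, it would be 12 or worse.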
Post by Douglas A. Gwyn
I still think this whole matter boils down to whether
"nested" means procedurally or geographically.
I think that it boils down to whether partially nested is considered nested in
this context. Either view amounts to the same thing though.
Post by Douglas A. Gwyn
I know
that we recently discussed this in a WG14 meeting, but
I seem to recall that it was during an informal session
and that hope was expressed that nobody would file a DR
asking for a clarification. If as you have said there
is a "ton" of existing software that *relies* on that
detail, then it might even be unwise to ask for an
official ruling, which is likely to be as I have
explained, forcing implementations that have done it
wrong to make a choice between conformance or making
existing bogus code continue to do what the programmer
expected.
What you say is "wrong" is the original intent as specified by the (now
infamous) "intent algorithm". I realize that you probably don't think that C++
should matter to C in this area, but C++ already has a DR for this exact thing
with a probable resolution that it should adhere to the original intent as
defined by the algorithm in this area though it needs to be agreed on by both
committees. Some examples of the existing code to which I refer are C++
libraries in Boost whose use is widespread, the Boost preprocessor library
(which is both a C and C++ library) whose use outside of Boost exists in many
places (I even heard that someone was porting Loki to C with it.), Andrei
Alexandrescu's and John Torjo's SMART_ASSERT article/implementation (available
at CUJ), and Chaos (which is a significantly more advanced version of Boost
preprocessor that supports C99 constructs fully (such as variadic macros and
placemarkers)). The moral of the story is that the preprocessor is used for
code generation in many places, and its use for that purpose is growing. An
interpretation other than the original intent regarding partial invocation
causes a horribly exponential increase in the number of macros required to
implement a general purpose component in such a framework. Such a specification
would be pointless, and likely cause a divergence between the C and C++
preprocessors which would be unfortunate--especially considering that the C++
EWG has actively agreed that compatibility with the C preprocessor wherever
possible should happen.
Post by Douglas A. Gwyn
(More likely, another option would be added
to the compiler, and mentioned in the conformance
section of the documentation.)
It already is in Annex J of C99.
Post by Douglas A. Gwyn
It would be good to advise programmers not to rely on
your model (the geographic nesting one) nor on my model
(the procedural nesting one) since implementations may
differ on this score.
And according to the (non-normative) note in Annex J, the partial invocation
case can go either way. However, the behavior that I support is by far the most
prevalent in existing preprocessors and matches the result of the original
intent and does not defy the underlying point of painting in the first place.

Regards,
Paul Mensonides
Douglas A. Gwyn
2004-03-01 07:24:27 UTC
Post by Paul Mensonides
So, by your logic, every macro expansion is a contained process that recurses
down the entire file. That is completely flawed reasoning.
Sorry, that logic is supplied by you, not by me. I only
take the phrase to mean that additional tokens can be
consumed during further processing of the replacement
buffer, and only because of the possibility of an
identifier for a function-like macro being produced at the
very end of a replacement buffer (without the arguments).
Post by Paul Mensonides
The above paragraph
shows how and where the *resumption* of scanning occurs.
Finally, a clear statement of why you think it is iterative
rather than recursive. Unfortunately, the whole notion of
"scanning" is not spelled out in the standard, neither in
the description of translation phase 4 nor anywhere else in
the section about macros. In particular, it is never said
that there is a top-level scan. The only mention is in the
context of *re*scanning a set of newly minted replacement
tokens. Was it ever "scanned" before it was "rescanned"?
There simply isn't enough said about the model.

I think I now see how you are reading 6.10.3.4 paragraph 2
as saying in effect "Back up then proceed with the top-level
scan, but first plant a temporary marker behind what was
just substituted, so that the painting rules can be applied
*for the just-replaced identifier* until the marker is
reached." Taken literally that requires not only the flag
in the symbol table and a way to optionally paint identifier
tokens, but also a way to mark geographic points in the
token stream with identifiers (one level is adequate). The
implementation would actually be simpler using the recursive
model.
Post by Paul Mensonides
invocation is replaced by the replacement list before rescanning ever occurs
(which is explicit and unarguable). As such, there is no procedural nesting,
But parameter substitution, pasting, placemarker removal, etc. is
certainly part of the process of a *single* macro replacement, so
at least the first part of 6.10.3.4 is *within* the context of a
single macro replacement; it is reasonable, and consistent with
similar unarguably recursive constructions throughout the
"language" portion of the standard, to take the *whole* of 6.10.3.4
as a component of a *single* instance of the process of macro
replacement (6.10.3).
Post by Paul Mensonides
Okay. The only reason that I mention it is that you've made references to a
"replacement buffer" which (though it is a possible implementation strategy) does
not exist in the conceptual model defined by the standard.
Well, I couldn't use "replacement list", because the standard
botches its use of that term, using it both for the body of
the macro definition (6.10.3.4 para. 1) and for the contents
of the replacement buffer (6.10.3.4 para. 2).
Post by Paul Mensonides
The standard does not have any explicitly recursive wording regarding macro
expansion whatsoever.
Yes, it *does*. In the course of specifying how macro
replacement works in detail, it makes two logically
separate references to macro replacement on "stuff":
the arguments (about which we seem to agree) and macro
names found during the rescan of the replacement buffer.
The wording is thus self-referential (recursive) even
if the intended model is not. Since the nonlooping
feature introduces a wart into the pure model, it is
hard to determine which pure model is being warted.
The practical difference is very slight, showing up so
far as I can see only when macro expansion produces an
identifier that has been defined as the name of a
function-like macro at the exact end of the replacement
buffer.
Post by Paul Mensonides
Post by Douglas A. Gwyn
I still think this whole matter boils down to whether
"nested" means procedurally or geographically.
I think that it boils down to whether partially nested is considered nested in
this context. Either view amounts to the same thing though.
...
interpretation other than the original intent regarding partial invocation
causes a horribly exponential increase in the number of macros required to
implement a general purpose component in such a framework.
I have not been convinced of that. (I *am* easily
convinced that some programmers have produced code
that depends on a particular method of implementation.)
I have already noted above the only usage I can see
where it matters which interpretation is implemented,
and it doesn't seem like it is a necessary feature
at all:
#define foo(whatever) whatever
#define bar stuff foo
bar(whatever)
Why is that not
#define bar(whatever) stuff foo(whatever)
?
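
To make the comparison concrete, here is a small sketch of both spellings (ADD_ONE, DEFERRED and DIRECT are made-up names standing in for foo and bar). No macro name recurs in either form, so no painting question arises and both are expected to give the same result:

```c
/* Hypothetical names; illustration only. */
#define ADD_ONE(n) ((n) + 1)

/* Object-like: the replacement list ends with a function-like macro
   name, so the '(' that follows in the source completes the invocation. */
#define DEFERRED ADD_ONE

/* Function-like: the argument is named explicitly. */
#define DIRECT(n) ADD_ONE(n)

int use_deferred(void) { return DEFERRED(41); }
int use_direct(void)   { return DIRECT(41); }
```

Both functions yield 42.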
Post by Paul Mensonides
Post by Douglas A. Gwyn
It would be good to advise programmers not to rely on
your model (the geographic nesting one) nor on my model
(the procedural nesting one) since implementations may
differ on this score.
And according to the (non-normative) note in Annex J, the partial invocation
case can go either way. However, the behavior that I support is by far the most
prevalent in existing preprocessors and matches the result of the original
intent and does not defy the underlying point of painting in the first place.
The "original intent" is a red herring; in fact the C
Standard did *not* spell out the model with the clarity
of an algorithm such as Dave's, alas, and (although I'd
have to rummage through notes long buried) there is no
particular reason to think that the text of the standard
as eventually negotiated and published was or was not
meant to precisely mirror Dave's algorithm. In fact I
rather recall that we did make some adjustments, in the
belief that the very examples under discussion could
reasonably be done either way, and we didn't want to
preclude either form of implementation.
Paul Mensonides
2004-03-04 14:30:59 UTC
Post by Douglas A. Gwyn
Post by Paul Mensonides
So, by your logic, every macro expansion is a contained process that recurses
down the entire file. That is completely flawed reasoning.
Sorry, that logic is supplied by you, not by me. I only
take the phrase to mean that additional tokens can be
consumed during further processing of the replacement
buffer, and only because of the possibility of an
identifier for a function-like macro being produced at the
very end of a replacement buffer (without the arguments).
Then you interpret the phrase to mean something other than what it says. You
say that 6.10.3.4 is a nested subprocess of a macro invocation's replacement.
That sentence says, "Then, the resulting preprocessing token sequence is
rescanned, along with all subsequent preprocessing tokens of the source file,
for more macro names to replace." It doesn't say, "...along with any
preprocessing tokens from the rest of the source file (or any outer token
buffers) that are needed to finish an invocation". The logical result is that
the entire rest of the source file's processing occurs as a subprocess of a
macro invocation. That is a recursion that continues to get deeper and deeper
into a procedural hierarchy (that never branches) with each successive macro
invocation. I realize that that is not what you mean, but your viewpoint is
ignoring the literal text because it doesn't fit the model that you espouse.
Post by Douglas A. Gwyn
Post by Paul Mensonides
The above paragraph
shows how and where the *resumption* of scanning occurs.
Finally, a clear statement of why you think it is iterative
rather than recursive. Unfortunately, the whole notion of
"scanning" is not spelled out in the standard, neither in
the description of translation phase 4 nor anywhere else in
the section about macros. In particular, it is never said
that there is a top-level scan.
No, but it does describe the process that causes macro invocations to be
replaced. I.e. subsequent instances of object-like macro names and
function-like macro invocations. That implies looking for or seeing them, which
is scanning for them by definition of the word "scanning."
Post by Douglas A. Gwyn
The only mention is in the
context of *re*scanning a set of newly minted replacement
tokens. Was it ever "scanned" before it was "rescanned"?
Yes, conceptually many scans (to do various things) have occurred. The tokens
that make up any given replacement list are scanned upon definition of the macro,
scanned for parameter substitution, scanned for # and ## evaluation, scanned for
placemarker removal, and ultimately scanned for more macro invocations to
replace. (Not to mention previous invocations of the same macro.) Obviously,
an implementation is very likely to condense this number of scans, but that is
merely an implementation optimization that merges several conceptual scans into
fewer physical scans.
Post by Douglas A. Gwyn
There simply isn't enough said about the model.
There is enough to say that it is iterative, but it could certainly be more
descriptive and less disjoint.
Post by Douglas A. Gwyn
I think I now see how you are reading 6.10.3.4 paragraph 2
as saying in effect "Back up then proceed with the top-level
scan, but first plant a temporary marker behind what was
just substituted, so that the painting rules can be applied
*for the just-replaced identifier* until the marker is
reached." Taken literally that requires not only the flag
in the symbol table and a way to optionally paint identifier
tokens, but also a way to mark geographic points in the
token stream with identifiers (one level is adequate). The
implementation would actually be simpler using the recursive
model.
No, it would actually be far worse because you'd need both a flag in the symbol
table and a stack (of some sort) of nested token buffers (and associated
geographic locations) which could be accessed directly from any nested
point--the tokens following a replacement list might be needed and they could
easily be located in any number of token buffers or the "file" token stream.
Post by Douglas A. Gwyn
Post by Paul Mensonides
invocation is replaced by the replacement list before rescanning ever occurs
(which is explicit and unarguable). As such, there is no procedural nesting,
But parameter substitution, pasting, placemarker removal, etc. is
certainly part of the process of a *single* macro replacement, so
at least the first part of 6.10.3.4 is *within* the context of a
single macro replacement; it is reasonable, and consistent with
similar unarguably recursive constructions throughout the
"language" portion of the standard, to take the *whole* of 6.10.3.4
as a component of a *single* instance of the process of macro
replacement (6.10.3).
The first part says what happens when a macro expansion finishes (it removes the
placemarkers and then resumes scanning). The replacement of a macro invocation
by the replacement list in the definition, parameter substitution (etc.),
stringizing/token-pasting, and placemarker removal are indeed all steps in a
single macro invocation. Rescanning is something that happens after those steps
have completed. It is irrelevant whether it is consistent with other document
structure in the standard. Why do you keep insisting that 6.10.3 is the process
of macro replacement? It isn't; it is the definition of macros, macro
replacement, and macro scope/undefinition. The process of a macro expansion is
not described as a unit. Instead, it is described in bits and pieces in several
different places. Because of that and what the text actually says, the
subsections have no logical meaning other than "this section is a section
because its contents revolve around the concept described by its title."
Post by Douglas A. Gwyn
Post by Paul Mensonides
Okay. The only reason that I mention it is that you've made references to a
"replacement buffer" which (though it is a possible implementation strategy) does
not exist in the conceptual model defined by the standard.
Well, I couldn't use "replacement list", because the standard
botches its use of that term, using it both for the body of
the macro definition (6.10.3.4 para. 1) and for the contents
of the replacement buffer (6.10.3.4 para. 2).
There is no reference to any sort of "replacement buffer" in either paragraph.
I understand what you mean, but I disagree with the use of the terminology
because it implies a recursive model. Instead, there is a stream of
preprocessing tokens (the first part of which, relative to the current scan
point, may have been produced by macro replacement rather than reading them
directly from the file stream). Ignoring arguments for a moment, this is the
only buffer that could exist, and it is just an abstraction between the
preprocessing tokens produced by macro replacement and the preprocessing tokens
produced by the lexer directly reading from the source file. Further, this is
just an implementation strategy that has no place in the conceptual model
because in the conceptual model, the entire source file was decomposed into
preprocessing tokens before any macro replacement ever occurs. With arguments,
the situation is different because even in the conceptual model, a secondary
buffer (for each argument) is needed because the preprocessor must hold both the
original argument's preprocessing tokens (in case of # and ##) and also perform
a scan for macro replacement on the argument (otherwise) which possibly produces
a different sequence of preprocessing tokens.
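
A short sketch of why both forms of an argument have to be kept (VALUE, RAW, EXPANDED and the two helper functions are made-up names): the same argument is used unexpanded where the parameter is an operand of #, and fully macro-replaced where it is not.

```c
#include <string.h>

/* Hypothetical names; illustration only. */
#define VALUE 42

/* 'x' is an operand of #, so the argument's original tokens are used. */
#define RAW(x) #x

/* 'x' is not an operand of # or ## here, so the argument is fully
   macro-replaced before substitution, and RAW then sees 42. */
#define EXPANDED(x) RAW(x)

/* Each helper returns 1 when the stringized text is as described. */
int raw_keeps_name(void)      { return strcmp(RAW(VALUE), "VALUE") == 0; }
int expanded_sees_value(void) { return strcmp(EXPANDED(VALUE), "42") == 0; }
```

Both helpers return 1.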
Post by Douglas A. Gwyn
Post by Paul Mensonides
The standard does not have any explicitly recursive wording regarding macro
expansion whatsoever.
Yes, it *does*. In the course of specifying how macro
replacement works in detail, it makes two logically
separate references to macro replacement on "stuff":
the arguments (about which we seem to agree) and macro
I agree that the model regarding arguments is recursive, but not recursive in
the sense that I believe that you mean. It is a recursive scan of all of the
argument's preprocessing tokens, not a recursive macro replacement/rescan.
Post by Douglas A. Gwyn
names found during the rescan of the replacement buffer.
The wording is thus self-referential (recursive) even
if the intended model is not.
I'll go along with this for a moment. The notion of self-reference does not
imply recursion. There are two principal forms of self-reference: one is indeed
recursion and the other is iteration. For example, the following:

while (expr) {
    statements;
    continue;
}

is a self-reference, but it is not recursion. Iteration is decidedly different
than recursion, even though any given use of iteration can be reimplemented
recursively and vice versa. Even if a tail-recursion is optimized such that it
has performance/space characteristics similar to that of iteration, it is still
recursion.

With that said, I'll disagree that rescanning (of the "replacement buffer" _and_
all of the rest of the following preprocessing tokens) is self-referential at
all. Immediately before rescanning begins, the process of replacing a macro
invocation has finished (though context is left behind in some form). This is
an iterative resumption; it is simply continuing what it was already doing
before the macro was invoked--i.e. it is *returning* to scanning. In other
words, scanning proceeds along, invokes a macro and returns, and then continues
scanning. It is not self-referential except to say that the process of macro
replacement is over and scanning resumes.
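
The "replace, return, keep scanning" view can be sketched as a toy loop. Everything here is an assumption made for illustration: tokens are single characters, toy_lookup is a hypothetical two-entry macro table, there are no arguments, and no macro refers to itself (so no painting is needed). It is not how any real preprocessor is written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy macro table: one uppercase letter names a macro. */
static const char *toy_lookup(char name)
{
    switch (name) {
    case 'A': return "bC";   /* A -> b C */
    case 'C': return "dd";   /* C -> d d */
    default:  return NULL;   /* not a macro */
    }
}

/* Scan `text` in place. `cap` is the buffer size including the terminator.
   Returns 0 on success, -1 if a replacement would not fit. */
static int toy_scan(char *text, size_t cap)
{
    size_t pos = 0;

    assert(text != NULL);
    while (text[pos] != '\0') {
        const char *repl = toy_lookup(text[pos]);
        if (repl == NULL) {          /* ordinary token: move on */
            pos++;
            continue;
        }
        size_t rlen = strlen(repl);
        size_t tail = strlen(text + pos + 1);
        if (pos + rlen + tail + 1 > cap)
            return -1;
        /* Replacement is one finished step: splice the replacement list
           in front of the remaining input... */
        memmove(text + pos + rlen, text + pos + 1, tail + 1);
        memcpy(text + pos, repl, rlen);
        /* ...and scanning resumes at the same position, over the
           replacement and everything that follows it. */
    }
    return 0;
}
```

Calling toy_scan on a buffer holding "xAy" leaves "xbddy" in it. The loop never calls itself; the only state carried across a replacement is the scan position.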
Post by Douglas A. Gwyn
Since the nonlooping
feature introduces a wart into the pure model, it is
hard to determine which pure model is being warted.
I'm not sure what feature you mean by "nonlooping feature," nor what you refer
to by the "pure model". The entirety of 6.10.3.4 is not a subprocess of macro
replacement, any more than 6.10.3 describes the process of a macro replacement
(instead it defines the macro system).
Post by Douglas A. Gwyn
The practical difference is very slight, showing up so
far as I can see only when macro expansion produces an
identifier that has been defined as the name of a
function-like macro at the exact end of the replacement
buffer.
In this you're absolutely right. It doesn't matter one way or the other how you
view the model because they both produce the same result given a clear
specification for what "nested" means. The only reason that I bring up the
model at all is that the nature of the model shows an important distinction
between procedural nesting and contextual nesting. By procedural nesting,
"nested" could easily mean that anything that spans the end of the replacement
list is nested, but with contextual nesting, such a construct is definitely not
nested (rather it is overlapped)--unless the terminology is redefined by the
standard to mean that "contextual overlapping" is "contextual nesting" for the
purposes of that section (which it isn't).
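
A minimal sketch of an invocation that spans the end of a replacement list. The names foo and bar follow the shape of the example quoted further down; the numeric bodies and the function names are assumptions added so the result can be checked. This is the undisputed form of the construct, since foo is not being replaced when it is found.

```c
#include <assert.h>

#define foo(x) ((x) + 1)

/* The replacement list of bar ends with the name of a function-like macro. */
#define bar 10 * foo

static int overlapped_invocation(void)
{
    /* bar is replaced by `10 * foo`; the rescan then continues into the
       following source tokens, finds `(4)`, and invokes foo(4). */
    return bar(4);
}

static void demo_overlapped_invocation(void)
{
    assert(overlapped_invocation() == 10 * (4 + 1));
}
```

The name foo comes from inside bar's replacement list while its argument list comes from outside it, so the invocation overlaps the replacement list without being contained in it.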
Post by Douglas A. Gwyn
Post by Paul Mensonides
Post by Douglas A. Gwyn
I still think this whole matter boils down to whether
"nested" means procedurally or geographically.
I think that it boils down to whether partially nested is considered nested in
this context. Either view amounts to the same thing though.
...
interpretation other than the original intent regarding partial invocation
causes a horribly exponential increase in the number of macros required to
implement a general purpose component in such a framework.
I have not been convinced of that.
I can show you numerous examples (see my other post), but I don't think it will
matter. I doubt that you have the experience in the field (i.e. program
generation with the preprocessor--a.k.a. metaprogramming) and expertise in the
idioms and techniques involved. At the same time, I'm sure that you have
significantly more experience and expertise in general programming than I have.
I assure you that it matters a great deal.
Post by Douglas A. Gwyn
(I *am* easily
convinced that some programmers have produced code
that depends on a particular method of implementation.)
:) I'm not referring to a ton of little examples that could easily be rewritten
to avoid this issue. I'm referring to generic and well-structured use of the
preprocessor to generate program code in order to minimize the necessity of
secondary tools in the build process and to drastically reduce the number of
maintenance points. I'm definitely not worried about toy examples, nor am I
worried about simplistic constructs such as creating an "inline" function with a
macro (which is, in many cases, bad design).
Post by Douglas A. Gwyn
I have already noted above the only usage I can see
where it matters which interpretation is implemented,
and it doesn't seem like it is a necessary feature
#define foo(whatever) whatever
#define bar stuff foo
bar(whatever)
Why is that not
#define bar(whatever) stuff foo(whatever)
For something this trivial, of course it can be implemented in a variety of
ways. Here's a better example: say I have a WHILE loop similar to the one
defined in the other post. I can easily define a binary search process that
*finds* the next available WHILE_x iteration, making the construct appear
directly reentrant in the predicate and mutating operation. In that case, an
invocation of an object-like macro WHILE would ultimately result in the correct
WHILE_x macro:

#define WHILE FIND_THE_CORRECT_WHILE()

WHILE(PRED, OP, STATE)

Of course, this example is overkill to address the topic in question. The base
reason is that, in your simple example above, bar does not *compute* what its
result should be. This is obvious when arguments are used, because the result
is not fixed:

#define primitive_cat(a, b) a ## b

#define choose(option) \
primitive_cat(choose_, option)() \
/**/
#define choose_1() // attempt to use primitive_cat!
#define choose_2() //
#define choose_3() //

If the various choose_* cases are considered nested, then even something as
simple as a concatenation macro cannot be a general-purpose component in such a
framework because the level of dependence is ridiculously high.
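
A compilable variant of the choose example, as a sketch only. The bodies given to choose_1, choose_2 and choose_3 are hypothetical, picked so the results can be checked, and the last assertion assumes the interpretation argued for here, in which choose_3() is not nested inside primitive_cat.

```c
#include <assert.h>

#define primitive_cat(a, b) a ## b
#define choose(option) primitive_cat(choose_, option)()

/* Hypothetical bodies, chosen so the results can be checked. */
#define choose_1() 100
#define choose_2() 200
#define choose_3() primitive_cat(30, 0)   /* uses primitive_cat again */

static void demo_choose(void)
{
    assert(choose(1) == 100);
    assert(choose(2) == 200);
    /* Relies on choose_3() not counting as nested inside primitive_cat:
       its name comes from primitive_cat's replacement list, but its
       parentheses come from choose's replacement list. */
    assert(choose(3) == 300);
}
```

If choose_3() were treated as nested, the inner primitive_cat would be left unreplaced and the third case would not even compile as an expression.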
Post by Douglas A. Gwyn
The "original intent" is a red herring; in fact the C
Standard did *not* spell out the model with the clarity
of an algorithm such as Dave's, alas, and (although I'd
have to rummage through notes long buried) there is no
particular reason to think that the text of the standard
as eventually negotiated and published was or was not
meant to precisely mirror Dave's algorithm. In fact I
rather recall that we did make some adjustments, in the
belief that the very examples under discussion could
reasonably be done either way, and we didn't want to
preclude either form of implementation.
Original intent is definitely not a red herring. When some text is unclear, the
intent that produced the text becomes relevant. Further, the text of the
standard definitely does not mirror the "intent algorithm", though it does
mirror the intent. Regarding other things said in this part of your reply, you
are saying that Dave was either lying or misremembering about what the
underlying intent and procedure was: to allow as much replacement as possible
without the danger of infinite expansion. Given Dave's work in the area, I'm
more inclined to trust his memory of the situation than yours. Either way,
"original intent of the text" is non-normative, but it does act as a valid
clarification of an unclear issue.

Regards,
Paul Mensonides
Douglas A. Gwyn
2004-03-04 20:45:17 UTC
Post by Paul Mensonides
Original intent is definitely not a red herring.
It is when you get it wrong.

Neil Booth
2004-02-29 21:32:27 UTC
Post by Douglas A. Gwyn
I'm not sure what bug your preprocessor has. During the
expansion of B in B(Barg) there is an expansion of A(Barg)
that should not further recurse on B since B is still
"being replaced" (most readily implemented as a flag in
the symbol table). The nested B resulting from expanding
A(Barg) should be flagged ("painted blue", another flag in
the symbol table) and never replaced. The result of
expanding B(Barg) should thus be "expB+expA()+B(Barg)".
If you have the source code for the preprocessor you could
try "instrumenting" it (adding carefully placed printouts)
to watch how it is making its decisions while expanding
your test code.
You should ask Dave Prosser to send you the original pseudo-
algorithm that the C89 committee agreed to. That
algorithm does not correspond to your claim above, but to
what Paul explains in his reply. The basic idea was to
allow as much expansion as possible provided you were definitely
making progress (i.e. avoiding infinite recursion). In
particular, once you start reading extra tokens from the source
file you are no longer nested in what otherwise might be
considered a nested macro.
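
One way to see which reading a given preprocessor implements is to let ordinary functions catch whatever is left unreplaced. This is a sketch under assumptions: the macros follow the shape of the original poster's A and B, but the numeric bodies, the functions A and B, and the probe names are made up for illustration.

```c
#include <assert.h>

/* Ordinary functions that catch any A or B token left unreplaced.
   They must be defined before the macros of the same names. */
int A(int x) { return x; }
int B(int x) { return 1000 * x; }

#define A(x) 1 + B(x)
#define B 10 + A

/* B(5) -> 10 + A(5) -> 10 + 1 + B(5); the question is this inner B.
   If it is replaced again:  10 + 1 + 10 + A(5), A painted -> 26
   If it is left alone:      10 + 1 + B(5), function B     -> 5011 */
static int probe_B(void) { return B(5); }

/* A(5) -> 1 + B(5) -> 1 + 10 + A(5); this A is painted either way -> 16 */
static int probe_A(void) { return A(5); }

#undef A
#undef B

static void demo_probe(void)
{
    int r = probe_B();

    assert(probe_A() == 16);
    assert(r == 26 || r == 5011);
}
```

A result of 26 from probe_B corresponds to the double expansion reported at the top of the thread; 5011 would correspond to the single expansion.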

Dave's pseudo-code is what GCC >= 3.1 as I coded implements
(though the implementation does not correspond directly to
the description), and what most production compilers attempt
to implement with varying degrees of success.

Neil.
Paul Mensonides
2004-03-01 05:30:57 UTC
Post by Neil Booth
Dave's pseudo-code is what GCC >= 3.1 as I coded implements
(though the implementation does not correspond directly to
the description), and what most production compilers attempt
to implement with varying degrees of success.
As an aside, GCC's preprocessor is one of the most conforming preprocessors that
exists today. Metrowerks' new preprocessor (as of version 9) is also extremely
good (prior to version 9 it was extremely bad). Borland's separate command line
preprocessor is pretty good, but its integrated preprocessor has a nasty bug
that prevents all sorts of useful things (not related to the discussion at
hand). EDG-based preprocessors (e.g. Comeau and Intel) are okay, but they are
slow. The primary conformance problem of EDG-based preprocessors is that they
preserve context where they shouldn't (see one of my other posts), though Steve
is aware of this (and a couple other) issue(s). Both Sun and IBM preprocessors
have bugs that are similar to the Borland bug (among others). Digital Mars
needs quite a bit of work, and I hope to help with that soon. Microsoft's is
terribly non-conformant--which is, of course, no surprise :(. The Wave
preprocessor library is also very good, and the new MCPP (when in standard mode
with all the spurious warnings turned off) also seems to be very good (though I
haven't spent a great deal of time testing it yet).

I realize that preprocessor conformance (especially when the lack of conformance
only rears its head in reasonably complex or tricky scenarios) is very near to
the bottom of the priority list. GCC has done an excellent job (though I wish
that I could configure the preprocessor to emit standard line directives so I
could use it with other compilers). Metrowerks has also had a major turnaround
in this area. Thanks to both for those efforts.

Regards,
Paul Mensonides
Alexander Krotov
2004-02-29 23:58:51 UTC
Neil Booth <***@somewhere.org> wrote:
NB> You should ask Dave Prosser to send you the original pseudo-
NB> algorithm that the C89 committee agreed to. That
NB> algorithm does not correspond to your claim above, but to
NB> what Paul explains in his reply. The basic idea was to
NB> allow as much expansion as possible provided you were definitely
NB> making progress (i.e. avoiding infinite recursion). In
NB> particular, once you start reading extra tokens from the source
NB> file you are no longer nested in what otherwise might be
NB> considered a nested macro.
NB>
NB> Dave's pseudo-code is what GCC >= 3.1 as I coded implements
NB> (though the implementation does not correspond directly to
NB> the description), and what most production compilers attempt
NB> to implement with varying degrees of success.

Are you referring to the same algorithm as was posted to gcc-prs?
http://gcc.gnu.org/ml/gcc-prs/2001-q1/msg00495.html

-ank