Discussion:
Interaction between conditional inclusion and source file inclusion.
James Kuyper
2021-12-25 06:18:30 UTC
This discussion requires familiarity with the standard's specifications
and terminology from sections 6.10, 6.10.1 and 6.10.2. Unless you've got
those sections memorized, you might need to cross-reference them to
understand what I'm saying.

To simplify the following discussion, I'm going to write it as if the
only conditional inclusion preprocessing directives were #if, #else, and
#endif. Code using the other conditional inclusion directives can always
be rewritten to use only those three, with essentially the same
behavior, with a minor exception in the case of #elif, where the
subsequent occurrences of __LINE__ would have increased values. Those
other directives don't change the issue I'm discussing, they only
complicate the discussion.
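
As a minimal sketch of that rewriting (MODE and the enumerator names are hypothetical, chosen only for illustration), the same selection can be written once with #elif and once with only #if, #else and #endif. The nested if-section needs an #endif of its own, which is why later lines move down and __LINE__ reports larger values:

```c
#include <assert.h>

#define MODE 2   /* hypothetical selector, not taken from the post */

/* Original form, using #elif. */
#if MODE == 1
enum { with_elif = 10 };
#elif MODE == 2
enum { with_elif = 20 };
#else
enum { with_elif = 30 };
#endif

/* Same selection using only #if, #else and #endif.
   The inner if-section is nested inside the outer #else group
   and must be closed by its own #endif. */
#if MODE == 1
enum { rewritten = 10 };
#else
#if MODE == 2
enum { rewritten = 20 };
#else
enum { rewritten = 30 };
#endif
#endif

/* Both forms select the same group. */
void check_elif_rewrite(void)
{
    assert(with_elif == 20);
    assert(rewritten == with_elif);
}
```

Calling check_elif_rewrite() from any function exercises both forms.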

I've long understood that, during translation phase 4, as soon as a
compiler reaches the new-line at the end of a #if directive, it knows
whether the #if group will be included. If not, and there's a
corresponding #else, it knows that the #else group will be included.
Either way, as soon as it starts reading a group that will be included,
it can immediately start preprocessing that group (and this is the
important part:) while searching for the #else or #endif directive that
terminates the group.

I've also long understood that the #if, #else (if any) and #endif
directives that make up an if-section must all occur in the same file.
I'm not sure how I reached that conclusion - it's not anything that the
standard says explicitly.

I just recently realized that, under certain circumstances, those two
understandings are in conflict:

if.c:
#if 1
int i = 0;
#include "else.h"
int l = 3;
#endif

else.h:
int j = 1;
#else
int k = 2;

If preprocessing of the #if group could continue while searching for the
terminating #else or #endif, then that would mean that the #include
directive in if.c would be replaced by the contents of else.h, and that
the #else from else.h would therefore be recognized as terminating the
if-group from if.c, and starting a new else-group that continues until
the #endif in if.c. The declarations of `i` and `j` should be included,
and those of `k` and `l` should be skipped.
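
For comparison, here is a sketch of the single file that this pure text-replacement reading would amount to, with the contents of else.h pasted by hand in place of the #include directive. This is only an illustration of the hypothetical outcome; the check function is an added assumption and not part of either file:

```c
#include <assert.h>

/* Hand-merged text: if.c with the contents of else.h substituted
   for the #include "else.h" line. */
#if 1
int i = 0;
int j = 1;
#else
int k = 2;
int l = 3;
#endif

/* Only i and j are declared; k and l sit in the skipped #else group. */
void check_merged(void)
{
    assert(i == 0);
    assert(j == 1);
}
```

As a single file this is a well-formed if-section; the open question is whether the two-file version is required to behave the same way.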

I didn't expect it to work, and my tests with gcc confirm that
expectation - but I'm having trouble identifying how the standard
specifies that this shouldn't work.

The grammar for an if-group in 6.10p1 includes the following rule:

# if constant-expression new-line group opt

This could be interpreted as meaning that the entire if-group must be
parsed as such by the compiler before carrying out the behavior
associated with that if-group, which is to process the optional group if
the constant-expression has a non-zero value. This would imply that the
#else or #endif that terminates the group must be identified as such
before replacing any #include directives that might be found in that
group with the contents of the specified file. That in turn would imply
that a #else in the included file could not qualify as that terminating
directive.

The thing is, it's not clear to me that the standard actually says so. C
was designed around the same time I started my computer programming
career, when keeping a program's memory footprint small was more
important than it is now. I've noted that, particularly with the
original version of the C standard, the language seems, for the most
part, deliberately designed to allow single-pass processing with
relatively low memory requirements, which is why I did not expect it to
require scanning for the end of a group before processing any #include
directives in that group.

Have I missed something that says this more explicitly than the grammar
rule cited above? I'm sure there are people who will tell me that the
grammar rule cited above is sufficient, because they think it makes this
point perfectly clear - but is there anyone who agrees with me that it's
not clear?
Richard Damon
2021-12-25 13:23:47 UTC
Post by James Kuyper
This discussion requires familiarity with the standard's specifications
and terminology from sections 6.10, 6.10.1 and 6.10.2. Unless you've got
those sections memorized, you might need to cross-reference them to
understand what I'm saying.
To simplify the following discussion, I'm going to write it as if the
only conditional inclusion preprocessing directives were #if, #else, and
#endif. Code using the other conditional inclusion directives can always
be rewritten to use only those three, with essentially the same
behavior, with a minor exception in the case of #elif, where the
subsequent occurrences of __LINE__ would have increased values. Those
other directives don't change the issue I'm discussing, they only
complicate the discussion.
I've long understood that, during translation phase 4, as soon as a
compiler reaches the new-line at the end of a #if directive, it knows
whether the #if group will be included. If not, and there's a
corresponding #else, it knows that the #else group will be included.
Either way, as soon as it starts reading a group that will be included,
it can immediately start preprocessing that group (and this is the
important part:) while searching for the #else or #endif directive that
terminates the group.
I've also long understood that the #if, #else (if any) and #endif
directives that make up an if-section must all occur in the same file.
I'm not sure how I reached that conclusion - it's not anything that the
standard says explicitly.
I just recently realized that, under certain circumstances, those two
understandings are in conflict:
if.c:
#if 1
int i = 0;
#include "else.h"
int l = 3;
#endif
else.h:
int j = 1;
#else
int k = 2;
If preprocessing of the #if group could continue while searching for the
terminating #else or #endif, then that would mean that the #include
directive in if.c would be replaced by the contents of else.h, and that
the #else from else.h would therefore be recognized as terminating the
if-group from if.c, and starting a new else-group that continues until
the #endif in if.c. The declarations of `i` and `j` should be included,
and those of `k` and `l` should be skipped.
I didn't expect it to work, and my tests with gcc confirm that
expectation - but I'm having trouble identifying how the standard
specifies that this shouldn't work.
The grammar for an if-group in 6.10p1 includes the following rule:
# if constant-expression new-line group opt
This could be interpreted as meaning that the entire if-group must be
parsed as such by the compiler before carrying out the behavior
associated with that if-group, which is to process the optional group if
the constant-expression has a non-zero value. This would imply that the
#else or #endif that terminates the group must be identified as such
before replacing any #include directives that might be found in that
group with the contents of the specified file. That in turn would imply
that a #else in the included file could not qualify as that terminating
directive.
The thing is, it's not clear to me that the standard actually says so. C
was designed around the same time I started my computer programming
career, when keeping a program's memory footprint small was more
important than it is now. I've noted that, particularly with the
original version of the C standard, the language seems, for the most
part, deliberately designed to allow single-pass processing with
relatively low memory requirements, which is why I did not expect it to
require scanning for the end of a group before processing any #include
directives in that group.
Have I missed something that says this more explicitly than the grammar
rule cited above? I'm sure there are people who will tell me that the
grammar rule cited above is sufficient, because they think it makes this
point perfectly clear - but is there anyone who agrees with me that it's
not clear?
While the grammar may not be clear as to what happens between the start
and end of the if-group, the description of what happens in the block
says (6.10.1p6):

Each directive’s condition is checked in order. If it evaluates to false
(zero), the group that it controls is skipped: directives are processed
only through the name that determines the directive in order to keep
track of the level of nested conditionals; the rest of the directives’
preprocessing tokens are ignored, as are the other preprocessing tokens
in the group. Only the first group whose control condition evaluates to
true (nonzero) is processed. If none of the conditions evaluates to
true, and there is a #else directive, the group controlled by the #else
is processed; lacking a #else directive, all the groups until the #endif
are skipped.

Thus it is clear that an #include statement within a skipped block is
not processed and thus #else and the like within the include file are
not seen.
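
A minimal sketch of that point, assuming a header name that does not exist anywhere (the file name and the enumerator are hypothetical): because the #if group is skipped, the #include inside it is processed only as far as its directive name, so the missing file is never looked for.

```c
#include <assert.h>

#if 0
#include "hypothetical_missing_header.h"  /* never opened: group is skipped */
#else
enum { else_group_seen = 1 };
#endif

/* The #else group is the one that was processed. */
void check_skipped_include(void)
{
    assert(else_group_seen == 1);
}
```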
James Kuyper
2021-12-25 16:09:31 UTC
...
Post by Richard Damon
Post by James Kuyper
if.c:
#if 1
int i = 0;
#include "else.h"
int l = 3;
#endif
else.h:
int j = 1;
#else
int k = 2;
...
Post by Richard Damon
While the grammar may not be clear as to what happens between the start
and end of the if-group, the description of what happens in the block
says (6.10.1p6):
Each directive’s condition is checked in order. If it evaluates to false
(zero), the group that it controls is skipped: directives are processed
only through the name that determines the directive in order to keep
track of the level of nested conditionals; the rest of the directives’
preprocessing tokens are ignored, as are the other preprocessing tokens
in the group. Only the first group whose control condition evaluates to
true (nonzero) is processed. If none of the conditions evaluates to
true, and there is a #else directive, the group controlled by the #else
is processed; lacking a #else directive, all the groups until the #endif
are skipped.
Thus it is clear that an #include statement within a skipped block is
not processed and thus #else and the like within the include file are
not seen.
<pedantic> The standard provides its own definitions for both "block"
and "group". Of the two, the one that is relevant here is "group", not
"block". </pedantic>

I thought I had made it clear that I was very specifically concerned
about #include directives that occur within groups that are NOT skipped.
However, when I reviewed my message while preparing this response, I see
that I failed to make that distinction - sorry! I didn't say anything
incorrect, and what I said has unexpected consequences for such a
directive, but I didn't emphasize that point. The only clue you had was
the fact that my example involved such a directive.

gcc ignored the #else directive, identifying it as a syntax error due to
having no corresponding #if. Therefore it apparently does not consider
the #if directive in if.c to be a match.

It's interesting to note that a corresponding issue cannot come up
through macro substitution. There are two reasons for this:

"If a # preprocessing token, followed by an identifier, occurs lexically
at the point at which a preprocessing directive could begin, the
identifier is not subject to macro replacement." (6.10.3p8).

and

"The resulting completely macro-replaced preprocessing token sequence is
not processed as a preprocessing directive even if it resembles one,
..." (6.10.3.4p3).
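
The first of those two rules can be seen in a small sketch. The macro below is hypothetical and deliberately shares its spelling with a directive name: after a # at the start of a line the identifier is left alone, while everywhere else it expands normally.

```c
#include <assert.h>

/* Hypothetical macro that shares its spelling with a directive name. */
#define endif 42

#if 1
enum { inside_group = 1 };
#endif   /* "endif" after # is not macro-replaced, so this closes the #if */

/* Anywhere else the identifier is an ordinary macro and expands to 42. */
enum { expanded = endif };

#undef endif

void check_directive_name_not_replaced(void)
{
    assert(inside_group == 1);
    assert(expanded == 42);
}
```

If the directive name were macro-replaced, the if-section above would never be closed and the file would not compile.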

I couldn't find comparable wording to interfere with providing an #if in
one file, and the matching #else or #endif in a different file, by use
of #include directives - but gcc doesn't accept it.
Tim Rentsch
2021-12-25 16:52:04 UTC
Post by James Kuyper
This discussion requires familiarity with the standard's specifications
and terminology from sections 6.10, 6.10.1 and 6.10.2. Unless you've got
those sections memorized, you might need to cross-reference them to
understand what I'm saying.
To simplify the following discussion, I'm going to write it as if the
only conditional inclusion preprocessing directives were #if, #else, and
#endif. Code using the other conditional inclusion directives can always
be rewritten to use only those three, with essentially the same
behavior, with a minor exception in the case of #elif, where the
subsequent occurrences of __LINE__ would have increased values. Those
other directives don't change the issue I'm discussing, they only
complicate the discussion.
I've long understood that, during translation phase 4, as soon as a
compiler reaches the new-line at the end of a #if directive, it knows
whether the #if group will be included. If not, and there's a
corresponding #else, it knows that the #else group will be included.
Either way, as soon as it starts reading a group that will be included,
it can immediately start preprocessing that group (and this is the
important part:) while searching for the #else or #endif directive that
terminates the group.
I've also long understood that the #if, #else (if any) and #endif
directives that make up an if-section must all occur in the same file.
I'm not sure how I reached that conclusion - it's not anything that the
standard says explicitly. [...]
The first rule of grammar in 6.10 paragraph 1 says (with \sub()
to mean subscript)

preprocessing-file:
group \sub(opt)

Thus each preprocessing file must consist of an integral number
of group-part, and so cannot contain any unbalanced #if/#endif
directives, or any #else directive outside an #if/#endif section.

Note to all: Merry Christmas!
James Kuyper
2021-12-26 17:56:10 UTC
...
Post by Tim Rentsch
Post by James Kuyper
I've long understood that, during translation phase 4, as soon as a
compiler reaches the new-line at the end of a #if directive, it knows
whether the #if group will be included. If not, and there's a
corresponding #else, it knows that the #else group will be included.
Either way, as soon as it starts reading a group that will be included,
it can immediately start preprocessing that group (and this is the
important part:) while searching for the #else or #endif directive that
terminates the group.
I've also long understood that the #if, #else (if any) and #endif
directives that make up an if-section must all occur in the same file.
I'm not sure how I reached that conclusion - it's not anything that the
standard says explicitly. [...]
The first rule of grammar in 6.10 paragraph 1 says (with \sub()
to mean subscript)
preprocessing-file:
group \sub(opt)
Thus each preprocessing file must consist of an integral number
of group-part, and so cannot contain any unbalanced #if/#endif
directives, or any #else directive outside an #if/#endif section.
I believe that what you're saying, using the terms defined in the C
preprocessing grammar, is that neither an if-group, an else-group,
nor an endif-line qualifies separately as a group-part, only a complete
if-section can do so.

When the standard defines the meaning of a term, that definition takes
precedence over any other interpretation you might reach by analyzing
the meaning of the words making up that term. "preprocessing-file" is
simply a symbol in the grammar - it's definition is the grammar rule
associated with that symbol.

I've always interpreted the specification given in 6.10.2 as meaning that
a given preprocessing file must match the grammar described in 6.10 up
until the point that it recognizes a #include directive, which 'causes the
replacement of that directive by the entire contents of the source file
identified by the specified sequence between the " delimiters.' It's only the
file after that replacement (and all other such replacements), which must
fully parse in accordance with the grammar in 6.10.

However, the term "preprocessing file" is also defined in 5.1.1.1p1. That's
a section of the standard that seldom comes up in discussion, so I'd
forgotten about that definition. I agree that it makes sense that a
"preprocessing file" is meant to match the syntax specified for a
"preprocessing-file". The standard often uses a grammar symbol name,
with '-' replaced by spaces, to refer to things matching that grammar
symbol. However, this is one of the few places where the name, with that
replacement, is formally defined separately from the grammar, implying a
connection between those two definitions.

This is not the clearest way to impose such a requirement. If each
preprocessing file is supposed to separately parse as a preprocessing-file,
I think it would have been better to explicitly mention that fact in the
description of 6.10.2 "Source file inclusion." The "replacement" wording
actually used gave me the strong impression that there were no content
restrictions on the #included file itself, but only on the result after
replacing the directive with those contents.
Tim Rentsch
2022-01-17 14:47:39 UTC
Post by James Kuyper
...
Post by Tim Rentsch
Post by James Kuyper
I've long understood that, during translation phase 4, as soon as
a compiler reaches the new-line at the end of a #if directive, it
knows whether the #if group will be included. If not, and there's
a corresponding #else, it knows that the #else group will be
included. Either way, as soon as it starts reading a group that
will be included, it can immediately start preprocessing that
group (and this is the important part:) while searching for the
#else or #endif directive that terminates the group.
I've also long understood that the #if, #else (if any) and #endif
directives that make up an if-section must all occur in the same
file. I'm not sure how I reached that conclusion - it's not
anything that the standard says explicitly. [...]
The first rule of grammar in 6.10 paragraph 1 says (with \sub()
to mean subscript)
preprocessing-file:
group \sub(opt)
Thus each preprocessing file must consist of an integral number
of group-part, and so cannot contain any unbalanced #if/#endif
directives, or any #else directive outside an #if/#endif section.
I believe that what you're saying, using the terms defined in the C
preprocessing grammar, is that neither an if-group, an else-group,
nor an endif-line qualifies separately as a group-part, only a
complete if-section can do so.
That isn't what I was saying, although I expect it is a true
statement. What I was saying is that all the expansions of
group-part are balanced with respect to #if/#endif directives,
and also limit #else directives to be inside #if/#endif
segments.
Post by James Kuyper
When the standard defines the meaning of a term, that definition
takes precedence over any other interpretation you might reach by
analyzing the meaning of the words making up that term.
"preprocessing-file" is simply a symbol in the grammar - it's
definition is the grammar rule associated with that symbol.
The possessive form of "it" is "its", with no apostrophe. The
word "it's" is a contraction for "it is".

It may help to remember that the same rule applies to all
personal pronouns: an apostrophe always indicates a
contraction with "am", "is" or "are" -

I am, we are - I'm, we're
you are - you're
he is, she is, it is, they are - he's, she's, it's, they're

and there is never an apostrophe in the possessive form of a
personal pronoun -

my, our
your
his, hers, its, their
Post by James Kuyper
I've always interpreted the specification given in 6.10.2 as meaning
that a given preprocessing file must match the grammar described in
6.10 up until the point that it recognizes a #include directive,
which 'causes the replacement of that directive by the entire
contents of the source file identified by the specified sequence
between the " delimiters.' It's only the file after that replacement
(and all other such replacements), which must fully parse in
accordance with the grammar in 6.10.
However, the term "preprocessing file" is also defined in 5.1.1.1p1.
That's a section of the standard that seldom comes up in discussion,
so I'd forgotten about that definition. I agree that it makes sense
that a "preprocessing file" is meant to match the syntax specified
for a "preprocessing-file". The standard often uses a grammar
symbol name, with '-' replaced by spaces, to refer to things
matching that grammar symbol. However, this is one of the few
places where the name, with that replacement, is formally defined
separately from the grammar, implying a connection between those two
definitions.
My reading of the C standard is that "preprocessing file",
"source file", and "preprocessing-file" all mean the same thing.
A #include directive (in the "" form) "shall identify a [...]
source file", and thus the content of the file being #include'd
must match the syntax of "preprocessing-file".
Post by James Kuyper
This is not the clearest way to impose such a requirement. If
each preprocessing file is supposed to separately parse as a
preprocessing-file, I think it would have been better to
explicitly mention that fact in the description of 6.10.2
"Source file Inclusion." The "replacement" wording actually
used gave me the strong impression that there were no content
restrictions on the #included file itself, but only on the
result after replacing the directive with those contents.
It seems other people don't have any problem understanding what
the requirements are here. Do you have any ideas about why you
do when others don't?
