James Kuyper
2021-12-25 06:18:30 UTC
This discussion requires familiarity with the standard's specifications
and terminology from sections 6.10, 6.10.1 and 6.10.2. Unless you've got
those sections memorized, you might need to cross-reference them to
understand what I'm saying.
To simplify the following discussion, I'm going to write it as if the
only conditional inclusion preprocessing directives were #if, #else, and
#endif. Code using the other conditional inclusion directives can always
be rewritten to use only those three, with essentially the same
behavior, with a minor exception in the case of #elif, where the
subsequent occurrences of __LINE__ would have increased values. Those
other directives don't change the issue I'm discussing, they only
complicate the discussion.
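As an illustration of that rewriting (MODE and the variable names are made up for this example, they are not from the standard), the first if-section below uses #elif, and the second selects the same declaration using only #if, #else, and #endif. The nested if-section needs its own #else/#if pair and an extra #endif, which is where the additional lines, and hence the larger __LINE__ values, come from.

```c
/* MODE is a hypothetical macro used only for this illustration. */
#define MODE 2

/* Version using #elif. */
#if MODE == 1
static const int with_elif = 10;
#elif MODE == 2
static const int with_elif = 20;
#else
static const int with_elif = 30;
#endif

/* Same selection using only #if, #else and #endif. The #elif becomes
   "#else" followed by a nested "#if", and the nested if-section needs
   its own "#endif". */
#if MODE == 1
static const int without_elif = 10;
#else
#if MODE == 2
static const int without_elif = 20;
#else
static const int without_elif = 30;
#endif
#endif
```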
I've long understood that, during translation phase 4, as soon as a
compiler reaches the new-line at the end of a #if directive, it knows
whether the #if group will be included. If not, and there's a
corresponding #else, it knows that the #else group will be included.
Either way, as soon as it starts reading a group that will be included,
it can immediately start preprocessing that group (and this is the
important part:) while searching for the #else or #endif directive that
terminates the group.
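To make that single-pass model concrete, here is a toy sketch. It is not how gcc or any real preprocessor is implemented, and the function name single_pass and the MAX_DEPTH limit are invented for the illustration. It understands only the exact spellings "#if 0", "#if 1", "#else" and "#endif", knows nothing about #include, and does not diagnose a second #else in one if-section. The point is only that each line of an included group is emitted as soon as it is read, before the terminating directive has been seen.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 16  /* arbitrary nesting limit for this sketch */

/* Copies the lines that survive conditional inclusion into out[], in
   order, and returns how many were kept, or -1 if the directives are
   unbalanced or nested too deeply. out[] must have room for n entries. */
static int single_pass(const char **lines, size_t n, const char **out)
{
    int cond[MAX_DEPTH];    /* value of each open #if's condition */
    int parent[MAX_DEPTH];  /* was the enclosing group being included? */
    int depth = 0;
    int kept = 0;
    int emitting = 1;       /* is the current group being included? */

    for (size_t idx = 0; idx < n; idx++) {
        const char *s = lines[idx];

        if (strcmp(s, "#if 0") == 0 || strcmp(s, "#if 1") == 0) {
            if (depth == MAX_DEPTH)
                return -1;
            /* The decision is known at the end of the #if line. */
            parent[depth] = emitting;
            cond[depth] = (s[4] == '1');
            emitting = emitting && cond[depth];
            depth++;
        } else if (strcmp(s, "#else") == 0) {
            if (depth == 0)
                return -1;
            emitting = parent[depth - 1] && !cond[depth - 1];
        } else if (strcmp(s, "#endif") == 0) {
            if (depth == 0)
                return -1;
            depth--;
            emitting = parent[depth];
        } else if (emitting) {
            /* Emitted immediately, with no look-ahead for the
               directive that will end this group. */
            out[kept++] = s;
        }
    }
    return depth == 0 ? kept : -1;
}
```

The only state carried between lines is the small stack of open if-sections, which is the low-memory property this model relies on.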
I've also long understood that the #if, #else (if any) and #endif
directives that make up an if-section must all occur in the same file.
I'm not sure how I reached that conclusion - it's not anything that the
standard says explicitly.
I just recently realized that, under certain circumstances, those two
understandings are in conflict:
if.c:
#if 1
int i = 0;
#include "else.h"
int l = 3;
#endif
else.h:
int j = 1;
#else
int k = 2;
If preprocessing of the #if group could continue while searching for the
terminating #else or #endif, then that would mean that the #include
directive in if.c would be replaced by the contents of else.h, and that
the #else from else.h would therefore be recognized as terminating the
if-group from if.c, and starting a new else-group that continues until
the #endif in if.c. The declarations of `i` and `j` should be included,
and those of `k` and `l` should be skipped.
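For comparison, this is what that reading would amount to if the contents of else.h were pasted into if.c by hand in place of the #include directive. The single-file version is well-formed whichever way the two-file case is resolved; it is shown only to spell out the result described above, not as a claim about what any implementation does with the original two files.

```c
/* if.c with the contents of else.h spliced in by hand at the point
   where the #include directive was. */
#if 1
int i = 0;
int j = 1;   /* from else.h */
#else        /* from else.h */
int k = 2;   /* from else.h; skipped */
int l = 3;   /* skipped */
#endif
```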
I didn't expect it to work, and my tests with gcc confirm that
expectation - but I'm having trouble identifying how the standard
specifies that this shouldn't work.
The grammar for an if-group in 6.10p1 includes the following rule:
# if constant-expression new-line group_opt
This could be interpreted as meaning that the entire if-group must be
parsed as such by the compiler before carrying out the behavior
associated with that if-group, which is to process the optional group if
the constant-expression has a non-zero value. This would imply that the
#else or #endif that terminates the group must be identified as such
before replacing any #include directives that might be found in that
group with the contents of the specified file. That in turn would imply
that a #else in the included file could not qualify as that terminating
directive.
The thing is, it's not clear to me that the standard actually says so. C
was designed around the same time I started my computer programming
career, when keeping a program's memory footprint small was more
important than it is now. I've noted that, particularly with the
original version of the C standard, the language seems, for the most
part, deliberately designed to allow single-pass processing with
relatively low memory requirements, which is why I did not expect it to
require scanning for the end of a group before processing any #include
directives in that group.
Have I missed something that says this more explicitly than the grammar
rule cited above? I'm sure there are people who will tell me that the
grammar rule cited above is sufficient, because they think it makes this
point perfectly clear - but is there anyone who agrees with me that it's
not clear?