Discussion:
Can the new generic string functions accept void* arguments?
Keith Thompson
2023-06-02 04:41:03 UTC
The latest draft of the upcoming C23 standard is:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf
It introduces several type-generic functions in <string.h>, replacing
normal functions of the same names: memchr, strchr, strpbrk, strrchr,
strstr.

I'll use strchr() as an example; the same applies to the other str*()
generic functions (but not to memchr()).

The problem this solves is that calling strchr() with a const char*
argument yields a non-const char* result that points into the array.
For example:

#include <stdio.h>
#include <string.h>
int main(void) {
    const char s[] = "hello";
    char *p = strchr(s, 'h');
    *p = 'J';           // Undefined behavior
    printf("%s\n", s);  // Likely to print "Jello"
}

This makes it possible to obtain a non-const pointer to a const object
without a pointer cast.

The C23 strchr() generic function returns a char* if the first argument
is a char*, or a const char* if the first argument is a const char*.

N3096 says:

    The stateless search functions in this section (memchr, strchr,
    strpbrk, strrchr, strstr) are *generic functions*. These functions
    are generic in the qualification of the array to be searched and
    will return a result pointer to an element with the same
    qualification as the passed array. If the array to be searched is
    const-qualified, the result pointer will be to a const-qualified
    element. If the array to be searched is not const-qualified, the
    result pointer will be to an unqualified element.

So far so good, and I definitely approve of this change. It does break
code that calls strchr() with a const char* argument and assigns the
result to a (non-const) char* object. That's IMHO a minor issue, and
arguably breaking such code is part of the point of the change. (Making
string literals const would be similar, but I suppose that's still a
bridge too far.)

But I've thought of a way in which this could break some existing valid
code, namely code that passes a void* or const void* argument to
strchr().

Currently, since void* can be implicitly converted to char* and vice
versa, such a call is valid. (I can't think of a *good* reason to write
such a call, but my imagination is not unlimited.)

Question: Is this a valid call in C23? (It's valid in C17.)

char hello[] = "hello";
void *p = strchr((void*)hello, 'h');

An implementation of the generic strchr() will presumably use a generic
selection in a macro definition. If the generic selection covers only
types char* and const char*, the call will violate a constraint. If it
also covers void* and const void*, the call will be valid.

The current wording in N3096 suggests that only char* and const char*
are covered, implying that a call with a void* or const void* argument
is a constraint violation.

I suggest that the C23 standard should specify whether void* arguments
are valid or not. I have a slight preference for making them valid. If
so, the simplest approach would be for strchr() to return a char* given
a char* or void* argument, or a const char* given a const char* or const
void* argument.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */
Keith Thompson
2023-06-02 05:18:40 UTC
Post by Keith Thompson
> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf
> It introduces several type-generic functions in <string.h>, replacing
> normal functions of the same names: memchr, strchr, strpbrk, strrchr,
> strstr.
> I'll use strchr() as an example; the same applies to the other str*()
> generic functions (but not to memchr()).
[...]

Just after I posted the above, I thought of a potential issue with
memchr() that just might affect real code.

In C17 and earlier, memchr() has this declaration:

void *memchr(const void *s, int c, size_t n);

Given the implicit conversions between void* and other object pointer
types, the first argument can be a pointer to any const object type.
This is something that might plausibly be used in practice, unlike
(I think) passing a void pointer to the str*() functions.

It's probably impractical to fix this, since it would require
the generic selection to cover all possible object pointer types.
Any code that depends on the current behavior would have to add
(void*) or (const void*) casts to ensure that the type actually
matches.

For example, this (contrived) program is valid in C17 and earlier:

#include <stdio.h>
#include <string.h>
int main(void) {
    const unsigned u = 0x12345678;
    printf("u = 0x%x", u);
    unsigned char *p = memchr(&u, 0x34, sizeof u);
    if (p != NULL) printf(", p points to 0x%x", *p);
    putchar('\n');
}

The output is:

u = 0x12345678, p points to 0x34

(Conceivably p might be a null pointer if unsigned int has padding
bits that cause 0x34 not to be stored in a single byte.)

A call to memchr with a char* argument is, I suspect, more likely to
appear in real code.

The underlying issue is that the implicit conversions that happen with
function arguments do not happen with operands of a generic selection.
(The generic functions in <tgmath.h> are defined in a way that this
isn't an issue, as far as I can tell.)
Jakob Bohm
2023-06-02 13:03:00 UTC
Post by Keith Thompson
> [...]
> The underlying issue is that the implicit conversions that happen with
> function arguments do not happen with operands of a generic selection.
> (The generic functions in <tgmath.h> are defined in a way that this
> isn't an issue, as far as I can tell.)
Would the ability of the (new) generic mechanism to choose among a short
prioritized list of types, combined with a rule that all the argument
promotion rules continue to apply to the selection, solve the conundrum?
This is what typically happens with C++ overloads done for the same
purposes.

So if the generic declaration gives the priority list [char*, const
char*], then non-const pointers compatible with char* formal argument
types will get selected first and return a non-const char*, while other
pointers compatible with const char* formal arguments will be selected
second and return a const char*. This would even work if the generic
declaration also covered the wchar_t and related types, omitting
whichever of UTF-16/UCS-4 is equivalent to the implementation-defined
wchar_t*.

Sorry for not having a copy of the new syntax handy.


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Keith Thompson
2023-06-02 18:52:38 UTC
Post by Jakob Bohm
> Would the ability of the (new) generic mechanism to choose among a
> short prioritized list of types, combined with a rule that all the
> argument promotion rules continue to apply to the selection solve the
> conundrum?
> This is what typically happens with C++ overloads done for the same
> purposes.
I don't think so.

Generic selections (_Generic) have been in the language since C11. The
issue is that the argument promotion rules do *not* apply. C23 adds a
new use for them for several functions declared in <string.h>. (And the
corresponding functions in <wchar.h>; I had forgotten about those.)
Post by Jakob Bohm
> So if the generic declaration gives the priority list [char*, const
> char*], then non-const pointers compatible with char* formal argument
> types will get selected first and return a non-const char*, while other
> pointers compatible with const char* formal arguments will be selected
> second and return a const char*. This would even work if the generic
> declaration also covered the wchar_t and related types, omitting
> whichever of UTF-16/UCS-4 is equivalent to the implementation-defined
> wchar_t*.
The str*() generic functions can be handled by accepting:
char* // selects function returning char*
const char* // selects function returning const char*
void* // selects function returning char*
const void* // selects function returning const char*

For memchr(), which currently takes an argument of type const
void*, I don't think there's any way for the new generic function
to accept arguments of all pointer-to-[const]-object types (without
adding a new language mechanism, which isn't practical this late
in the process). It would be possible to accept pointers to char,
signed char, and unsigned char as well as pointers to void, which
might handle most of the existing cases, but it might be cleaner
just to require a pointer to void (which could break some existing
valid code).
Ben Bacarisse
2023-06-02 21:01:38 UTC
Post by Keith Thompson
> [...]
> It's probably impractical to fix this, since it would require
> the generic selection to cover all possible object pointer types.
There may be a way round that... This trick converts any object pointer
to a const void * or a void * depending on the qualifiers of the object
pointer:

#include <stdio.h>

#ifndef T
#define T const int
#endif

int main(void)
{
    T i;
    puts(_Generic((1 ? &i : (void *)&(int){0}),
                  void *: "void *",
                  const void *: "const void *",
                  default: "other"));
}

(Compile with -DT=int for example to test the other case.)

Taking the address of (int){0} is simply a way to get a void * that is
not a null pointer constant. (If one operand were a null pointer
constant, the result would just have the type of the other operand.)
One could, in a macro taking a pointer, just use

(1 ? (p) : (void *)(p))

but some compilers will warn that the cast discards the const even
though the overall effect of the expression is to keep it.
--
Ben.
Keith Thompson
2023-06-05 04:37:00 UTC
Post by Ben Bacarisse
> There may be a way round that... This trick converts any object pointer
> to a const void * or a void * depending on the qualifiers of the object
> pointer:
> [...]
I think you're right. I'll pass it on to the editors.
Keith Thompson
2023-06-05 04:50:55 UTC
[SNIP]
Post by Keith Thompson
> I think you're right. I'll pass it on to the editors.
Ben, I Cc'ed you on the email and got a bounce indicating that your
mailbox is full. Everyone else, sorry about the off-topic noise.
Keith Thompson
2023-06-02 22:11:30 UTC
Post by Keith Thompson
> Just after I posted the above, I thought of a potential issue with
> memchr() that just might affect real code.
[snip]

And I think I've found an even more serious issue with bsearch().

In C17 and earlier, bsearch() is declared as:

void *bsearch(const void *key, const void *base,
size_t nmemb, size_t size,
int (*compar)(const void *, const void *));

`base` points to the object being searched. The returned value is a
pointer to non-const void pointing to an element of the searched object.

C23 (as of N3096) has:

QVoid *bsearch(const void *key, QVoid *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *));

where QVoid is either void or const void, depending on the type of the
base argument.

The obvious implementation using _Generic will reject a base argument of
a type other than `void*` or `const void*`.

(I see Ben posted a followup with a possible solution. I haven't
studied it yet.)

I've been discussing this by email with the editors (listed on the first
page of N3096).

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf