Post by James Kuyper
Post by jacob navia
The strtold function should look for a sequence of letters "NAN" or
"INF" and return a corresponding float IGNORING CASE.
But what does that mean in languages that do not have "case"?
C doesn't care about human languages - the closest it comes to doing so
is caring about the locale.
No matter what the locale is, the characters "AFINTYafinty" are part of
the basic character set, and must therefore be representable in both the
source character set and the execution character set (5.2.1p1). Korean,
Chinese, and Hebrew characters can only be part of the extended
character set, not the basic character set.
The terms "uppercase letter" and "lowercase letter" are defined for the
Latin alphabet in 5.2.1p3, and they apply to each of the letters in
"AFINTY" and "afinty", respectively, regardless of locale. The behavior
of isupper() (7.4.1.11), islower() (7.4.1.7), toupper() (7.4.2.2) and
tolower() (7.4.2.1) is tied to those definitions.
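To illustrate that the case mapping for these letters is fixed by 5.2.1p3 rather than by the locale, here is a minimal sketch of how the "INF", "INFINITY" and "NAN" prefixes might be matched. The helpers basic_lower(), match_word() and match_special() are made-up names for this example. The sketch deliberately avoids tolower(), whose results depend on the current locale, and it does not handle the optional "(n-char-sequence)" after NAN.

```c
#include <assert.h>
#include <stddef.h>

/* Lowercase one character using only the 26 Latin letters of the basic
   character set. The letters are looked up rather than computed as
   'A' + offset, because the basic character set need not be contiguous
   (EBCDIC is not). */
static int basic_lower(int c)
{
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    for (size_t i = 0; upper[i] != '\0'; i++)
        if (c == upper[i])
            return lower[i];
    return c;
}

/* If s begins with word (given in lowercase), ignoring case, return the
   number of characters matched; otherwise return 0. A short s fails at
   its terminating '\0', so nothing is read past the end. */
static size_t match_word(const char *s, const char *word)
{
    size_t n;
    assert(s != NULL && word != NULL);
    for (n = 0; word[n] != '\0'; n++)
        if (basic_lower((unsigned char)s[n]) != word[n])
            return 0;
    return n;
}

enum special { SPECIAL_NONE, SPECIAL_INF, SPECIAL_NAN };

/* Recognize "INF", "INFINITY" or "NAN" at the start of s, ignoring case,
   and store the number of characters matched in *len (0 if none).
   The longer "infinity" is tried before "inf". */
static enum special match_special(const char *s, size_t *len)
{
    size_t n;
    enum special kind = SPECIAL_NONE;
    if ((n = match_word(s, "infinity")) != 0 || (n = match_word(s, "inf")) != 0)
        kind = SPECIAL_INF;
    else if ((n = match_word(s, "nan")) != 0)
        kind = SPECIAL_NAN;
    *len = n;
    return kind;
}
```

For example, match_special("InFiNiTy", &len) yields SPECIAL_INF with len set to 8, and since no locale-dependent function is involved, no setlocale() call can change that.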
Post by jacob navia
Are those 3-letter sequences hardwired to English? (Then the correct way
would be to translate with wcstombs into a buffer, then compare ignoring
case.) But that supposes that programmers' keyboards in Korea can type
"NAN" and "INF", and that all supporting software accepts Latin letters
and displays them correctly, etc...
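A sketch of the wcstombs() route suggested above follows, using a hypothetical helper name, parse_wide_via_mbs(), a 64-byte buffer chosen arbitrarily, and assuming the "C" locale (or another locale whose multibyte encoding represents these characters). Note that the standard already provides wcstold(), which parses a wide string directly under the same NAN/INF rules, so the intermediate buffer is arguably unnecessary.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Convert a wide string with wcstombs(), then parse the bytes with
   strtold(). *ok is set to 1 only if the conversion succeeded, fitted in
   the buffer, and strtold() consumed the whole result. */
static long double parse_wide_via_mbs(const wchar_t *ws, int *ok)
{
    char buf[64];
    char *end;
    assert(ws != NULL && ok != NULL);
    size_t n = wcstombs(buf, ws, sizeof buf);
    if (n == (size_t)-1 || n == sizeof buf) {  /* invalid, or no room for '\0' */
        *ok = 0;
        return 0.0L;
    }
    long double v = strtold(buf, &end);
    *ok = (end != buf && *end == '\0');
    return v;
}
```

Calling wcstold(L"INF", NULL) directly gives the same infinity without any conversion step.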
Any platform where you can create C code must have those capabilities.
That doesn't rule out the possibility of cross-compiling for a platform
lacking those capabilities. However, I think it's not an issue worth
worrying about. A platform where none of those characters can be typed
couldn't even handle base == 11 in strtol(), or the hexadecimal digits
and the "0x" prefix that strtold() must accept, which is an even more
fundamental problem than NAN or INF.
Post by jacob navia
Shouldn't those letters be part of a "locale" convention?
For strtol() with base == 36, all of the letters from 'a' to 'z' and from
'A' to 'Z' are supposed to be recognized as valid parts of the subject
sequence. For base == 16, and for hexadecimal input to strtold(), 'x' and
'X' must be recognized as part of the "0x" prefix, while in strtold() 'e'
and 'E' must be recognized as introducing the exponent of a decimal
sequence and 'p' and 'P' that of a hexadecimal one. None of those letters
are supposed to be interpreted in a locale-dependent fashion, so I don't
see a need to treat NAN or INF differently.
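The letters discussed above can be seen at work with two small wrappers. This is only a sketch: parse_long() and parse_long_double() are hypothetical names, and the checks (no empty input, no trailing characters, no ERANGE) are one common convention rather than the only correct one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse all of s as a long in the given base; reject input with no
   digits, trailing characters, or an out-of-range value. */
static bool parse_long(const char *s, int base, long *out)
{
    char *end;
    long v;
    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtol(s, &end, base);
    if (end == s || *end != '\0' || errno == ERANGE)
        return false;
    *out = v;
    return true;
}

/* The same checks for strtold(), which has no base parameter: it
   chooses the decimal or hexadecimal form from the "0x" prefix itself. */
static bool parse_long_double(const char *s, long double *out)
{
    char *end;
    long double v;
    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtold(s, &end);
    if (end == s || *end != '\0' || errno == ERANGE)
        return false;
    *out = v;
    return true;
}
```

With base 16, strtol() reads the 'e' in "0x1e" as a digit (the value is 30). strtold() does too, because the hexadecimal form uses 'p' for its exponent ("0x1p4" is 16). Only in a decimal sequence such as "1e3" does 'e' introduce an exponent. Base 36 accepts "zz" as 1295.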
If applicable, one might (in a future edition of the standard) extend this
by referring to the case-mapping work done by the Unicode Consortium in
the Unicode Standard, with the additional caveat that, where available,
such a future C standard should explicitly require those functions to
recognize LATIN SMALL LETTER DOTLESS I (U+0131) and LATIN CAPITAL LETTER
I WITH DOT ABOVE (U+0130) as case-independent equivalents of the Latin
letter I, even outside the Turkish locales. Similarly, such a future
standard might require all parts of the implementation to recognize the
special "single-width" and "double-width" code points for each character
of the basic character set (such as the fullwidth forms U+FF01 to
U+FF5E), if those code points exist in the extended character set.
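What those extra case equivalences might look like can be sketched on Unicode code points stored as uint32_t. Everything here is hypothetical: fold_cp(), match_cp_word() and the WORD_* arrays are invented names, and no current C standard requires this behavior. The words are written as code point values so that the sketch does not depend on the execution character set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "inf" and "nan" as Unicode code points, 0-terminated. */
static const uint32_t WORD_INF[] = { 0x69, 0x6E, 0x66, 0 };
static const uint32_t WORD_NAN[] = { 0x6E, 0x61, 0x6E, 0 };

/* Fold a code point to lowercase ASCII where the proposal would require
   it; all other code points are returned unchanged. */
static uint32_t fold_cp(uint32_t cp)
{
    if (cp >= 0x41 && cp <= 0x5A)       /* ASCII A-Z */
        return cp + 0x20;
    if (cp >= 0xFF21 && cp <= 0xFF3A)   /* FULLWIDTH LATIN CAPITAL A-Z */
        return 0x61 + (cp - 0xFF21);
    if (cp >= 0xFF41 && cp <= 0xFF5A)   /* FULLWIDTH LATIN SMALL A-Z */
        return 0x61 + (cp - 0xFF41);
    if (cp == 0x0130 || cp == 0x0131)   /* I WITH DOT ABOVE, DOTLESS I */
        return 0x69;                    /* 'i' */
    return cp;
}

/* If the 0-terminated code point string s begins with word (lowercase
   code points), after folding, return the length matched; else 0. */
static size_t match_cp_word(const uint32_t *s, const uint32_t *word)
{
    size_t n;
    assert(s != NULL && word != NULL);
    for (n = 0; word[n] != 0; n++)
        if (fold_cp(s[n]) != word[n])
            return 0;
    return n;
}
```

Under this sketch, U+0130 'N' 'F' and the fullwidth sequence U+FF2E U+FF41 U+FF4E would both match, while plain ASCII input behaves exactly as before.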
Such a future standard might also require implementations to recognize
any special extended character set code points that directly represent
INF or NAN, such as the traditional mathematical infinity symbol
(U+221E), which exists in many extended character sets.
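Recognizing the infinity symbol could be as small as the following sketch, which again works on uint32_t code points and assumes the implementation supports infinities (so that INFINITY is defined in <math.h>). The function name match_infinity_sign() and the choice to accept an optional sign are assumptions for this example.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CP_INFINITY_SIGN 0x221Eu  /* U+221E INFINITY */

/* Hypothetical extension: accept an optional '+' or '-' followed by
   U+221E. On success store the signed infinity in *out and the number of
   code points consumed in *len, and return true. */
static bool match_infinity_sign(const uint32_t *s, long double *out, size_t *len)
{
    size_t n = 0;
    bool negative = false;
    assert(s != NULL && out != NULL && len != NULL);
    if (s[n] == 0x2B || s[n] == 0x2D) {  /* '+' or '-' */
        negative = (s[n] == 0x2D);
        n++;
    }
    if (s[n] != CP_INFINITY_SIGN)
        return false;
    *out = negative ? -(long double)INFINITY : (long double)INFINITY;
    *len = n + 1;
    return true;
}
```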
This would all involve a balance between keeping those functions small
and simple and catering to extended character set support. Many C
programs implicitly rely on those functions to *only* accept the
ASCII code points when called with untrusted outside data, and might
suffer unpleasant failures if invalid machine-to-machine messages are
suddenly accepted as valid numbers.
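The defensive habit described above can be made explicit: check that untrusted input contains only basic characters before calling strtold(), and require the whole string to be consumed. This is a sketch with a hypothetical function name, parse_untrusted_ld(), and an allowed-character set chosen for this example. It also limits what can slip through when a locale other than "C" is active, since the standard lets strtold() accept additional locale-specific subject sequence forms there.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parse untrusted text as a long double. Anything outside digits, basic
   Latin letters and "+-._()" is rejected before strtold() sees it, so
   extended characters and leading white space never get through. */
static bool parse_untrusted_ld(const char *s, long double *out)
{
    static const char allowed[] =
        "0123456789+-._()"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz";
    char *end;
    long double v;
    assert(s != NULL && out != NULL);
    if (s[0] == '\0' || s[strspn(s, allowed)] != '\0')
        return false;
    errno = 0;
    v = strtold(s, &end);
    if (*end != '\0' || errno == ERANGE)  /* trailing junk or out of range */
        return false;
    *out = v;
    return true;
}
```

A UTF-8 fullwidth digit such as "\xEF\xBC\x91" is refused at the first check, whatever the current locale.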
Thus, in general, locale-sensitive or otherwise extended-character-accepting
runtime function variants should preferably be given new names such as
loc_strtold() or uni_strtold(), the former being locale-dependent and the
latter using an internationally consistent Unicode interpretation chosen
by the C standard.
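One possible shape for the hypothetical uni_strtold() is sketched below: map a NUL-terminated UTF-32 string through a fixed, locale-independent table (the fullwidth forms U+FF01 to U+FF5E and the two Turkish I variants) into ASCII, then hand the result to the ordinary strtold(). The UTF-32 parameter, the 64-character limit and the helper uni_to_ascii() are all assumptions for this example, and it relies on an ASCII execution character set and on the program running in the "C" locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define UNI_BUF_SIZE 64  /* hypothetical limit on the converted length */

/* Map one code point to an ASCII code point, or 0 if it has none. */
static uint32_t uni_to_ascii(uint32_t cp)
{
    if (cp >= 0xFF01 && cp <= 0xFF5E)   /* FULLWIDTH '!' .. '~' */
        return cp - 0xFF01 + 0x21;
    if (cp == 0x0130 || cp == 0x0131)   /* I WITH DOT ABOVE, DOTLESS I */
        return 0x69;                    /* 'i' */
    return cp < 0x80 ? cp : 0;
}

/* Hypothetical uni_strtold(): convert the mappable prefix of s to ASCII
   (stopping at the first unmappable code point) and parse it with
   strtold(). If consumed is not NULL, it receives the number of code
   points used. */
long double uni_strtold(const uint32_t *s, size_t *consumed)
{
    char buf[UNI_BUF_SIZE];
    char *end;
    size_t n = 0;
    long double v;
    assert(s != NULL);
    while (n < UNI_BUF_SIZE - 1 && s[n] != 0) {
        uint32_t a = uni_to_ascii(s[n]);
        if (a == 0)
            break;
        buf[n++] = (char)a;
    }
    buf[n] = '\0';
    v = strtold(buf, &end);
    if (consumed != NULL)
        *consumed = (size_t)(end - buf);
    return v;
}
```

Because each code point maps to exactly one byte, *consumed counts code points, so a caller could resume scanning at s + *consumed. Keeping the table fixed, rather than consulting the locale, is what would distinguish this variant from a loc_strtold().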
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10