[Toybox] locale support question
Ray Gardner
raygard at gmail.com
Mon Dec 2 16:46:14 PST 2024
On Sun, Dec 1, 2024 at 7:44 PM enh <enh at google.com> wrote:
> On Sun, Dec 1, 2024 at 1:08 AM Rob Landley <rob at landley.net> wrote:
>>
>> On 11/30/24 11:28, Ray Gardner wrote:
>> > Toybox main.c has this code to support UTF-8:
>> >
>> > // Try user's locale, but if that isn't UTF-8 merge in a UTF-8 locale's
>> > // character type data. (Fall back to en_US for MacOS.)
>> > setlocale(LC_CTYPE, "");
>> > if (strcmp("UTF-8", nl_langinfo(CODESET)))
>> > uselocale(newlocale(LC_CTYPE_MASK, "C.UTF-8", 0) ? :
>> > newlocale(LC_CTYPE_MASK, "en_US.UTF-8", 0));
>>
>> Which is basically the result of many long arguments trying to get Android,
>> MacOS, and various Linux distros (glibc and musl but also differing
>> locale installation choices) to play nice with each other.
>>
>> > For a standalone version of awk, I intend to use this instead:
>> >
>> > char *p = setlocale(LC_CTYPE, "");
>> > if (!p || !strstr(p, "UTF-8")) p = setlocale(LC_CTYPE, "C.UTF-8");
>> > if (!p || !strstr(p, "UTF-8")) p = setlocale(LC_CTYPE, "en_US.UTF-8");
>> >
>> > Rationale is that this compiles on older systems that lack up to date
>> > locale support.
>>
>> Good luck?
>>
>> https://landley.net/toybox/faq.html#support_horizon
>>
>> > What will be the effective difference between these? I am not familiar
>> > with the details of locale support in C and POSIX.
>>
>> That's probably a question for Elliott. (Or possibly Rich Felker.)
> i think it's really a question for someone who knows something about
> whatever [presumably ancient] systems you're trying to support. bionic
> and musl are both always utf-8, glibc and macOS let you test with
> nl_langinfo(3), and i don't think i've used anything that wasn't one of
> those since the 1990s...
>
> nl_langinfo(3) has been in posix since issue 2, so i'd assume historical
> systems without that also aren't going to understand _anything_ you try
> to do to convince them to use utf8?
Thanks to you and Rob for the info. The old systems may not understand
UTF-8, but I'm just trying to get them to compile the code, even if they
don't have UTF-8 support. This may be for naught, as I currently require
C99 and some of them may not have that. At least setlocale() is in C89.
I may try to revert to C89 compatibility, but Rob probably won't accept
the changes needed for that, as they'll probably make the code a bit
longer.
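
For reference, the setlocale()-only fallback quoted above could be wrapped
roughly as follows. This is only a sketch, not the code in toybox or in the
awk source: name_is_utf8() and try_utf8_ctype() are hypothetical names
introduced here, and accepting the "utf8" spelling case-insensitively is an
added assumption, on the theory that some systems may report the codeset
that way in the locale name. It sticks to C89 constructs so it should
compile wherever setlocale() exists.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: does a locale name mention a UTF-8 codeset?
 * Accepts "UTF-8" and "utf8" spellings in any letter case.
 * (Assumption: the original code only looks for the exact "UTF-8".) */
static int name_is_utf8(const char *name)
{
  const char *p;

  if (!name) return 0;
  for (p = name; *p; p++) {
    if ((p[0] == 'U' || p[0] == 'u') && (p[1] == 'T' || p[1] == 't')
        && (p[2] == 'F' || p[2] == 'f')) {
      const char *q = p + 3;

      if (*q == '-') q++;        /* optional dash: "UTF-8" or "UTF8" */
      if (*q == '8') return 1;
    }
  }
  return 0;
}

/* Hypothetical helper: try the user's locale first, then the two UTF-8
 * fallbacks named in the thread. Returns the LC_CTYPE locale name that
 * was selected, or NULL if none of the candidates looked like UTF-8.
 * A failed setlocale() call leaves the current locale unchanged. */
static const char *try_utf8_ctype(void)
{
  static const char *const candidates[] = { "", "C.UTF-8", "en_US.UTF-8" };
  size_t i;

  for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
    const char *p = setlocale(LC_CTYPE, candidates[i]);

    if (name_is_utf8(p)) return p;
  }
  return NULL;
}
```

A caller would run try_utf8_ctype() once at startup and treat a NULL
result as "no UTF-8 locale here, carry on with whatever the system gave
us", which matches the goal of compiling and running on systems without
UTF-8 support. Note that this only inspects the locale name; it does not
ask the system for the codeset the way the nl_langinfo(CODESET) test does.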
Ray