[Toybox] locale support question
Rob Landley
rob at landley.net
Sat Nov 30 22:07:32 PST 2024
On 11/30/24 11:28, Ray Gardner wrote:
> Toybox main.c has this code to support UTF-8:
>
> // Try user's locale, but if that isn't UTF-8 merge in a UTF-8 locale's
> // character type data. (Fall back to en_US for MacOS.)
> setlocale(LC_CTYPE, "");
> if (strcmp("UTF-8", nl_langinfo(CODESET)))
> uselocale(newlocale(LC_CTYPE_MASK, "C.UTF-8", 0) ? :
> newlocale(LC_CTYPE_MASK, "en_US.UTF-8", 0));
Which is basically the result of many long arguments trying to get Android,
MacOS, and various Linux distros (glibc and musl but also differing
locale installation choices) to play nice with each other.
> For a standalone version of awk, I intend to use this instead:
>
> char *p = setlocale(LC_CTYPE, "");
> if (!p || !strstr(p, "UTF-8")) p = setlocale(LC_CTYPE, "C.UTF-8");
> if (!p || !strstr(p, "UTF-8")) p = setlocale(LC_CTYPE, "en_US.UTF-8");
>
> Rationale is that this compiles on older systems that lack up to date
> locale support.
Good luck?
https://landley.net/toybox/faq.html#support_horizon
> What will be the effective difference between these? I am not familiar
> with the details of locale support in C and POSIX.
That's probably a question for Elliott. (Or possibly Rich Felker.)
According to "git annotate main.c" that code is a combination of commits
b34ed8132, 75b89012c, and bec202875 and the dates on those commits
incriminate various mailing list threads, a la
http://lists.landley.net/pipermail/toybox-landley.net/2020-December/028293.html
and
http://lists.landley.net/pipermail/toybox-landley.net/2023-February/029452.html
and probably more. (And that's before I dig further back to see how we
got THERE. I remember arguing about C vs C.utf8 locale support in 2013,
because I remember what office break room I was checking email in and
that contract only lasted 6 months...)
Rob