[Toybox] locale support question

Rob Landley rob at landley.net
Sat Nov 30 22:07:32 PST 2024


On 11/30/24 11:28, Ray Gardner wrote:
> Toybox main.c has this code to support UTF-8:
> 
>      // Try user's locale, but if that isn't UTF-8 merge in a UTF-8 locale's
>      // character type data. (Fall back to en_US for MacOS.)
>      setlocale(LC_CTYPE, "");
>      if (strcmp("UTF-8", nl_langinfo(CODESET)))
>        uselocale(newlocale(LC_CTYPE_MASK, "C.UTF-8", 0) ? :
>          newlocale(LC_CTYPE_MASK, "en_US.UTF-8", 0));

Which is basically the result of many long arguments trying to get Android, 
MacOS, and various Linux distros (glibc and musl but also differing 
locale installation choices) to play nice with each other.

> For a standalone version of awk, I intend to use this instead:
> 
>    char *p = setlocale(LC_CTYPE, "");
>    if (!p || !strstr(p, "UTF-8")) p = setlocale(LC_CTYPE, "C.UTF-8");
>    if (!p || !strstr(p, "UTF-8")) p = setlocale(LC_CTYPE, "en_US.UTF-8");
> 
> Rationale is that this compiles on older systems that lack up to date
> locale support.

Good luck?

https://landley.net/toybox/faq.html#support_horizon

> What will be the effective difference between these? I am not familiar
> with the details of locale support in C and POSIX.

That's probably a question for Elliott. (Or possibly Rich Felker.)
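
Two differences can at least be read off the two snippets. First, 
setlocale() changes the process-wide locale, while uselocale() only 
changes the locale of the calling thread. Second, main.c asks 
nl_langinfo(CODESET) what the encoding is, while the awk version searches 
the string setlocale() returns, and that string's spelling is up to the C 
library (a name such as "en_US.utf8" would not match "UTF-8"). A rough 
sketch of a setlocale()-only fallback that keeps the nl_langinfo() check 
follows. The helper names ctype_is_utf8() and force_utf8_ctype() are made 
up for illustration and are not toybox code:

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

// Hypothetical helper (not in toybox): returns 1 when the current
// LC_CTYPE locale reports its character encoding as exactly "UTF-8".
static int ctype_is_utf8(void)
{
  return !strcmp("UTF-8", nl_langinfo(CODESET));
}

// Hypothetical helper: try the user's locale, then C.UTF-8, then
// en_US.UTF-8, using only setlocale(). Stops at the first candidate
// that both loads and reports a UTF-8 codeset. Returns 1 on success,
// 0 if none of the candidates produced a UTF-8 LC_CTYPE.
static int force_utf8_ctype(void)
{
  static const char *candidates[] = {"", "C.UTF-8", "en_US.UTF-8"};
  unsigned i;

  for (i = 0; i < sizeof(candidates)/sizeof(*candidates); i++)
    if (setlocale(LC_CTYPE, candidates[i]) && ctype_is_utf8()) return 1;

  return 0;
}
```

Calling force_utf8_ctype() once at startup would stand in for the three 
setlocale() lines. It assumes nl_langinfo() is available on the older 
systems in question, which has not been checked here.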

According to "git annotate main.c" that code is a combination of commits 
b34ed8132, 75b89012c, and bec202875 and the dates on those commits 
incriminate various mailing list threads a la 
http://lists.landley.net/pipermail/toybox-landley.net/2020-December/028293.html 
and 
http://lists.landley.net/pipermail/toybox-landley.net/2023-February/029452.html 
and probably more. (And that's before I dig further back to see how we 
got THERE. I remember arguing about C vs C.utf8 locale support in 2013, 
because I remember what office break room I was checking email in and 
that contract only lasted 6 months...)

Rob

