[Toybox] locale support question
enh
enh at google.com
Sun Dec 1 18:44:41 PST 2024
On Sun, Dec 1, 2024 at 1:08 AM Rob Landley <rob at landley.net> wrote:
> On 11/30/24 11:28, Ray Gardner wrote:
> > Toybox main.c has this code to support UTF-8:
> >
> > // Try user's locale, but if that isn't UTF-8 merge in a UTF-8 locale's
> > // character type data. (Fall back to en_US for MacOS.)
> > setlocale(LC_CTYPE, "");
> > if (strcmp("UTF-8", nl_langinfo(CODESET)))
> >   uselocale(newlocale(LC_CTYPE_MASK, "C.UTF-8", 0) ? :
> >     newlocale(LC_CTYPE_MASK, "en_US.UTF-8", 0));
>
> Which is basically result of many long arguments trying to get Android,
> MacOS, and various Linux distros (glibc and musl but also differing
> locale installation choices) to play nice with each other.
>
> > For a standalone version of awk, I intend to use this instead:
> >
> > char *p = setlocale(LC_CTYPE, "");
> > if (!p || !strstr(p, "UTF-8")) p = setlocale(LC_CTYPE, "C.UTF-8");
> > if (!p || !strstr(p, "UTF-8")) p = setlocale(LC_CTYPE, "en_US.UTF-8");
> >
> > Rationale is that this compiles on older systems that lack up to date
> > locale support.
>
> Good luck?
>
> https://landley.net/toybox/faq.html#support_horizon
>
> > What will be the effective difference between these? I am not familiar
> > with the details of locale support in C and POSIX.
>
> That's probably a question for Elliott. (Or possibly Rich Felker.)
>
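A minimal sketch of the quoted toybox main.c logic as a standalone function, assuming a POSIX.1-2008 C library that provides newlocale() and uselocale(). The function name `toybox_style_utf8` is hypothetical. It spells out the `a ? : b` shorthand from the quoted code (a GCC extension meaning `a ? a : b`) as plain C, which matters if older compilers are a goal. Two properties worth noting: uselocale() only changes the calling thread's locale, leaving the process-wide one from setlocale() alone, and passing `(locale_t)0` to uselocale() just returns the current locale without changing it, so the original code is harmless when neither UTF-8 locale is installed.

```c
#define _XOPEN_SOURCE 700   /* expose newlocale()/uselocale() on glibc */
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical wrapper around the toybox approach.
 * Returns 1 if a UTF-8 LC_CTYPE was installed for this thread via
 * uselocale(), 0 if the user's locale was already UTF-8 or no UTF-8
 * locale could be loaded. */
int toybox_style_utf8(void)
{
  locale_t loc;

  setlocale(LC_CTYPE, "");   /* honor LC_ALL / LC_CTYPE / LANG */
  if (!strcmp("UTF-8", nl_langinfo(CODESET))) return 0;  /* already UTF-8 */

  /* With base == (locale_t)0, categories outside the mask come from
   * the POSIX ("C") locale; only LC_CTYPE is taken from the named one. */
  loc = newlocale(LC_CTYPE_MASK, "C.UTF-8", (locale_t)0);
  if (!loc) loc = newlocale(LC_CTYPE_MASK, "en_US.UTF-8", (locale_t)0);
  if (!loc) return 0;        /* nothing to merge in; leave locale as-is */

  /* Per-thread switch. loc is never freed: it stays in use for the
   * rest of the thread's life. */
  uselocale(loc);
  return 1;
}
```

Call it once at startup, in the thread that will do the multibyte work (mbrtowc(), wcwidth() and friends), since other threads keep the global locale.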
i think it's really a question for someone who knows something about
whatever [presumably ancient] systems you're trying to support. bionic and
musl are both always utf-8; glibc and macOS let you test with
nl_langinfo(3), and i don't think i've used anything that wasn't one of
those since the 1990s...

nl_langinfo(3) has been in posix since issue 2, so i'd assume historical
systems without that also aren't going to understand _anything_ you try to
do to convince them to use utf8?

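One effective difference between the two quoted versions is what they test. Ray's version searches the locale *name* returned by setlocale() for "UTF-8", and setlocale() typically returns the name as spelled in the environment, so a UTF-8 locale written as `en_US.utf8` would fail that case-sensitive check. The toybox version asks nl_langinfo(CODESET) instead. A minimal sketch, assuming only setlocale() and nl_langinfo() are available (no newlocale()/uselocale()); the helper names are hypothetical, and unlike toybox this variant changes the process-wide LC_CTYPE:

```c
#define _XOPEN_SOURCE 700
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Ray's check from the quoted message: look for "UTF-8" in the locale
 * name. Case-sensitive, so "en_US.utf8" is rejected even though it
 * names a UTF-8 locale. */
int name_says_utf8(const char *name)
{
  return name != NULL && strstr(name, "UTF-8") != NULL;
}

/* Ask the C library which codeset the current LC_CTYPE actually uses. */
int codeset_is_utf8(void)
{
  return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}

/* Hypothetical variant of Ray's setlocale() chain that checks the codeset
 * instead of the name. Returns 1 if a UTF-8 LC_CTYPE ends up active. */
int setlocale_utf8_chain(void)
{
  static const char *const candidates[] = { "", "C.UTF-8", "en_US.UTF-8" };
  size_t i;

  for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
    /* A failed setlocale() returns NULL and leaves LC_CTYPE unchanged. */
    if (setlocale(LC_CTYPE, candidates[i]) && codeset_is_utf8()) return 1;
  return 0;
}
```

If every candidate fails, LC_CTYPE stays at whatever the last successful setlocale() call installed, usually the user's own non-UTF-8 locale.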
> According to "git annotate main.c" that code is a combination of commits
> b34ed8132, 75b89012c, and bec202875 and the dates on those commits
> incriminate various mailing list threads ala
> http://lists.landley.net/pipermail/toybox-landley.net/2020-December/028293.html
> and
> http://lists.landley.net/pipermail/toybox-landley.net/2023-February/029452.html
> and probably more. (And that's before I dig further back to see how we
> got THERE. I remember arguing about C vs C.utf8 locale support in 2013,
> because I remember what office break room I was checking email in and
> that contract only lasted 6 months...)
>
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
>