[Toybox] [PATCH] Fix wcwidth(3) on Mac.

enh enh at google.com
Thu Dec 5 16:01:52 PST 2019


On Sat, Nov 23, 2019 at 8:15 AM Rob Landley <rob at landley.net> wrote:
>
> On 11/22/19 4:37 PM, enh via Toybox wrote:
> > The Mac doesn't support "C.UTF-8"
>
> Of course not. Sigh.

it seems to have a more BSD-like locale setup, with actual locale files
on disk. `locale -a` on a glibc system doesn't show C.UTF-8 either, but
i assume glibc does what bionic does: chop off any ".UTF-8" suffix and
say "fine by me".

> > (and toybox ignores the setlocale(3) failure),
>
> The problem is what would I _do_ with it? ("Never test for an error condition
> you don't know how to handle." - somebody named Steinbach.)

i was going to say "error out, obviously", but coreutils agrees with
you and blindly carries on:

~$ LANG=wtf_WTF.zawgyi ls /c
ls: cannot access '/c': No such file or directory
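A minimal C sketch of what "blindly carrying on" amounts to at the API level: setlocale(3) reports an unknown locale only by returning NULL, and whatever locale was already active (the "C" locale at startup) stays in effect. The helper name set_or_keep_locale() is hypothetical, not something from toybox or coreutils.

```c
#include <locale.h>

/* Hypothetical helper: try to switch every category to `name`.
 * Returns the name of the locale actually in effect afterwards.
 * An unknown name makes setlocale(3) return NULL without changing
 * anything, so the program keeps running in its previous locale. */
const char *set_or_keep_locale(const char *name)
{
  const char *now = setlocale(LC_ALL, name);

  /* NULL means "no such locale"; ask what is still in effect. */
  return now ? now : setlocale(LC_ALL, NULL);
}
```

Called as set_or_keep_locale("wtf_WTF.zawgyi") in a fresh process, this should just hand back "C", which is why `ls` above still works, in English.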

> > leading to numerous test failures because wcwidth(3)
> > always returns -1 for non-ASCII in the default C locale.
>
> Alas, an area where my desire to support and my ability to test are far apart.
>
> > Our choices are:
> > 1. Always use "" (the environment).
> > 2. Always use "en_US.UTF-8" (the closest to "C.UTF-8" supported on Macs).
> > 3. Continue to use "C.UTF-8" elsewhere but one of the above on Macs.
> >
> > I've gone with #1 because it's the default advice for setlocale(3),
> > and it's probably closest to the user's intentions. Every OS I'm
> > aware of has $LANG set to <something>.UTF-8 these days, and has
> > done for a decade now.
>
> Sigh. I was trying to force utf8 support because there's no reason not to do
> that these days, and due to the stupid way localization was introduced there's
> no way to enable utf8 without specifying "I mean _german_ UTF8, just like I
> meant german ascii"...
>
> > Note though that we'd also need to change LC_CTYPE to LC_ALL if we
> > also want strerror(3) to be localized. For now, despite this
> > setlocale(3) call, you'll still get "C" locale error messages.
>
> While I do explicitly want localized error messages, the various "insert commas
> into printed numbers and don't consider . a decimal separator" behavioral
> changes seem like bugs in waiting (I don't even know what all the side effects
> _are_ let alone have regression tests for them).
>
> Which of the 8 gazillion locale micromanagements is "strerr output" under?
>
> > ---
> >  main.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Hmmm... maybe it can do this as the fallback?
>
>   if (!setlocale(LC_CTYPE, "C.UTF-8")) setlocale(LC_CTYPE, "");

yeah, that makes sense to me and fixes the mac tests.

patch sent separately.
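Roughly, the fallback could look like this. It's a sketch, not the patch that was actually sent: init_ctype_locale() is a hypothetical name, and it assumes only LC_CTYPE (the category that wcwidth(3) and mbrtowc(3) depend on) should change.

```c
#include <locale.h>

/* Hypothetical helper: prefer "C.UTF-8" for character handling; if this
 * libc has no such locale (as on macOS), use whatever $LC_ALL,
 * $LC_CTYPE or $LANG asks for instead. */
const char *init_ctype_locale(void)
{
  if (!setlocale(LC_CTYPE, "C.UTF-8")) setlocale(LC_CTYPE, "");

  /* Report what LC_CTYPE ended up as. If both calls failed, this is
   * still the startup "C" locale and the program carries on as before. */
  return setlocale(LC_CTYPE, NULL);
}
```

Because only LC_CTYPE changes, number formatting and message text stay in the "C" locale either way.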



we can worry about folks wanting error messages like this (Korean for
"No such file or directory"):

ls: cannot access '/c': 그런 파일이나 디렉터리가 없습니다

if/when we have someone actually ask for it. (note that even with
coreutils i failed to get it to _only_ speak Korean to me, with no
English.)
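On the question of which category strerror(3) output falls under: POSIX ties the text strerror(3) returns to LC_MESSAGES, so localized error messages shouldn't need LC_ALL, along with the LC_NUMERIC comma and decimal-point side effects that come with it. A hedged sketch with hypothetical helper names; it assumes the libc actually ships translations for the chosen language, and in practice it probably also wants a UTF-8 LC_CTYPE so the translated text can be represented.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: take only the message language from the
 * environment ($LC_ALL, then $LC_MESSAGES, then $LANG). On failure the
 * "C" messages stay in place, which is harmless. */
void init_message_locale(void)
{
  setlocale(LC_MESSAGES, "");
}

/* Hypothetical check: number formatting is still plain "C", i.e. '.'
 * as the decimal separator and no thousands grouping. */
int numeric_is_plain(void)
{
  const struct lconv *lc = localeconv();

  return strcmp(lc->decimal_point, ".") == 0 && lc->thousands_sep[0] == '\0';
}
```

Since LC_NUMERIC is never touched, numeric_is_plain() should keep returning true after init_message_locale(), whatever $LANG says.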

> (This is _so_ not my area of expertise, happy to be corrected here. But I want
> utf8 support in initramfs or booting from a rescue USB stick where I haven't got
> a full environment set up yet, which is why it was doing that in the first place...)
>
> Rob


