[Toybox] [PATCH] lib/lib human_readable_long fix utf-8 LC_NUMERIC

Wed Sep 9 23:24:32 PDT 2020

On 9/9/20 7:19 PM, enh via Toybox wrote:
> don't apps need libc localization? not really. the POSIX localization
> functionality is so anaemic that it's really not useful even for "major
> minority" languages.

I try to have strerror() display the error codes (but still think it's a missed
opportunity that the "C" locale doesn't output EPERM and friends as the actual
strings), and keep my error message vocabulary small and simple. I also try to
preserve and display utf8 input for usernames and filenames and such.

Beyond that, I've stayed away from internationalization up until now, and if
your response is "kill it with fire" I can revert it.

> if you're serious about localization, you're going to need
> icu4c anyway, which isn't scared to embrace all the diversity that's
> actually out there (rather than the tiny subset that the POSIX folks could
> imagine, which doesn't even stretch to the need for the genitive case in dates,
> to pick one random fairly mainstream example).

Nope. Not going there.

I vaguely intend to have toysh command line editing handle right-to-left mode
due to a completionist streak, and back when I was planning on implementing vi
by vertically stacking the line editing plumbing (hence "linestack.c") I was
gonna make sure that did it properly too. But now there's a vi there that I have
nothing to do with which shares no infrastructure with anything else, so I guess
that part's not my problem anymore.

But that's all utf8 and unicode stuff. I haven't got a clue what the strings it
includes MEAN.

> luckily, i've also been able to neuter Android's libc so none of this will
> affect Android whichever way toybox goes[1]. but i still think it's a bad idea.

I wouldn't have volunteered to do it myself, I'm being presented with complaints
and attempting to find the least bad way to resolve them. :)

"This is too many digits for humans to handle" is why adding commas to numbers
was invented. It was the obvious solution. And then somebody complained that
using commas is parochial, so I added the periods, which should cover well
over 90% of the planet's population. (China uses 1,000.0, like just about
everybody.)
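
For anyone following along, here is a minimal sketch of the kind of digit
grouping being discussed. It is not the code from the patch: group_digits() is
a hypothetical helper name, and it assumes the caller has already picked the
separator (',' or '.') by whatever means, so it does no locale lookup itself.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not toybox's human_readable_long()): writes num into
// buf as decimal, with sep between each group of three digits.
// Returns buf, or NULL if buf is too small to hold the result.
char *group_digits(char *buf, size_t len, unsigned long long num, char sep)
{
  char digits[32];
  int n = snprintf(digits, sizeof(digits), "%llu", num);
  int seps = (n - 1) / 3;
  int i, out = 0;

  // Need room for the digits, the separators, and the NUL terminator.
  if ((size_t)(n + seps + 1) > len) return NULL;

  for (i = 0; i < n; i++) {
    // Emit a separator whenever the digits still to be written (including
    // this one) form whole groups of three, except before the first digit.
    if (i && (n - i) % 3 == 0) buf[out++] = sep;
    buf[out++] = digits[i];
  }
  buf[out] = 0;

  return buf;
}
```

Passing ',' or '.' as sep gives the two styles mentioned above. Keeping the
separator a plain argument means the grouping logic stays the same whichever
one ends up being used, and ripping it out again is a one-function change.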

If "consistently show megabytes for systems > X gigabytes" vs "consistently show
kilobytes for systems < X gigabytes" is good enough, even when the resulting
numbers are long, I'm happy to rip the comma support back out.

> no "real people" should ever need to look at this, but machines and developers
> will, and every bit of localization hurts the real audience.

Yes and no. There's a lot of developers out there who don't speak English,
certainly not as their first language. I don't want to unnecessarily exclude them.

> at least 15'936.2 would be a valid C++14 identifier (and i'm assuming will make
> it into C2x) :-)

That's the opposite of helping.

> ___
> 1. strictly, the fact that you're doing your own insertion of ',' separators
> might hurt me (in the `top -b` case), but i'll worry about that if i notice it
> actually break any parsing. i know that's included in Android's standard
> bugreports, but i _don't_ know that anyone's parsing it.

If the units weren't constant before then their parsing was iffy at best. Now at
least the units should be constant on a given system.

Rob