[Toybox] toybox ebuild.

Mon Feb 13 01:20:57 PST 2023

On 2/9/23 19:25, enh wrote:
> On Thu, Feb 9, 2023 at 5:08 PM Rob Landley <rob at landley.net> wrote:
>>
>> On 2/9/23 07:01, Rob Landley wrote:
>> > On 2/9/23 03:51, Patrick Lauer wrote:
>> >> On 2/5/23 12:59, Rob Landley wrote:
>> >>> Doing my irregular trawl to see if distro repos have any interesting patches or
>> >>> bug reports that haven't made it upstream, and... at the risk of opening a can
>> >>> of worms:
>> >>>
>> >>> https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-apps/toybox/toybox-0.8.8.ebuild#n52
>> >>>
>> >>> You probably want "make tests" (plural), because "make test" builds the "test"
>> >>> command as a standalone executable. (Which should usually succeed?)
>> >>
>> >> Aye. That makes sense. Fixed.
>> >>
>> >> Now I'm reliably running into a test failure:
>> >>
>> >> FAIL: cut -C test1.txt
>> >> echo -ne '' |
>> >> "/var/tmp/portage/sys-apps/toybox-0.8.9/work/toybox-0.8.9/generated/testdir/cut"
>> >> -C -1 "$FILES/utf8/test1.txt"
>> >> --- expected 2023-02-09 09:49:21.525159648 -0000
>> >> +++ actual   2023-02-09 09:49:21.525159648 -0000
>> >> @@ -1 +1 @@
>> >> -l̴̗̠
>> >> +l
>> >> make: *** [Makefile:77: tests] Error 1
>> >>
>> >> No idea yet what's triggering it, maybe you have some insight.
>> >
>> > Sigh, I hit something similar on bionic with the NDK build (because even a
>> > static build of bionic wanted to read files out of /System in order to tell me
>> > what is and isn't a combining character):
>> >
>> > http://lists.landley.net/pipermail/toybox-landley.net/2021-October/028766.html
>> >
>> > I do my own utf8 parsing, but _unicode_ is a bear to do yourself (just answering
>> > the question "is this a combining character" involves
>> > http://lists.landley.net/pipermail/toybox-landley.net/2021-October/028753.html
>> > and
>> > http://lists.landley.net/pipermail/toybox-landley.net/2021-October/028758.html
>> > and I decided it was just all out of scope), but the dance to get glibc to admit
>> > unicode exists is nontrivial. (And if the state isn't set, ze functions: zey do
>> > nothink.)
>> >
>> > Lemme see what I can do with livegui-amd64 under qemu to reproduce this here...
>>
>> Reproduced. Haven't really root caused, but I was reminded of:
>>
>>   https://github.com/landley/toybox/issues/300
>>
>> Which boils down to "the locale we're trying to use is not installed".
>>
>> Toybox is doing:
>>
>>     setlocale(LC_CTYPE, "");
>>     if (strcmp("UTF-8", nl_langinfo(CODESET)))
>>       uselocale(newlocale(LC_CTYPE_MASK, "en_US.UTF-8", NULL));
>>
>> And it looks like gentoo has "C.utf8" instead (no dash), which... yeah, it works
>> if I tell it to uselocale() that instead. I probably need multiple fallbacks in
>> a loop. (Does it care about the dash? Is it case sensitive? How many iterations
>> here...)

Ok, "man 7 locale" says that C.UTF-8 should fall back to loading C.utf8...

>> Oh goddess why is it doing uselocale(newlocale()), I think it was a macos thing?
>> Yeah, git annotate says commit 4786fd610 which was Elliott. (Do you remember why
>> it was doing that?)
> 
> because there isn't a C.UTF-8 (no matter how you try to spell it!) on
> macOS, so we need to "merge" utf-8-ness into the current locale. (i'd

That isn't what the man page for newlocale() says we're doing, though?
newlocale(FLAG, "NAME", 0) is creating a new locale that's a subset of the
"NAME" locale, and the 0 means locale elements we don't give a flag for are
copied from the "POSIX" locale. (Which should be a synonym for "C".)

So if the "" locale was french or something, this will switch it to "C". (Did I
mention I'm not a fan of the locale plumbing's design?)

I _want_ to say we don't use any elements of locale other than the utf8
character stuff (I hope WHICH locale it is doesn't affect toupper() and
tolower() but microsoft WAS on the unicode committee...)... but date advertises
%x (which is handled for us by libc).

I honestly don't remember what issue 52422388520e was fixing but I'm not gonna
argue locale issues with a guy with three umlauts in his name. But looking back
at it, commit 67ddade3373d replaced all uses of mbrtowc() out of libc with
utf8towc() out of lib precisely so we WOULDN'T have to care about locale, so why
are we still using libc's wcrtomb()? (Looking at the man page... what on earth
is "shift state"?)

Sigh, the real problem is towupper() and towlower() which if I recall did not
work if a locale wasn't loaded first. I still kind of want to do an mbtoutf8()
to make it symmetrical. (I THINK I did this before, but alas Google search
continues to deteriorate: even though I got dreamhost to fix the robots.txt file
on lists.landley.net back on Jan 23 Google STILL has the old one cached 3 weeks
later.)
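
The symmetrical direction (wide character in, utf8 bytes out, no locale
involved) is small enough to sketch. This is an illustration, not toybox's
lib code; the name wc_to_utf8() and its signature are made up here.

```c
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8 into out[],
 * which must have room for 4 bytes. Returns the number of bytes written
 * (1 to 4), or 0 for surrogates and values past U+10FFFF. */
static size_t wc_to_utf8(unsigned wc, char *out)
{
  if (wc < 0x80) {
    out[0] = wc;
    return 1;
  }
  if (wc < 0x800) {
    out[0] = 0xC0 | (wc >> 6);
    out[1] = 0x80 | (wc & 0x3F);
    return 2;
  }
  if (wc < 0x10000) {
    /* UTF-16 surrogates are not valid scalar values. */
    if (wc >= 0xD800 && wc <= 0xDFFF) return 0;
    out[0] = 0xE0 | (wc >> 12);
    out[1] = 0x80 | ((wc >> 6) & 0x3F);
    out[2] = 0x80 | (wc & 0x3F);
    return 3;
  }
  if (wc < 0x110000) {
    out[0] = 0xF0 | (wc >> 18);
    out[1] = 0x80 | ((wc >> 12) & 0x3F);
    out[2] = 0x80 | ((wc >> 6) & 0x3F);
    out[3] = 0x80 | (wc & 0x3F);
    return 4;
  }

  return 0;
}
```

No mbstate_t and no locale lookup, so it behaves the same whether or not
setlocale() ever succeeded.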

Right, fix the thing in front of us for now...

> argue _that's_ not the ugly part --- the ugly part is that we merge
> "en_US.UTF-8" in. but i thought i'd wait until someone was actually
> hurt by it before trying to construct the exact right locale for
> them.)

The man page says we're merging it into the posix locale, not the current locale.

Is it too late to tell the gentoo guys to go back to running "make test"?

Rob