[Toybox] toybox ebuild.

enh enh at google.com
Mon Feb 13 09:09:30 PST 2023


On Mon, Feb 13, 2023 at 1:07 AM Rob Landley <rob at landley.net> wrote:
>
>
>
> On 2/9/23 19:25, enh wrote:
> > On Thu, Feb 9, 2023 at 5:08 PM Rob Landley <rob at landley.net> wrote:
> >>
> >> On 2/9/23 07:01, Rob Landley wrote:
> >> > On 2/9/23 03:51, Patrick Lauer wrote:
> >> >> On 2/5/23 12:59, Rob Landley wrote:
> >> >>> Doing my irregular trawl to see if distro repos have any interesting patches or
> >> >>> bug reports that haven't made it upstream, and... at the risk of opening a can
> >> >>> of worms:
> >> >>>
> >> >>> https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-apps/toybox/toybox-0.8.8.ebuild#n52
> >> >>>
> >> >>> You probably want "make tests" (plural), because "make test" builds the "test"
> >> >>> command as a standalone executable. (Which should usually succeed?)
> >> >>
> >> >> Aye. That makes sense. Fixed.
> >> >>
> >> >> Now I'm reliably running into a test failure:
> >> >>
> >> >> FAIL: cut -C test1.txt
> >> >> echo -ne '' |
> >> >> "/var/tmp/portage/sys-apps/toybox-0.8.9/work/toybox-0.8.9/generated/testdir/cut"
> >> >> -C -1 "$FILES/utf8/test1.txt"
> >> >> --- expected 2023-02-09 09:49:21.525159648 -0000
> >> >> +++ actual   2023-02-09 09:49:21.525159648 -0000
> >> >> @@ -1 +1 @@
> >> >> -l̴̗̠
> >> >> +l
> >> >> make: *** [Makefile:77: tests] Error 1
> >> >>
> >> >> No idea yet what's triggering it, maybe you have some insight.
> >> >
> >> > Sigh, I hit something similar on bionic with the NDK build (because even a
> >> > static build of bionic wanted to read files out of /System in order to tell me
> >> > what is and isn't a combining character):
> >> >
> >> > http://lists.landley.net/pipermail/toybox-landley.net/2021-October/028766.html
> >> >
> >> > I do my own utf8 parsing, but _unicode_ is a bear to do yourself (just answering
> >> > the question "is this a combining character" involves
> >> > http://lists.landley.net/pipermail/toybox-landley.net/2021-October/028753.html
> >> > and
> >> > http://lists.landley.net/pipermail/toybox-landley.net/2021-October/028758.html
> >> > and I decided it was just all out of scope), but the dance to get glibc to admit
> >> > unicode exists is nontrivial. (And if the state isn't set, ze functions: zey do
> >> > nothink.)
> >> >
> >> > Lemme see what I can do with livegui-amd64 under qemu to reproduce this here...
> >>
> >> Reproduced. Haven't really root caused, but I was reminded of:
> >>
> >>   https://github.com/landley/toybox/issues/300
> >>
> >> Which boils down to "the locale we're trying to use is not installed".
> >>
> >> Toybox is doing:
> >>
> >>     setlocale(LC_CTYPE, "");
> >>     if (strcmp("UTF-8", nl_langinfo(CODESET)))
> >>       uselocale(newlocale(LC_CTYPE_MASK, "en_US.UTF-8", NULL));
> >>
> >> And it looks like gentoo has "C.utf8" instead (no dash), which... yeah, it works
> >> if I tell it to uselocale() that instead. I probably need multiple fallbacks in
> >> a loop. (Does it care about the dash? Is it case sensitive? How many iterations
> >> here...)
>
> Ok, "man 7 locale" says that C.UTF-8 should fall back to loading C.utf8...
>
> >> Oh goddess why is it doing uselocale(newlocale()), I think it was a macos thing?
> >> Yeah, git annotate says commit 4786fd610 which was Elliott. (Do you remember why
> >> it was doing that?)
> >
> > because there isn't a C.UTF-8 (no matter how you try to spell it!) on
> > macOS, so we need to "merge" utf-8-ness into the current locale. (i'd
>
> That isn't what the man page for newlocale() says we're doing, though?
> newlocale(FLAG, "NAME", 0) is creating a new locale that's a subset of the
> "NAME" locale, and the 0 means locale elements we don't give a flag for are
> copied from the "POSIX" locale. (Which should be a synonym for "C".)

i don't think so? that's not how i read
https://pubs.opengroup.org/onlinepubs/9699919799/functions/newlocale.html
anyway. i think it comes down to how you interpret "default locale"? i
read it as equivalent to "", but you think it means "POSIX". i think
POSIX agrees with me though? (search for "default locale" on
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html
for their definition. this is why we call setlocale() in that code,
fwiw.)
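
fwiw, one way to sidestep the "what does a base of 0 mean?" question
entirely might be to pass an explicit base: duplocale(LC_GLOBAL_LOCALE)
copies whatever setlocale() set up, and newlocale() then only replaces
LC_CTYPE in that copy. a rough sketch, not toybox code: the helper
names and the list of candidate spellings are made up for
illustration, it assumes glibc-style headers (macOS may also want
<xlocale.h>), and i haven't tried it against the gentoo failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

// hypothetical helper, not toybox code: returns a locale whose LC_CTYPE is
// UTF-8 and whose other categories are copied from the global locale, or
// (locale_t)0 if none of the candidate names can be loaded.
static locale_t utf8_ctype_locale(void)
{
  // candidate spellings are a guess, not an exhaustive or verified list
  static const char *names[] = {"C.UTF-8", "C.utf8", "en_US.UTF-8"};
  locale_t base = duplocale(LC_GLOBAL_LOCALE), loc;
  size_t i;

  if (base == (locale_t)0) return (locale_t)0;
  for (i = 0; i < sizeof(names)/sizeof(*names); i++) {
    // on success newlocale() consumes base; on failure base stays valid
    loc = newlocale(LC_CTYPE_MASK, names[i], base);
    if (loc != (locale_t)0) return loc;
  }
  freelocale(base);

  return (locale_t)0;
}

// hypothetical wrapper: returns 1 if the calling thread ends up with a UTF-8
// codeset, 0 if no UTF-8 locale could be found.
static int force_utf8_ctype(void)
{
  locale_t loc;

  // LC_ALL rather than the LC_CTYPE used in the quoted toybox snippet, so
  // there are other categories (LC_TIME for date's %x, say) worth keeping
  setlocale(LC_ALL, "");
  if (!strcmp("UTF-8", nl_langinfo(CODESET))) return 1;

  loc = utf8_ctype_locale();
  if (loc == (locale_t)0) return 0;
  uselocale(loc);

  return !strcmp("UTF-8", nl_langinfo(CODESET));
}
```

calling force_utf8_ctype() once at startup would stand in for the
three-line snippet quoted above. whether the extra duplocale() is
worth it depends on which reading of "default locale" turns out to be
right.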

> So if the "" locale was french or something, this will switch it to "C". (Did I
> mention I'm not a fan of the locale plumbing's design?)
>
> I _want_ to say we don't use any elements of locale other than the utf8
> character stuff (I hope WHICH locale it is doesn't affect toupper() and
> tolower() but microsoft WAS on the unicode committee...)... but date advertises
> %x (which is handled for us by libc).
>
> I honestly don't remember what issue 52422388520e was fixing but I'm not gonna
> argue locale issues with a guy with three umlauts in his name. But looking back
> at it, commit 67ddade3373d replaced all uses of mbrtowc() out of libc with
> utf8towc() out of lib precisely so we WOULDN'T have to care about locale, so why
> are we still using libc's wcrtomb()? (Looking at the man page... what on earth
> is "shift state"?)
>
> Sigh, the real problem is towupper() and towlower() which if I recall did not
> work if a locale wasn't loaded first. I still kind of want to do an mbtoutf8()
> to make it symmetrical. (I THINK I did this before, but alas Google search
> continues to deteriorate: even though I got dreamhost to fix the robots.txt file
> on lists.landley.net back on Jan 23 Google STILL has the old one cached 3 weeks
> later.)
>
> Right, fix the thing in front of us for now...
>
> > argue _that's_ not the ugly part --- the ugly part is that we merge
> > "en_US.UTF-8" in. but i thought i'd wait until someone was actually
> > hurt by it before trying to construct the exact right locale for
> > them.)
>
> The man page says we're merging it into the posix locale, not the current locale.
>
> Is it too late to tell the gentoo guys to go back to running "make test"?
>
> Rob

