[Toybox] strlower() bug
Rob Landley
rob at landley.net
Tue May 14 12:09:03 PDT 2024
On 5/14/24 12:12, enh wrote:
> On Tue, May 14, 2024 at 1:04 PM Rob Landley <rob at landley.net> wrote:
>>
>> On 5/14/24 07:10, enh wrote:
>> > macOS tests seem to be broken since this commit?
>> >
>> > FAIL: find strlower edge case
>> > echo -ne '' | touch aaaaaⱥⱥⱥⱥⱥⱥⱥⱥⱥ; find . -iname aaaaaȺȺȺȺȺȺȺȺȺ
>> > --- expected 2024-05-10 17:32:56.000000000 +0000
>> > +++ actual 2024-05-10 17:32:56.000000000 +0000
>> > @@ -1 +0,0 @@
>> > -./aaaaaⱥⱥⱥⱥⱥⱥⱥⱥⱥ
>>
>> Sigh. Apple's handling of utf8/unicode continues to be... "a challenge".
>>
>> When I run "make test_find" standalone, it gives me:
>>
>> scripts/runtest.sh: line 219: syntax error near unexpected token `;'
>> scripts/runtest.sh: line 219: ` R) LEN=0; B=1; ;&'
>>
>> Because bash 3.2 from 2007 doesn't understand ;&
>
> yeah, nor does mksh. it hasn't caused me any problems though; i've
> been ignoring it for years now.
>
>> And THEN it goes:
>>
>> touch: out of range or illegal time specification: YYYY-MM-DDThh:mm:SS[.frac][tz]
>> touch: out of range or illegal time specification: YYYY-MM-DDThh:mm:SS[.frac][tz]
>> FAIL: find newerat
>> echo -ne '' | find dir -type f -newerat @12345
>> --- expected 2024-05-14 11:16:40.000000000 -0500
>> +++ actual 2024-05-14 11:16:40.000000000 -0500
>> @@ -1 +0,0 @@
>> -dir/two
>>
>> Which is a different error that DOESN'T happen with the global tests, because
>> those are using toybox touch rather than homebrew's $TOUCH. But it works on
>> debian. Let's see:
>>
>> $ touch --version
>> touch: illegal option -- -
>> usage: touch [-A [-][[hh]mm]SS] [-achm] [-r file] [-t [[CC]YY]MMDDhhmm[.SS]]
>> [-d YYYY-MM-DDThh:mm:SS[.frac][tz]] file ...
>>
>> Thank you, gnu project. I'm gonna assume this is _also_ from 2007. (I made
>> scripts/prereq/build.sh for a REASON...)
>
> no, i think this is a BSD touch.
>
> yeah, that looks very like the FreeBSD touch's usage:
>
> static void
> usage(const char *myname)
> {
>         fprintf(stderr, "usage: %s [-A [-][[hh]mm]SS] [-achm] [-r file] "
>             "[-t [[CC]YY]MMDDhhmm[.SS]]\n"
>             " [-d YYYY-MM-DDThh:mm:SS[.frac][tz]] "
>             "file ...\n", myname);
>         exit(1);
> }
>
>
>> Then when I run "make clean macos_defconfig tests" I get:
>>
>> Undefined symbols for architecture arm64:
>> "_iconv", referenced from:
>> _do_iconv in iconv.o
>> (maybe you meant: _iconv_main)
>> "_iconv_open", referenced from:
>> _iconv_main in iconv.o
>> ld: symbol(s) not found for architecture arm64
>>
>> Because the Makefile has:
>>
>> tests: ASAN=1
>> tests: toybox
>> scripts/test.sh
>>
>> And ASAN apparently breaks on homebrew's toolchain but not debian's toolchain.
>> Why does it break there but not on Linux...
>>
>> probe cc -Wall -Wundef -Werror=implicit-function-declaration
>> -Wno-char-subscripts -Wno-pointer-sign -funsigned-char
>> -Wno-deprecated-declarations -Wno-string-plus-int -Wno-invalid-source-encoding
>> -fsanitize=address -O1 -g -fno-omit-frame-pointer -fno-optimize-sibling-calls
>> -xc -o /dev/null -
>> error: cannot parse the debug map for '/dev/null': The file was not recognized
>> as a valid object file
>> clang: error: dsymutil command failed with exit code 1 (use -v to see invocation)
>>
>> Because it tries to read back the -o output we discarded, and fails when it
>> can't do so, thus all library probes fail and it tries to build with no
>> libraries. But only when ASAN is enabled, because ASAN uses -o as INPUT. Bravo.
>>
>> None of this is the actual unicode failure, this is just ambient macos...
FAIL: find strlower edge case
echo -ne '' | touch aaaaaⱥⱥⱥⱥⱥⱥⱥⱥⱥ; find . -iname aaaaaȺȺȺȺȺȺȺȺȺ
--- expected 2024-05-14 13:32:19.000000000 -0500
+++ actual 2024-05-14 13:32:19.000000000 -0500
@@ -1 +0,0 @@
-./aaaaaⱥⱥⱥⱥⱥⱥⱥⱥⱥ
make: *** [tests] Error 1
cfarm104 (homebrew):toybox landley$ ls generated/testdir/testdir/
aaaaa?????????
$ LC_ALL=en_US.UTF-8 ls generated/testdir/testdir
aaaaa?????????
$ generated/testdir/ls generated/testdir/testdir
aaaaa\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245
$ echo -./aaaaaⱥⱥⱥⱥⱥⱥⱥⱥⱥ
-./aaaaaⱥⱥⱥⱥⱥⱥⱥⱥⱥ
$ generated/testdir/ls -N generated/testdir/testdir
aaaaaⱥⱥⱥⱥⱥⱥⱥⱥⱥ
cfarm104 (homebrew):toybox landley$ generated/testdir/ls -N generated/testdir/testdir
aaaaaⱥⱥⱥⱥⱥⱥⱥⱥⱥ
cfarm104 (homebrew):toybox landley$ ls -N generated/testdir/testdir
ls: invalid option -- N
usage: ls [-@ABCFGHILOPRSTUWabcdefghiklmnopqrstuvwxy1%,] [--color=when] [-D format] [file ...]
Why is toybox ls escaping by default here but not on Linux? Hmmm, it's gotta be
this call in crunch_qb():
// scrute the inscrutable, eff the ineffable, print the unprintable
else if ((len = wcrtomb(buf, wc, 0) ) == -1) len = 1;
Once again, I wist for stable/portable unicode functions in lib/unicode.c. I
know why I haven't GOT them (mostly), but this is just ridiculous. (They don't
have to be GREAT, but NOT THAT...)
(There's only ~150k assigned code points and MOSTLY I'm doing tests that return ONE BIT
answers. I'm aware it's a trap, but DUDE...)
Anyway, STILL not the actual issue at hand, the issue is that:
cfarm104 (homebrew):toybox landley$ generated/testdir/find generated/testdir/testdir -iname aaaaaⱥⱥⱥⱥⱥⱥⱥⱥⱥ
generated/testdir/testdir/aaaaaⱥⱥⱥⱥⱥⱥⱥⱥⱥ
cfarm104 (homebrew):toybox landley$ generated/testdir/find generated/testdir/testdir -iname aaaaaȺȺȺȺȺȺȺȺȺ
cfarm104 (homebrew):toybox landley$
The upper case string is not converting into the lower case string. Ok, let's
stick a +dprintf(2, "%d->%d\n", c, towlower(c)); into strlower() and it says
"570->58" which... is a colon? (570 is 0x23A, Ⱥ, whose lowercase ⱥ should be
11365, 0x2C65. 58 is 0x3A, the low byte of the input.) Hmm, prepending
LC_ALL=en_US.UTF-8 did not change that.
It looks like macos towlower() refuses to return expanding unicode characters
(lowercase forms whose UTF-8 encoding is longer than the uppercase input's).
Possibly to avoid exactly the kind of bug this commit fixed, at the cost of
corrupting the data.
I don't know how to fix this other than stubbing out the test on macos, or
adding lib/unicode.c. (I _really_ want to find an 80/20 there. I'm aware I have
failed at least three previous attempts, and am 2/3 of the way to clearing off
my laptop so I can install the new OS version and put the big ram sticks back so
NOW IS NOT THE TIME, but still...)
Rob