[Toybox] strlower() bug

enh enh at google.com
Tue May 14 05:10:58 PDT 2024


macOS tests seem to be broken since this commit?

FAIL: find strlower edge case
echo -ne '' | touch aaaaaⱥⱥⱥⱥⱥⱥⱥⱥⱥ; find . -iname aaaaaȺȺȺȺȺȺȺȺȺ
--- expected 2024-05-10 17:32:56.000000000 +0000
+++ actual 2024-05-10 17:32:56.000000000 +0000
@@ -1 +0,0 @@
-./aaaaaⱥⱥⱥⱥⱥⱥⱥⱥⱥ

On Wed, May 8, 2024 at 9:38 AM Rob Landley <rob at landley.net> wrote:
>
> On 5/6/24 17:12, Ray Gardner wrote:
> > While working on an awk implementation for toybox, I found a bug in
> > strlower(), which is used only in find.c. I've attached some tests to
> > put in find.test to reveal it. I can't put them here directly because
> > I don't think the UTF-8 names will come through. (I modelled my awk
> > tolower()/toupper() code on your strlower().)
>
> Your test doesn't create the files you're finding, so find is supposed to fail?
> Your first test doesn't barf under ASAN, and then the second one's going to fail
> because echo -n | wc says it's 258 bytes and the VFS file length limit is 255
> bytes, so there CAN'T be a file named that on Linux. (Path length != path
> component length, there's no slashes in there.)
>
> > The problem is in the test if the output string needs to be enlarged
> > to take an expanded lowercase:
> >     // Case conversion can expand utf8 representation, but with extra mlen
> >     // space above we should basically never need to realloc
> >     if (mlen+4 > (len = new-try)) continue;
> >
> > The mlen+4 needs to be mlen-4 to leave at least 4 bytes for the next character.
>
> Hmmm, possibly. I still don't understand what your test case is testing. (Just
> trying to trigger an ASAN violation with an otherwise nonsense test?)
>
> > As the comment indicates, it should "never" need to realloc;
>
> No, the first comment is "never" because triggering probably indicates a libc
> bug (we converted it from valid utf8 to a unicode code point, ran it through
> libc's towlower(), and are now trying to convert the result _back_ to utf8, an
> encoding hiccup at this point seems unlikely? But I don't trust locale plumbing
> ever, so...)
>
> The second is "basically never" because it requires an insane input string, but
> that's user controlled and users do crazy things, sometimes even intentionally.
>
> > it takes
> > a very long name of uppercase characters that do expand when made
> > lowercase. But the code is there to handle that very case.
>
> The first malloc rounds the allocation up to next 8 byte boundary _after_ what
> it's actually using, so 9-16 bytes of zeroes at the end, and assuming the
> conversion only ever grows 1 byte (I don't remember the pathological expansion
> case, it's in my blog somewhere, but your test is turning c8 ba into e2 b1 a5
> which is 1 byte of expansion) then you need at least 8 expanding unicode code
> points to burn through the padding, so your first test string is too short to
> trigger a problem. And your second is too long to produce a valid filename, so
> the test can't _succeed_...
>
> Sigh, lemme come up with a test that demonstrates the fix working... the minimal
> one seems to be ./find . -iname aaaaaȺȺȺȺȺȺȺȺȺ
>
> And then, of course, TEST_HOST fails because I need to enable a utf8 locale, but
> I made plumbing for that recent-ish-ly...
>
> commit 6800a95ef328
>
> > BTW, when I run those tests, they "PASS", but show as aborted:
> > corrupted size vs. prev_size
> > scripts/runtest.sh: line 137: 265983 Aborted                 find .
> > -iname AȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺC
> > PASS: find utf8 uppercase long name
>
> Odd.
>
> > The test echos and checks the $? return code and the abort apparently
> > leaves that as 0.
>
> That could be anything from a bash issue to your distro's libc. The only trap in
> tests.sh is for SIGINT, and that handler isn't inherited by child processes. The
> return code of a process killed by a signal should be 128+signum, which the test
> plumbing would notice if it was the actual exit code of your shell snippet.
>
> I checked in a test that should actually succeed, but would fail with ASAN
> enabled before the bug was fixed.
>
> > Is there a way to fix the test system so it can
> > force the exit code to be something else?
>
> Not if the signal/exit isn't allowed to propagate back to it by the test. You
> ran a child process and then unconditionally did an ;echo $? meaning test.sh
> doesn't get notified of the child process getting killed by a signal, it
> unconditionally (because ;) went on to run a second command, "echo" which is
> returning whatever your bash recorded.
>
> Some distros have horrible fault interceptors that log crap into syslog or dmesg
> or some such, AND THEN RETURN SUCCESS. (Which is doubly insane: A) a program
> faulting does not need to be globally logged on a development system, B)
> returning success when that happens is very sad, but their "logic" was that some
> scripts would otherwise misbehave.)
>
> > When I run the test from a
> > command line directly in bash, it gets a code of 134 (SIGABRT).
>
> Without ASAN I'm getting 139 (128+11 = SIGSEGV). There would appear to be a
> difference in our environments.
>
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net


More information about the Toybox mailing list