[Toybox] Release 0.8.10

enh enh at google.com
Thu Aug 3 12:43:40 PDT 2023


On Tue, Aug 1, 2023 at 9:02 PM Rob Landley <rob at landley.net> wrote:
>
> On 8/1/23 19:56, enh wrote:
> > On Mon, Jul 31, 2023 at 6:52 PM Rob Landley <rob at landley.net> wrote:
> >>
> >> On 7/31/23 09:31, enh wrote:
> >> > seems like this release is in a pretty bad state?
> >>
> >> The tests passed locally! All of 'em! With glibc and musl! Sigh...
> >>
> >> > github CI is failing for both
> >> > linux and macOS... linux seems to have some tar failures
> >>
> >> Yeah, it's those darn sparse failures again. On ext4 writing a sparse file
> >> behaves deterministically-ish, but butterfly-effect-fs not so much.
> >
> > yeah, that's one thing that's really weird --- sometimes the tests
> > pass in github's CI anyway.
>
> Microsoft Github.
>
> >> Admittedly it's only user visible if you _ask_ for it, and I'm kind of tempted
> >> to teach tar that "--sparse means any sufficient run of zeroes becomes #*%(#&
> >> sparse whether or not the filesystem knows about it". Where "sufficient" would
> >> logically be 512 byte aligned 512 byte blocks, because that's how tar thinks.
> >> (It's a space savings. I don't THINK there's a maximum number of sparse extents?
> >> I've even got a realloc every 512 entries in the existing loop! And a note to
> >> figure out how to test that properly. Don't ask me what gnu/dammit does with a
> >> multiblock sparse table, it _probably_ works?)
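
(for the record, the old gnu sparse format chains extra header blocks:
4 offset/size pairs live in the tar header itself and each 512-byte
extension block holds 21 more plus an isextended flag, so a multiblock
sparse table is at least anticipated by the format. roughly this,
reconstructed from memory of gnu tar's tar.h -- struct names and field
layout worth double-checking against the real thing:

  // Each sparse map entry is an offset/length pair, stored as octal
  // ascii like every other number in a tar header.
  struct sparse {
    char offset[12];
    char numbytes[12];
  };

  // A sparse extension record: 21 entries (21*24 = 504 bytes), a flag
  // saying whether another extension record follows, padding to 512.
  struct sparse_ext {
    struct sparse sp[21];
    char isextended;
    char padding[7];
  };

the main header only has room for 4 struct sparse entries before its
own isextended byte, so anything with more than 4 extents already
exercises the chaining.)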
> >>
> >> *shrug* If nothing else it would eliminate the filesystem dependency...
>
> Sigh, the gnu/dammit tar has --hole-detection=seek/raw and of course the man
> page does not explain what they DO, but I'm assuming "raw" makes it sparse
> whenever the data is all zeroes and seek detects the existing sparseness?
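
(fwiw "seek" presumably means lseek(SEEK_DATA)/lseek(SEEK_HOLE), which
only reports holes the filesystem actually stored. a minimal sketch of
walking a file's data extents that way -- just the idea, not what
either tar implementation does, and main()/the output format are made
up for the example:

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(int argc, char *argv[])
  {
    int fd = open(argv[1], O_RDONLY);
    off_t data, hole = 0, end;

    if (fd == -1) return 1;
    end = lseek(fd, 0, SEEK_END);
    // Every data run ends at a hole or at EOF, so SEEK_HOLE after a
    // successful SEEK_DATA can't fail. Seeking for data past the last
    // extent returns -1 (ENXIO), which ends the loop.
    while ((data = lseek(fd, hole, SEEK_DATA)) != -1) {
      hole = lseek(fd, data, SEEK_HOLE);
      printf("data %lld-%lld\n", (long long)data, (long long)hole);
      if (hole >= end) break;
    }

    return 0;
  }

on a filesystem that doesn't track holes this reports one big data
extent, which would be exactly why a "raw" fallback exists.)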
>
> I'm leaning towards just having --sparse make it be sparse whenever it can be,
> especially if filesystems you extract it into DON'T RETAIN THE INFO.
>
> I don't THINK being more aggressive about sparsifying files when given --sparse
> should break things? (Modulo loopback mounting filesystem images or swapon
> files?)
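
(and the "whenever it can be" scan is simple enough: tar already thinks
in 512-byte blocks, so any aligned all-zero block can become a hole
regardless of what the filesystem stored. a sketch of that idea --
explicitly not the current toybox loop, with struct extent and both
function names invented for the example:

  #include <stdlib.h>
  #include <string.h>

  struct extent {long long start, len;};

  // A block is a hole candidate iff all 512 bytes are zero: buf[0] is
  // zero and every byte equals its neighbor.
  static int block_is_zero(char *buf)
  {
    return !*buf && !memcmp(buf, buf+1, 511);
  }

  // Scan len bytes (a multiple of 512) of file data, appending data
  // extents to *list and coalescing adjacent data blocks. Grows the
  // table 512 entries at a time, returns the new extent count.
  int find_extents(char *buf, long long len, struct extent **list, int count)
  {
    long long ii;

    for (ii = 0; ii < len; ii += 512) {
      if (block_is_zero(buf+ii)) continue;
      if (count && (*list)[count-1].start+(*list)[count-1].len == ii)
        (*list)[count-1].len += 512;
      else {
        if (!(count&511)) *list = realloc(*list, (count+512)*sizeof(**list));
        (*list)[count].start = ii;
        (*list)[count++].len = 512;
      }
    }

    return count;
  }

that takes the filesystem out of the picture entirely, so the test
would stop depending on what btrfs felt like doing that day.)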
>
> The problem here is:
>
> 1) The toybox code is currently doing this right.
>
> 2) The build/run environment doesn't allow it to work right.
>
> Is the point of the test to find environment problems, or to find toybox
> regressions? Should the tests have an --aggressive flag of some kind? I'm
> already planning a "run as root under a special 'mkroot.sh tests' image that has
> known stuff in places...

(personally, i've always felt "all of the above"... the insufficiency
of testing has been a problem since Wirth and Dijkstra were arguing about
it in CACM, before the 6502 we'd grow up on had been designed, and
it's not going to fix itself any time soon. as far as i'm concerned,
although it's _convenient_ for me if the kernel tests catch kernel
bugs, and the libc tests catch libc bugs, and the toybox tests catch
toybox bugs --- with each layer being able to rely on the previous
layer's testing ... i'm pragmatic enough to just be happy that someone
caught something! well, after i stop complaining about how execrable
https://github.com/linux-test-project/ltp is or about how much time
i've wasted trying to debug a "toybox bug" that was actually nothing
to do with toybox, anyway :-) this is also why -- although i _want_
unit tests -- i'll _take_ integration tests, and i really want _both_
anyway.)

> This is a design issue. There isn't a right answer, it's a question of what we
> want to test.
>
> >> > FAIL: tar sparse without overflow
> >> > echo -ne '' | tar c --owner root --group sys --mtime @1234567890 --sparse fweep | SUM 3
> >> > --- expected 2023-07-29 01:27:20.471064281 +0000
> >> > +++ actual 2023-07-29 01:27:20.475064343 +0000
> >> > @@ -1 +1 @@
> >> > -50dc56c3c7eed163f0f37c0cfc2562852a612ad0
> >> > +4b0cf135987e8330d8a62433471faddccfabac75
> >>
> >> In order for this to be happening the sparse test I added at the start has to
> >> pass, but then the larger saving-of-sparseness does not match the file we just
> >> created on the previous line.
> >>
> >> I.E. the microsoft github behavior has to be INCONSISTENT within the same run to
> >> trigger this. Wheee...
>
> Although I may be premature blaming btrfs because Microsoft Github is probably
> migrating infrastructure over to Windows the way they did with hotmail (it's
> like Sun migrating Looking Glass to the Solaris kernel, or an alcoholic taking a
> drink, they can't _NOT_ do it even knowing the consequences), so this "ubuntu"
> container could actually be Windows Subsystem for Linux or using a samba mount
> as its filesystem or some such. (I can't ssh into it to poke around, so...)
>
> But I'm still not convinced btrfs is ready for primetime after the whole
> "getdents() is never guaranteed to terminate" thing. (How is that NOT a denial
> of service attack waiting to happen?)

(it was already not on my list of things to ever try, but thanks to
that bug -- and the apparent lack of interest in it -- it's now on my
list of things never to try.)

> Sigh, I want the commands to be portable but there's only so much I can _test_
> with "same syscall returns different results". (And this isn't even the
> TEST_HOST=1 version skew can of worms...)
>
> >> Which works on both glibc and musl, with ASAN on the glibc build and when I
> >> enable ASAN on the musl build the cross compiler goes "x86_64-linux-musl-cc:
> >> fatal error: cannot read spec file 'libsanitizer.spec': No such file or
> >> directory" so that's nice...
> >>
> >> > linux also dies in the sed timeout test; that seems to be a pathological case
> >> > for asan because increasing the timeout to 60s also didn't pass. (though
> >> > weirdly, that test is fine -- finishing almost instantly, just like non-asan
> >> > -- on macOS.)
> >>
> >> Didn't see it on debian's gcc+glibc ASAN, but most likely that has fewer checks.
> >
> > (to be fair, i actually have no idea of the state of the gcc asan; but
> > all the people _i_ know who work on asan-type stuff for a living work
> > on the llvm one.)
>
> I've more or less integrated testing with ASAN into my workflow now, but "not
> ASAN enough" is likely to take a little longer...
>
> >> > not sure whether that's a bsd/glibc difference or a linux-only asan
> >> > bug. the latter seems less likely, but i'll mention it to the asan folks anyway...)
> >>
> >> I remind you of:
> >>
> >> commit c0dca293c1301a6315684703706597db07a8dbe1
> >> Author: Rob Landley <rob at landley.net>
> >> Date:   Sat Jun 27 03:14:49 2020 -0500
> >>
> >>     The bionic/clang asan plumbing slows the test down >10x, so expand timeout.
> >>
> >> That test is ping-ponging between a bunch of different segments (the source
> >> buffer, the destination buffer, the parsed regex struct, and the stack, global
> >> variables, the toybox text segment, and glibc's library text segment) and it's
> >> entirely possible whatever virtual TLB setup ASAN does to catch weirdness is
> >> getting thrashed. Worse now than when the 20 second timeout was enough...
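
(for context: asan keeps one shadow byte for every 8 bytes of
application memory and checks it on each load and store, so
byte-at-a-time loops that bounce between several regions pay extra
memory traffic plus a branch per access. a toy demonstration of the
effect -- not the actual sed hot path, just something to time with and
without -fsanitize=address:

  #include <stdlib.h>
  #include <string.h>

  // Byte-at-a-time copy ping-ponging between two 1MB buffers. With
  // -fsanitize=address every one of these accesses also hits shadow
  // memory, so this slows down far more than bulk memcpy-style code.
  int main(void)
  {
    size_t ii, len = 1<<20;
    char *src = malloc(len), *dst = malloc(len);
    int pass;

    if (!src || !dst) return 1;
    memset(src, 'x', len);
    for (pass = 0; pass < 100; pass++)
      for (ii = 0; ii < len; ii++) dst[ii] = src[ii] == 'x' ? 'y' : src[ii];

    return dst[len-1] != 'y';
  }

comparing a plain build against an asan build should show the gap,
though how big it is obviously depends on toolchain and hardware.)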
> >
> > /me wonders if the reason i think this is fine "on macOS" is because i
> > actually mean "on an M1 because it has truly insane memory bandwidth
> > [at the cost of non-upgradeable memory, of course]".
>
> Or it has a bigger TLB or different cache eviction strategy? Something ASAN is
> doing is making memory access pathological, but it's probably just triggering
> it. If Microsoft Github _is_ using WSL or WSL2 under those ubuntu images, then
> the windows kernel is just about guaranteed to be doing something really stupid:
> it's windows. (And that whole "Azure" cloud nonsense is running its VMs with
> Windows under the covers at the best of times.)
>
> All I know is my 10 year old laptop _without_ ASAN takes 1/4 of a second to run
> the test. It's got a Core i5 from 2013 and memory from a store called "Discount
> Electronics". I suspect my Pixel 3a is slightly faster than this laptop.

(i've passed my various asan bugs/infelicities on to our asan folks,
who happen to also know apple's asan folks. everyone gets a bug!)

> > i'd report the results of running the asan tests on an x86-64 mac here
> > ... except that it crashes immediately the first time it starts
> > toybox, somewhere deep in libclang_rt.asan_osx_dynamic.dylib (their
> > fault, not yours).
>
> I plead the third.
>
> > but, yeah, my M1 mac is passing everything quickly right now.
> >
> >> Meanwhile, without ASAN wrapping date +%s.%N around the test says it takes a
> >> quarter of a second on my 10 year old laptop:
> >>
> >> 1690850299.108588387
> >> PASS: sed megabyte s/x/y/g (20 sec timeout)
> >> 1690850299.342395058
> >>
> >> A reasonable chunk of which is the shell test plumbing. (Just two consecutive
> >> "date +%s.%N; date+%s.%N" calls from the shell are .007 seconds apart on this
> >> machine, nontrivial chunk of that 250 milliseconds. I think the original 10
> >> second timeout was to make it reliably pass on my 66mhz Turtle board.)
> >>
> >> I can't think of a fix here other than disabling the test...
> >
> > yeah, or skipping if $ASAN is set? :-(
> >
> > for now, though, Android's CI doesn't care as long as *hwasan* is fast
> > enough, and a quick test on an aosp_cheetah_hwasan-userdebug device
> > says ... "can't create /expected: Read-only file system". oh. hmm.
> > looks like https://github.com/landley/toybox/commit/03e1cc1e45b67ad65e5ad0ae47b7a54e68d929d5
> > broke things. not sure why $TESTDIR isn't set for me? oh, because
> > that's set by scripts/test.sh which we don't use --- we call
> > scripts/runtest.sh directly.
>
> Sorry, I should have emailed you specifically about that one...

fixed, merged yesterday, and not reverted yet. must be good! :-)

(i should probably kick off a host prebuilt update too, but part of me
is thinking "let's see if it makes it until next week without being
reverted on the device first", since i was actually a few weeks behind
ToT. but if i wait, i'll likely forget, so... yeah, i'll look at that
this afternoon.)

> > too late for that to be today's problem though... i'll look further tomorrow!
> >
> > ah, fuck it, i'll only spend the evening wondering...
> >
> > yes, with the obvious line added to run-tests-on-android.sh, all the
> > tests pass on my hwasan build (and the sed test only takes a couple of
> > seconds). (for reference, my linux/x86-64 hardware that timed out was
> > a work amd threadripper box, not my personal 10 year old laptop!)
>
> Hmmm... The test went in because a change went in because a build script was
> very slow. If somebody does a build with an ASAN toybox, the slow comes back.
>
> We _can_ remove the test, but I don't know if that's the right call? The test is
> sort of doing its job? It didn't exactly find an issue with toybox, but it found
> an issue that _hits_ toybox...
>
> I'm open to suggestions.

bury head in sand for now? because this is _passing_ on github ci;
it's only when i run it on my otherwise ridiculously fast threadripper
locally that we see a timeout?! let me try to see if that's OS or
hardware by trying it on a thinkpad running Google's debian setup.
(reminder to self: _this_ is why you have so many machines!) yeah,
unsurprisingly that timed out on my laptop too. but `export
CC=clang-13` (as opposed to the default clang-14) fixes it. okay, let
me take _that_ to the asan folks... (even though they'll probably say
"14?! dude, why are you using a clang from more than a year ago?!".
actually, i can repro on clang-15 too, but debian testing doesn't have
clang-16 :-( )

> Rob

