[Toybox] Test suite gripe du jour.

Rob Landley rob at landley.net
Tue Sep 19 05:01:50 PDT 2023


Wow I have a lot of old reply windows buried under other windows...

On 8/7/23 11:06, enh wrote:
>> P.S. I didn't respond to Elliott's last email about testing because I didn't
>> know what to say. "I want absolutely everything because I have a dedicated staff
>> to weed out false positives" is not my use case.
> 
> unless you mean "you and me", there's no "staff" here :-)

Once bugs are found you theoretically have someone to hand them off to. I can
email kernel guys but they're not guaranteed to respond.

>> I want to know if I broke
>> toybox. In a lot of the github test failures, _toybox_ isn't what broke. I'm
>> aware that Posix and the Linux Test Project and so on aren't ideal, but I can't
>> do their jobs and mine.
> 
> that wasn't my point --- my point was "you'll be doing that
> regardless". you would (i assume) never include btrfs in your qemu
> setup, but you'll still get bug reports from folks using it.

True.

[And since that was written, we went down that rathole...]

> and
> you'll never be certain that your testing is thorough enough that you
> can just ignore bug reports as "can't be toybox; must be your
> kernel/fs/whatever", so you probably shouldn't put too much effort
> into qemu _in the hope of being free_, but that it's useful _anyway_
> in the same way a "works on my machine" datapoint is always useful.

The point of mkroot under qemu is to get _success_ coverage. I need environments
that can demonstrate that when the stars align, it does indeed work as designed,
which catches regressions (not just in toybox, could be kernel or libc or some
such too), and shows that with a big enough hammer it can be made to work with
glibc/musl/bionic, with gcc/llvm, on x86/arm/mips/s390x, 32 and 64 bit, big and
little endian, nommu, on targets that care about alignment, and to a certain
extent on macos and bsd, with rumors of qnx.

"There is a way to get it to work" and "specific $USER could get it to work" are
different questions.

No test suite is going to prevent surprises coming in from the field. A test
suite can act as a test load to see how a given environment behaves differently
from other environments it's been built and run in, but any sufficiently large
use case does that, and Linux From Scratch or what adelie and alpine linux
provide to busybox are _better_ test loads for that because they're not
artificial. They are one definition of it working or not working for people
trying to do stuff with it.

>> Way back when toybox commit d6f8c41e2542 shrank Divya's
>> initial chmod.tests submission way down because the initial submission was
>> mostly testing the syscalls, not toybox. Which is nice but not what _this_ test
>> suite is trying to accomplish.
> 
> sure, but you'll never really escape that. i have an "ndk" bug right
> now where (apparently) readlinkat() sometimes returns a bad result, on
> some devices. but not reproducibly enough.

What filesystem was it where the link size returned by stat() was the COMPRESSED
size not the actual size readlink() would return, so the buffer was allocated
wrong and the read got truncated?

https://www.mail-archive.com/toybox@lists.landley.net/msg07206.html
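
The defensive pattern for that class of bug is to not trust st_size at all and
just grow the buffer until readlink() returns fewer bytes than you handed it. A
minimal sketch of the idea (readlink_alloc() is a name made up for this example,
not necessarily how toybox does it, and the starting size of 64 is arbitrary):

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: read a symlink target without consulting st_size.
// Returns a malloc()ed NUL-terminated string, or NULL on error.
char *readlink_alloc(const char *path)
{
  size_t size = 64;  // arbitrary starting guess
  char *buf = NULL;

  for (;;) {
    char *tmp = realloc(buf, size);
    ssize_t len;

    if (!tmp) {
      free(buf);
      return NULL;
    }
    buf = tmp;

    len = readlink(path, buf, size);
    if (len < 0) {
      free(buf);
      return NULL;
    }

    // readlink() silently truncates, so a completely full buffer means
    // "maybe truncated". Only accept a result that left room to spare.
    if ((size_t)len < size) {
      buf[len] = 0;
      return buf;
    }
    size *= 2;
  }
}
```

It costs an extra syscall or two on long links, but it can't be fooled by a
filesystem that reports the wrong size.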

And back in the busybox+uclibc days, <strike>murderfs</strike> reiserfs uniquely
didn't set the linux_dirent->d_type field (always DT_UNKNOWN) which broke
something...
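
The usual defense there is to treat DT_UNKNOWN as "ask again with stat". A
minimal sketch, assuming a Linux-style struct dirent with a d_type field
(entry_is_dir() is a hypothetical helper for illustration, not what busybox
actually did at the time):

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>

// Hypothetical helper: 1 if the entry is a directory, 0 if it is not,
// -1 if we had to fall back to fstatat() and that failed.
int entry_is_dir(int dirfd, const struct dirent *de)
{
  struct stat st;

  // Fast path: the filesystem filled in d_type, so believe it.
  if (de->d_type != DT_UNKNOWN) return de->d_type == DT_DIR;

  // Slow path: d_type wasn't set, so look at the inode without
  // following symlinks (which is what d_type would have described).
  if (fstatat(dirfd, de->d_name, &st, AT_SYMLINK_NOFOLLOW)) return -1;

  return S_ISDIR(st.st_mode);
}
```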

As I said: I can't predict that sort of thing. Just gotta be ready for bug
reports when new users arrive. Tracking _down_ bug reports once you've got them
is... a thing I am sadly very experienced at. :(

> without getting to the
> bottom of that (and proving "bad kernel" or "bad security layer" or
> "bad vendor hack to libc" or whatever), it'll stay on the books as a
> possible bionic bug. (because it _could_ be, even if it's really hard
> to imagine how.)

Oh sure. But to a certain extent that's a resource allocation choice. We DID
eventually chase down and fix the btrfs loop issue, and I put an #ifdef block in
chrt.c #defining a bunch of syscall() wrappers because the musl-libc maintainer
has Opinions, and so on.
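
Roughly what that kind of workaround looks like, as a sketch rather than the
actual chrt.c contents: go around the libc function and call the kernel
directly through syscall(). This is Linux only, the raw_ names are invented for
this example, and these are plain functions where the real thing uses #defines:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical wrappers that bypass libc and ask the kernel directly.
// Each returns the syscall result, or -1 with errno set on failure.
int raw_sched_get_priority_min(int policy)
{
  return (int)syscall(SYS_sched_get_priority_min, policy);
}

int raw_sched_get_priority_max(int policy)
{
  return (int)syscall(SYS_sched_get_priority_max, policy);
}

int raw_sched_getscheduler(pid_t pid)
{
  return (int)syscall(SYS_sched_getscheduler, pid);
}
```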

It was never "am I capable of tying this off", it was "do I want to spend the
focus to go down this rathole until I'm out the other side"?

I've installed a lot of VM images over the years to reproduce an issue somebody
reported (who knew "pclinuxos" was a thing?), and had other people mail me
hardware, or gotten a remote login to their machine. When there's a budget I've
gone to their lab in person and worked out a reproduction sequence for "This
Thing That Sometimes Happens"... (Apparently this runs in the family, my
grandfather did it for <strike>the NSA</strike> General Electric.)

More than one of my consulting gigs has been somebody who's worked with me
before calling me in as "that guy who root causes baffling intermittent
problems". Alas, the downside of many years as a hobbyist with nobody to hand
things off to and only intermittent ability to ask questions is you get very
good at "the bug stops here". (Never get good at doing something you don't want
to do. Although the trick is mostly spite: as long as you can stay angry enough
to focus and arrogant enough that the only way out is through...)

I entirely understand "I can't drop everything and go medieval on this bug's ass
for as long as it takes", and I also understand "I don't have the environment
this was seen in". But I pick at things I shouldn't, and my ADHD tends to treat
bugs as a crime scene where I CANNOT CONTINUE until I've got an isolated and
minimized reproduction sequence: freeze, cordon off the area, collect evidence,
theorize, get comfortable that this is parkable, THEN put it on the todo heap.
(Alas, it's IMMENSELY time consuming. And why I _cannot_ use windows. But I'm
very bad at NOT doing it in something I'm responsible for going forward.)

"I have never seen this happen, the balance of probability is your environment
is uniquely borked in a way I don't yet care about or maybe you used it wrong in
a way you're not including in your reporting context" is one thing. But "it
happened right in front of me" even once triggers totally different neuroses ("I
SAW that, not getting away with it...").

>> I need to get the linux from scratch build
>> reproduced under mkroot because that was my big real world dataset. Ideally I'd
>> then build either debootstrap or alpine's package repository under the result.
>> (Red Hat's gone full vogon, SuSE's business model is offering a second source to
>> Red Hat's customers, and Gentoo turned out to be nuts under the surface where
>> every ebuild file in the portage tree has a list of every architecture it's
>> allowed to build on so you can NEVER just build "for this architecture" and
>> adding a new architecture requires touching every file in the tree, and don't
>> get me started on the insane ebuild #include stack...) And then I'd love to get
>> AOSP working under that result because that's Elliott's big real world dataset.
> 
> eh, like i've said before --- AOSP is the one place you can be sure
> someone else will be testing. (though not in CTS, which does mean
> there's potential for vendor breakage, including them deciding to ship
> btrfs :-) )

Which is fixed upstream now.

(Once a second person reported the same issue, it went from "your environment is
uniquely borked" to "There Has Been A Recurrence". This is why I only threw ONE
bowl of liquid nitrogen into the swimming pool at Penguicon 4, ala
https://www.youtube.com/watch?v=w2mj-Sq2oeo because if I'd done it a second time
hotel management might have felt compelled to act.)

Rob

