[Toybox] Test suite gripe du jour.

Sat Aug 5 14:52:01 PDT 2023

So the failing MacOS test was:

FAIL: tail -F
echo -ne '' | tail -s .1 -F walrus 2>/dev/null & sleep .2; echo hello > walrus;
sleep .2; truncate -s 0 walrus; sleep .2; echo potato >> walrus; sleep .2;
echo hello >> walrus; sleep .2; rm walrus; sleep .2; echo done > walrus;
  sleep .5; kill %1

--- expected	2023-08-01 04:09:53.000000000 +0000
+++ actual	2023-08-01 04:09:55.000000000 +0000
@@ -1,4 +1,3 @@
-hello
 potato
 hello
 done

Which means we ran this background process:

  tail -s .1 -F walrus

And then ran:

  sleep .2; echo hello > walrus; sleep .2; truncate -s0 walrus

Meaning that during the 2/10 of a second sleep between the echo and the
truncate, tail's 1/10 of a second sleep never finished, so tail didn't wake up
and read "hello" before the file got truncated.

Tenth of a second sleeps should be an ENORMOUS amount of time for modern
hardware, where "modern" includes the first generation of raspberry pi going for
$35 in 2012 with the filesystem on an sd card. I need to keep the sleeps short
because a lot of tests use them and they add up.
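
To put a number on "they add up", here is a rough sketch that totals the sleep
arguments in the foreground half of the test command quoted above. The
sleep_total function is a hypothetical helper for illustration, not something in
the test suite:

```shell
# Foreground commands copied from the failing tail -F test.
cmd='sleep .2; echo hello > walrus; sleep .2; truncate -s 0 walrus; sleep .2; echo potato >> walrus; sleep .2; echo hello >> walrus; sleep .2; rm walrus; sleep .2; echo done > walrus; sleep .5; kill %1'

# Hypothetical helper: split on ';' and add up every "sleep N" argument.
sleep_total()
{
  printf '%s\n' "$1" | tr ';' '\n' |
    awk '$1 == "sleep" { n++; t += $2 }
         END { printf "%d sleeps, %.1f seconds\n", n, t }'
}

sleep_total "$cmd"
```

That's 1.7 seconds of sleeping for this one test. With every sleep bumped to a
full second, the same test would spend 7 seconds asleep.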

Unfortunately, if you run such tests on hardware that's outright thrashing its
resources and introducing funky latency spikes, then in theory even a 2 second
sleep isn't necessarily long enough. (Thunderbird has gone to lunch for 8
seconds at a time even with an SSD. When that and chrome fight to see which can
bloat larger, it gets ugly.)

Sigh. Maybe on MacOS I should run failing tests a second time to see if they
fail again? That does not seem right. I could also have every sleep be a full
second on MacOS, but I'm not convinced that's long enough.
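
For what it's worth, the "run it a second time" idea could look something like
the sketch below. retry_once is a hypothetical wrapper made up for illustration,
it is not part of the toybox test infrastructure, and how it would hook into the
real test functions is left open:

```shell
# Hypothetical helper: run a command, and if it fails run it exactly once
# more. The exit status is that of the last attempt.
retry_once()
{
  "$@" && return 0
  echo "RETRY: $*" >&2
  "$@"
}
```

Usage would be something like "retry_once sh -c 'the test command'". This only
papers over a one-off latency spike: a test that loses the race twice in a row
on a thrashing machine still shows up as a failure.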

Rob

P.S. I didn't respond to Elliott's last email about testing because I didn't
know what to say. "I want absolutely everything because I have a dedicated staff
to weed out false positives" is not my use case. I want to know if I broke
toybox. In a lot of the github test failures, _toybox_ isn't what broke. I'm
aware that Posix and the Linux Test Project and so on aren't ideal, but I can't
do their jobs and mine. Way back when, toybox commit d6f8c41e2542 shrank Divya's
initial chmod.tests submission way down because it was mostly testing the
syscalls, not toybox. Which is nice, but not what _this_ test suite is trying to
accomplish. I need to get the Linux From Scratch build
reproduced under mkroot because that was my big real world dataset. Ideally I'd
then build either debootstrap or alpine's package repository under the result.
(Red Hat's gone full vogon, SuSE's business model is offering a second source to
Red Hat's customers, and Gentoo turned out to be nuts under the surface where
every ebuild file in the portage tree has a list of every architecture it's
allowed to build on so you can NEVER just build "for this architecture" and
adding a new architecture requires touching every file in the tree, and don't
get me started on the insane ebuild #include stack...) And then I'd love to get
AOSP working under that result because that's Elliott's big real world dataset.
But I really really really need to disentangle AOSP into layers to make that
tractable, and am not even _looking_ at that can of worms yet.