[Toybox] FYI musl's support horizon.

enh enh at google.com
Fri Sep 3 16:22:42 PDT 2021


On Fri, Aug 27, 2021 at 6:20 AM Rob Landley <rob at landley.net> wrote:

> On 8/26/21 5:56 PM, enh wrote:
> >     I keep telling people I could spend a focused year on JUST the
> >     test suite and they don't believe me. When people talk about
> >     function testing vs regression testing vs coverage testing I get
> >     confused because it's all the same thing?
> >
> > i'll include the main failure modes of each, to preempt any "yes,
> > but"s by admitting that _of course_ you can write fortran in any
> > language, but the idea is something like:
> >
> > integration testing - answers "does my product work for real use
> > cases?". you definitely want this, for obvious reasons, and since
> > your existing testing is integration tests, i'll say no more. other
> > than that the failure mode here is relying only on integration tests
> > and spending a lot more time/effort debugging failures than you would
> > if you could have caught the same issue with a unit test.
>
> I'm relying on the fact I wrote almost all the code myself, and thoroughly
> reviewed the rest, to be able to mentally model what everything is doing.
>
> That said, I'm trying to get the bus number up so you don't NEED me to
> do this sort of thing...
>

exactly.

(though to be honest, i've found it useful for myself when everything's
swapped out after a year or two. someone asked me about some code i'd
written recently, and i told them they were wrong, so-and-so had written
it, and they pulled out `git log` as proof.)


> > unit testing - reduces the amount of digging you have to do _when_
> > your integration tests fail. (also makes it easier to asan/tsan or
> > whatever, though this is much more of a problem on large systems than
> > it is for something like toybox, where everything's small and fast
> > anyway, versus "30 mins into transcoding this video, we crash" kinds
> > of problem.) for something like toybox you'd probably be more
> > interested in the ability to mock stuff out --- your "one day i'll
> > have qemu with a known set of processes" idea,
>
> It's kinda hard to test things like ps/ifconfig/insmod outside of a
> carefully controlled known environment.
>

that's what https://devopedia.org/mock-testing is for. (lets you easily
test extreme and/or "shouldn't happen" values too.)


> > but done by swapping function pointers. one nice thing about unit
> > tests is that they're very easily parallelized. on a Xeon desktop i
> > can run all several thousand bionic unit tests in less than 2s...
> > whereas obviously "boot a device" (more on the integration test side)
> > takes a lot longer. the main failure mode here (after "writing good
> > tests is at least as hard as writing good code", which i'm pretty
> > sure you already agree with, and might even be one of your
> > _objections_ to unit tests),
>
> Eh, the toys/example/demo_$THINGY commands are sort of intended to do
> this kind of thing for chunks of shared infrastructure (library code,
> etc).
>
> My objection here is really granularity: if you test at TOO detailed a
> level you're just saying "this code can't change". I recently changed
> xabspath() to have a flag-based interface, changed its existing users
> in the commands, and have the start of a toys/example/demo_abspath.c
> (which I mentioned in my blog I was too exhausted to properly finish
> at the time). Granular tests directly calling the functions would have
> been invalidated by the change, meaning I'd either have deleted them
> or rewritten them.
>
> With other people's test suites I often encounter test failures that
> don't MEAN anything. Some test is failing because the semantics of
> something somewhere changed, and none of the users care, and the test
> suite accumulates "known failures" like code emitting known warnings.
>
> A libc has an API with a lot of stable documented entry points.
> Toybox's entry points are almost entirely command line utilities with
> a shared entry codepath (including option parsing) and a shared
> library of common functions.


well, this is the problem (and for mocking especially): for unit testing to
work any better than integration testing, you have to design for testing.
for example: if your "ps" needs to be able to take real data from the
system or synthetic data from a mock, you need to have split the code that
way. and that's a lot easier if you start with that in mind rather than try
to retrofit it.


> I don't want to test the lib/*.c code directly from something that
> ISN'T a command (some other main() in its own .c file, possibly
> accessing it via dlopen() or something) because the top level main.c
> initializes toy_list[] and has toy_init() and toy_find() and
> toy_exec() and so on. If I factored that out I'd _only_ be doing so
> for the test suite, not because it made design sense. I don't want to
> duplicate plumbing and test in a different environment than I'm
> running in.
>
> The mkroot images are "tiny but valid". It's a theoretically real
> system you could build up from, and tells me "how does this behave
> under musl on a bunch of targets", using a real Linux kernel and so
> on.
>
> > is writing over-specific unit tests. rather than writing tests to
> > cover "what _must_ this do to be correct?" people cover "what does
> > this specific implementation happen to do right now, including
> > accidental implementation details?".
>
> Yup. Seen a lot of that. :(
>
> > (i've personally removed thousands of lines of misguided tests that
> > checked things like "if i pass _two_ invalid parameters to this
> > function, which one does it report the error about?", where the
> > correct answer is either "both" or "who cares?", but never "one
> > specific one".)
>
> I've bumped into some of that in toysh because I want to match bash's
> behavior, and alas bash is one of those "the implementation is
> currently the standard" things where every implementation detail
> hiccup IS the current spec.
>
> That said, I've blogged about making a few digressions anyway just
> because my plumbing doesn't work like bash's does (they're
> gratuitously making multiple passes over the data and I'm doing it
> all in one pass, and there are some places where "all x happens
> before all y" bubbles visibly to the surface and I just went no.)
> For example the "order of operations" issue in
> https://landley.net/notes-2021.html#18-03-2021
>
> > coverage - tells you where your arse is hanging out the window
> > _before_ your users notice. (i've had personal experiences of tests
> > i've written and that two other googlers have code reviewed that --
> > when i finally got the coverage data -- turned out to be missing
> > important stuff that [i thought] i'd explicitly written tests for.
>
> This is what I mean by testing the error paths. If I have a statement
> the code flow doesn't ever go through in testing, I'd like to know
> why. There's presumably tools for this (I think valgrind has
> something), but that's waaaaaay down the road.
>

the regular llvm coverage stuff works fine. i sent you a script and the
results a while back, but i don't remember whether i offered to add a `make
coverage` option? (at some point i should get it "for free" from Android's
CI, but not yet.)


> > Android's still working on "real time" coverage data showing up in
> > code reviews, but "real Google" has been there for years, and you'd
> > be surprised how many times your tests don't test what you thought
> > they did.)
>
> Sadly, I would not be surprised. :(
>
> > the main failure mode i've seen here is that you have to coach
> > people that "90% is great", and that very often chasing the last few
> > percent is not a good use of time,
>
> https://en.wikipedia.org/wiki/Pareto_principle
>
> And here is an excellent walkthrough of the math behind it:
>
>   https://www.youtube.com/watch?v=sPQViNNOAkw#t=6m43s
>
> Which is why the old saying "the first 90% of the work takes 90% of
> the time, the remaining 10% of the work takes the other 90% of the
> time" is ALMOST right, it's that the next 9% takes another 90% in a
> Zeno's paradox manner (addressing 90% of what's left takes a constant
> amount of time) until you shoot the engineers and go into production.
>
> > and in the extreme can make code worse. ("design for testability" is
> > good, but -- like all things -- you can take it too far.)
>
> My grumble is I'm trying to write a lot of tests that toybox and the
> debian host utilities can BOTH pass. I want to test the same code
> linked against glibc, musl, and bionic. I want to test it on big
> endian and little endian, 32 bit and 64 bit, systems that throw
> unaligned access faults, nommu...
>

yeah, that's an orthogonal problem. (and why i really don't want to lose
32-bit from CI, because i certainly don't use 32-bit [unless you count
Raspberry Pi]!)


> >     You have to test every decision point (including the error
> >     paths), you have to exercise every codepath (or why have that
> >     codepath?) and you have to KEEP doing it because every distro
> >     upgrade is going to break something.
> >
> > yeah, which is why you want all this stuff running in CI, on all the
> > platforms you care about.
>
> People use "continuous integration" as an excuse not to have releases.


some people might do that, but that's orthogonal. "continuous
integration" is literally just "we run the tests on every checkin";
whether you use that for good or evil is a separate question.


> No two people should ever run quite the same version and see quite
> the same behavior, we're sure random git snapshot du jour is fine...
>
> I object on principle.
>
> >     In my private emails somebody is trying to make the last
> >     aboriginal linux release work and the old busybox isn't building
> >     anymore because makedev() used to be in #include <sys/types.h>
> >     and now it's moved to <sys/sysmacros.h>. (Why? I dunno. Third
> >     base.)
> >
> > the pain of dealing with that pointless deckchair crap with every
> > glibc update is one reason why (a) i've vowed never to do that kind
> > of thing again in bionic [we were guilty of the same crime in the
> > past, even me personally; the most common example being transitive
> > includes] and (b) i'm hoping musl will care a bit more about not
> > breaking source compatibility ... but realize he's a bit screwed
> > because code expecting glibc might come to rely on the assumption
> > that <sys/types.h> *doesn't* contain makedev(), say --- i've had to
> > deal with that kind of mess myself too. sometimes you can't win.
>
> He has a very active mailing list and IRC channel (now on libera.chat
> like everybody else) where they argue about that sort of thing ALL
> THE TIME. (That said, I poked him to see if he wants to make a policy
> statement about this. Or has one somewhere already.)
>
> My complaint way back when was objecting to the need to #define
> GNU_GNU_ALL_HAIL_STALLMAN in order to get the definition for linux
> syscall wrappers (which have NOTHING to do with the gnu project). I
> made puppy eyes at Rich until he added the _ALL_SOURCE define so musl
> headers could just give me everything they knew how to do without
> micromanaging feature macros. (I'm already #including some headers
> and not including others, that's the granularity that makes SENSE...)
>

yeah, given that historical accident means that you effectively can't _not_
have _BSD_SOURCE on Android, and that there's a lot of _GNU_SOURCE (but not
all) that was also always "on by default", i've been leaning towards "you
get all the things, all the time". it's a lot less confusing to n00bs, in
particular. and there's more than one of them born every minute.


> And then I wound up doing:
>
>   #define unshare(flags) syscall(SYS_unshare, flags)
>   #define setns(fd, nstype) syscall(SYS_setns, fd, nstype)
>
> anyway. :)
>
> Rob
>