[Toybox] Toybox test image / fuzzing

Rob Landley rob at landley.net
Sat Mar 12 09:51:25 PST 2016


On 03/11/2016 05:12 PM, Andy Chu wrote:
> What is the best way to run the toybox tests?  If I just run "make
> test", I get a lot of failures, some of which are probably because I'm
> not running as root, while some I don't understand, like:
> 
> PASS: pgrep -o pattern
> pgrep: bad -s '0'
>                ^
> FAIL: pgrep -s

Unfortunately, the test suite needs as much work as the command
implementations do. :(

Ok, backstory!

When I started toybox I had a big todo list, and was filling it in.
Some got completed and some (like sh.c, mdev.c, or mke2fs.c) got
partially finished and put on hold for a long time.

Then I started getting contributions of new commands from other
developers, some of which were easy to verify, polish up, and declare
done, and some of which required extensive review (and in several cases
an outright rewrite). I used to just merge stuff and track the state of
it in a local text file, but that didn't scale, and I got overwhelmed.

So I created toys/pending and moved all the unfinished command
implementations there. (And a lib/pending.c for shared infrastructure
used by toys/pending which needs its own review/cleanup pass.) After a
while I wrote a page (http://landley.net/toybox/cleanup.html) explaining
the "pending" directory and the work I do to promote stuff _out_
of the pending directory, in hopes other people would be interested in
doing some of the cleanup for me.

But people kept asking how they could help other than implementing new
commands that would go into the giant toys/pending pile, or doing
cleanup, and the next logical thing for me was "test suite". So I
suggested that.

And got a lot of test suite entries full of tests that don't pass, tests
that don't actually test anything interesting in toybox (some test the
kernel, most don't test the interesting edge cases, none of them were
written with a thorough reading of the relevant standards document and/or
man page...)

Really, I need a tests/pending. :(

There's a missing layer of test suite infrastructure, which isn't just
"this has to be tested as root" but "this has to be tested on a known
system with a known environment". Preferably a synthetic one running
under an emulator, which makes it a good fit for my aboriginal linux
project with its build control images:

  http://landley.net/aboriginal/about.html
  http://landley.net/aboriginal/control-images

Unfortunately, when I tried to do this, the first one I did was "ps" and
making the process list "ps -a" sees reproducible is hard, because the
kernel launches a bunch of kernel threads based on driver configuration
and kernel version, so getting stable behavior out of that was enough of
a head-scratcher it went back on the todo list. I should try again with
"mount" or something...

Anyway, I've done a few minor cleanup passes over the test suite, but an
awful lot of it is still tests that fail because the test itself is
wrong, or outright gaps in test coverage.

One example of a test I did some cleanup on was tests/chmod.test, a "git
log" of that might be instructive? That said, the result isn't remotely
_complete_. (Endless cut and paste of "u+r" style checks of ls output
that should really be a loop, but no tests for the sticky bit? Nothing
sets the executable bit on a script and then tests we can run it?
Nothing removes exec permission from a directory and checks we can't ls
it? Or removes read permission from a file and checks we can't read it?
No, all it tests is ls output over and over...)
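
For instance, a sticky bit check might look something like this in the
tests/*.test format (just a sketch, assuming the usual
testing "name" "command" "result" "infile" "stdin" helper and a 022
umask):

  mkdir dir
  testing "chmod +t dir" "chmod +t dir && ls -ld dir | cut -c 1-10" \
    "drwxr-xr-t\n" "" ""
  rmdir dir

  printf '#!/bin/sh\necho ran\n' > script
  testing "chmod +x script" "chmod +x script && ./script" "ran\n" "" ""
  rm script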

> I'm on an Ubuntu 14.04 machine, running against the master branch.  I
> didn't try running as root since it seems like there is a non-zero
> chance that it will mess up my machine.

Very much so!

That's why I need to do an aboriginal linux test harness that boots
under qemu and runs tests in a known chroot.

> I saw in the ELC YouTube talk that test infrastructure is a TODO.
> 
> http://landley.net/talks/celf-2015.txt
> 
> Is this something I can help with?

If you could just triage the test suite and tell me the status of the
tests, that would be great. (I've been meaning to do that forever, but
every time I try I get distracted by fixing up a specific test and the
related command...)

First pass, you could sort the tests into:

1) This command is hard to test due to butterfly effects (run it twice,
get different output, so even a known emulated environment won't help;
top, ps, bootchartd, vmstat...)

2) This command could produce reliable output under an emulated
environment. This includes everything requiring root access. (Properly
testing oneit probably requires containers _within_ an emulator, but
let's burn that bridge when we come to it.)

3) This command can have a good test now. (Whether it _does_ is separate.)

Then let's put #1 and #2 aside for the moment and concentrate on filling
out #3.
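
If it helps, the triage results could be as simple as a flat text file;
a hypothetical sketch (these categorizations are made-up examples):

  tests/ps.test      1   # pids and times differ run to run
  tests/mount.test   2   # needs root and a known environment
  tests/sed.test     3   # could pass reliably on any host today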

> I guess if you can tell me what
> environment you use to get all tests to pass, it shouldn't be too hard
> to make a shell script to create that environment, probably with
> Aboriginal Linux.

Unfortunately, there isn't one. The test suite's been bit-rotting ever since I
started getting significant contributions to it without having a
"pending" directory to separate curated from wild tests. :(

> I have built Aboriginal Linux before (like a year ago). 
> 
> One of the reasons I ran into this was because I wanted to distill a
> test corpus for fuzzing from the shell test cases.  afl-fuzz has a
> utility to minimize a test corpus based on code path coverage.  So
> getting a stable test environment seems like a prerequisite for that.

Looking at the tests, I suspect my recent changes to the dirtree
infrastructure broke "mv". (Something did, anyway...)

There's also the issue that "make test_mv" and "make tests" actually
test slightly different things. The first builds the command standalone,
and not all commands build correctly standalone. (That might be why
"make test_mv" didn't work, if it's not building standalone...)

Sometimes the command needs fixing, sometimes the build infrastructure
needs fixing, sometimes the test needs fixing...
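
To spell out the two paths (a sketch, using the target names above):

  make test_mv   # builds mv as a standalone binary and tests that; a
                 # command that doesn't build standalone fails here
                 # before its tests even run
  make tests     # builds the full toybox multiplexer and runs every
                 # command's tests through it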

> FWIW, I had a different approach for fuzzing each arg:
> 
> https://github.com/andychu/toybox/commit/ff937e97881bfdf4b1221618c38857b75c9534e0
> 
> This seems to be a little laborious, because I have to manually write
> shell scripts to fuzz individual inputs (and I didn't find anything
> beyond that one crash yet).  I think the mass fuzzing thing might work
> better, but I'm not sure.

Building scripts to test each individual input is what the test suite is
all about. Figuring out what those inputs should _be_ (and the results
to expect) is, alas, work.

There's also the fact that often either the correct output or the input
to use is non-obvious. It's really easy for me to test things like grep by
going "grep -r xopen toys/pending". There's a lot of data for it to bite
on, and I can test ubuntu's version vs mine trivially and see where they
diverge.
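
Something like this hypothetical bash one-liner, run from the source
directory with a built ./toybox:

  diff <(grep -r xopen toys/pending) \
       <(./toybox grep -r xopen toys/pending) && echo "no divergence"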

But to put that in the test suite, I need to come up with a set of test
files (the source changes each commit, and source changes shouldn't
cause test case regressions). I've made a start on tests/files with some
utf8 content in there, but it hasn't got nearly enough complexity yet, and
there's "standard test load that doesn't change" vs "I thought of a new
utf8 torture test and added it, but that broke the ls -lR test."

Or with testing "top", the output is based on the current system load.
Even in a controlled environment, it's butterfly effects all the way
down. I can check against the source files under /proc that I
calculated the values from, but A) that's hugely complex, B) it's a
giant race condition, C) is implementing two parallel code paths that do
the same thing even a valid test? If I'm calculating the wrong value
because I didn't understand what that field should mean, my test would
also be wrong...

In theory testing "ps" is easier: "ps" with no arguments is the same as
"ps -o pid,tty,time,cmd". But if you run it twice, the pid of the "ps"
binary changes, and the "TIME" of the shell might tick over to the next
second. You can't just "head -n 2" it either, because the output is
sorted by pid, which wraps, so if your ps pid is lower than your bash
pid it would come first. Oh, and there's no guarantee the shell you're
running is "bash" unless you're in a controlled environment... And
that's just testing the output with no arguments.
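
A concrete illustration (again just a sketch):

  ps -o pid,tty,time,cmd > one
  ps -o pid,tty,time,cmd > two
  diff one two   # never clean: each ps run has a fresh pid, and the
                 # shell's TIME may tick over between the two runs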

> thanks,
> Andy

Rob
