[Toybox] Toybox test image / fuzzing

Rob Landley rob at landley.net
Sun Mar 13 11:18:58 PDT 2016


On 03/13/2016 03:34 AM, Andy Chu wrote:
>> Unfortunately, the test suite needs as much work as the command
>> implementations do. :(
>>
>> Ok, backstory!
> 
> OK, thanks a lot for all the information!  That helps.  I will work on
> this.  I think a good initial goal is just to triage the tests that
> pass and make sure they don't regress (i.e. make it easy to run the
> tests, keep them green, and perhaps have a simple buildbot).

I fixed "make test_mv" last night. The problem is that
scripts/singleconfig.sh was creating a "mv" that acted like "cp". (I
should write up a blog entry explaining the plumbing. This may fix one
or two other tests, I haven't checked. It should change the "make tests"
build which tests the multiplexer version, which depends on make
menuconfig to tell it what to test.)

> For
> example, the factor bug is trivial but it's a lot easier to fix if you
> get feedback in an hour or so rather than a month later, when you have
> to load it back into your head.

Indeed, but I did most of the fix yesterday and can check it in today.

(I special cased "-" as the first character, to print out a -1 and skip
it, then the rest of the math is unsigned for the larger range. This
means that "-" by itself is treated as -1, I'm not sure how to catch
that without an ugly special case test for that...)

I also switched it to long long, which should make no difference on 64
bit platforms (with current compilers, anyway; there's nothing STOPPING
128 bit long long the way they wrote LP64, but nobody does it). On 32
bit platforms, it slows it down up to 50%.

>> Really, I need a tests/pending. :(
> 
> Yeah I have some ideas about this.  I will try them out and send a
> patch.  I think there does need to be more than 2 categories as you
> say though, and perhaps more than one kind of categorization.

Eventually it should all collapse back into one category, but there's a
lot of work to do between now and then. But tests/posix and tests/lsb
and such make a certain amount of sense, and that would both get us
tests/pending and not have to be undone later.

>> Building scripts to test each individual input is what the test suite is
>> all about. Figuring out what those inputs should _be_ (and the results
>> to expect) is, alas, work.
> 
> Right, it is work that the fuzzing should be able to piggy back on...
> so I was trying to find a way to leverage the existing test cases,
> pretty much like this:
> 
> http://lcamtuf.blogspot.com/2015/04/finding-bugs-in-sqlite-easy-way.html
> 
> But the difference is that unlike sqlite, fuzzing toybox could do
> arbitrarily bad things to your system, so it really needs to be
> sandboxed.  It gives really nasty inputs -- I wouldn't be surprised if
> it can crash the kernel too.

I have plans to sandbox it using
http://landley.net/aboriginal/about.html but haven't finished that yet
because Giant TODO List.

(If I go off to my corner and focus on my todo list, I vanish for
months. Things like sed and ps can easily soak up a couple months each.
If I prioritize interrupts, I jump from topic to topic and wind up with
giant heaps of half-finished stuff, but at least other people can sort
of follow along. :)

> Parsers in C are definitely the most likely successful targets for a
> fuzzer, and sed seems like the most complex parser in toybox so far.

lib/args.c is a pretty complicated parser, and toys/*/find.c is also
moderately horrid in that regard (because it can't leverage lib/args to
do anything in a common way.)

I want to genericize find.c plumbing to have expr.c and maybe test.c do
parenthesization and prioritization and such the same way, but despite
sitting down to think it through more than once haven't come up with a
clean way to factor out the common code yet. I should just do expr.c and
then try to cleanup common code (if any) afterwards. (Yes there's an
expr.c in pending, and when I sat down to try to clean it up I hit
http://landley.net/notes-2014.html#02-12-2014 and then
http://landley.net/notes-2015.html#30-01-2015 and it's on the todo
list.)

This might help:

ls -loSr toys/{android,example,other,lsb,posix}/*.c

The size of ps is partly illusory, I implemented "ps", "top", "iotop",
"pgrep", and "pkill" in the same command because I hadn't cleaned out
the common infrastructure to move it to lib/proc.c yet. (I should do
that. It can't use any of the GLOBALS/TT stuff and can't use any FLAG_
macros, because neither are available in lib. Oh, and it also shouldn't
ever check toys.which->name to see which command is running. I've got
that mostly cleaned out, need to factor it out into lib. It's on the
todo list.)

> The regex parsing seems to be handled by libraries, and I don't think
> those are instrumented (because they are in a shared library not
> compiled with afl-gcc).  I'm sure we can find a few more bugs though.

I'd prioritize musl and bionic. As far as I'm concerned uClibc is dead
(and uClibc-ng is necromancy, not a fresh start), and glibc is big iron
along with the rest of the GNU/nonsense.

>> There's also the fact that either the correct output or the input to use
>> is non-obvious. It's really easy for me to test things like grep by
>> going "grep -r xopen toys/pending". There's a lot of data for it to bite
>> on, and I can test ubuntu's version vs mine trivially and see where they
>> diverge.
> 
> Yeah there are definitely a lot of inputs beside the argv values, like
> the file system state and kernel state.

I'm working on tests/files. I need directory traversal weirdness with
some symlinks and different permissions and fifos and such, but I
suspect I need a tarball and/or script to set those up because trying to
check intentional filesystem corner cases into git is not a happy thought.
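
A rough sketch of what such a setup script could look like. The scratch
directory layout and every file name below are invented for illustration,
not what tests/files actually contains:

```shell
# Build a scratch tree with entries that are awkward or impossible to
# keep in git. All names here are hypothetical.
dir="$(mktemp -d)"
mkdir -p "$dir/sub/deep"
echo hello > "$dir/sub/file"
ln -s file "$dir/sub/link"        # symlink to a regular file
ln -s .. "$dir/sub/deep/loop"     # symlink pointing back up the tree
ln -s nowhere "$dir/dangling"     # dangling symlink
mkfifo "$dir/fifo"                # named pipe
touch "$dir/unreadable"
chmod 000 "$dir/unreadable"       # git only records the executable bit

# Count each kind of entry. find does not follow symlinks by default,
# so the loop is harmless here. $(( )) strips any padding from wc.
links=$(( $(find "$dir" -type l | wc -l) ))
fifos=$(( $(find "$dir" -type p | wc -l) ))
echo "symlinks: $links, fifos: $fifos"

# Teardown: restore permissions, then remove the tree.
chmod 600 "$dir/unreadable"
rm -rf "$dir"
```

Generating the tree at test time keeps the repository itself boring, at the
cost of the tests depending on mkfifo and symlink support on the host.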

> Those are harder to test, but
> I like that you are testing with Aboriginal Linux and LFS.  That is
> already a great torture test.

Indeed, and ~2 weeks ago I was churning through LFS 7.8 packages until I
got distracted. I should get back to that. It's on the todo list.

> FWIW I think the test harness is missing a few concepts:
> 
> - exit code

blah; echo $?

> - stderr

2>&1

> - file system state -- the current method of  putting setup at the
> beginning of foo.test *might* be good enough for some commands, but
> probably not all

I mentioned the need for a standard directory of files everything can
assume is there, and tests/files being a start of that. For testing by
hand I just use the toybox source du jour, but that's obviously
unsuitable for automated testing.

That said, these test scripts are shell scripts. You can do any
setup/teardown you need to. The automated stuff is a convenience.
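
As a sketch of how the exit code and stderr can ride along in the output a
test compares, assuming a made-up stand-in command (the blah function below
is hypothetical, as is the exit= label):

```shell
# Hypothetical stand-in for a command under test: writes to stdout and
# stderr, then fails with a nonzero status.
blah() {
  echo "to stdout"
  echo "to stderr" >&2
  return 3
}

# Fold stderr into stdout with 2>&1, then append the exit status so a
# plain string comparison covers all three.
actual="$(blah 2>&1; echo "exit=$?")"
expected="to stdout
to stderr
exit=3"

if [ "$actual" = "$expected" ]; then echo PASS; else echo FAIL; fi
```

The downside is that stdout and stderr are no longer distinguishable in the
captured text, which may or may not matter for a given command.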

That said, right now the tests are run by sourcing them, which means
there's potential leftover crap if you define shell functions and such.
I need to make sure there's an appropriate ( ) subshell at the right
places. (When I first wrote this, I knew the answers to that sort of
thing off the top of my head. That was in... 2005? Now I have to go back
and confirm and add comments, but that's what other people have to do
looking at my code so probably a net win...)
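
A small sketch of the difference, using a throwaway script and a made-up
function name (toy_test_helper is hypothetical, not part of the real
harness):

```shell
# Hypothetical test script that defines a helper function and a variable.
tmp="$(mktemp)"
cat > "$tmp" << 'EOF'
toy_test_helper() { echo "leftover"; }
LEAKED=1
EOF

# Sourced inside ( ), the definitions vanish when the subshell exits.
( . "$tmp" )
if type toy_test_helper >/dev/null 2>&1
then in_subshell=leaked
else in_subshell=clean
fi

# Sourced directly, they persist in the calling shell.
. "$tmp"
if type toy_test_helper >/dev/null 2>&1
then direct=leaked
else direct=clean
fi

rm -f "$tmp"
echo "subshell: $in_subshell, direct: $direct"
```

The flip side is that anything a test wants to hand back to the harness
(pass/fail counts, say) can't be a plain variable set inside the ( ).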

> But this doesn't need to be addressed initially.
> 
> By the way, is there a target language/style for shell and make?

I'm targeting bash (but older bash, like bash 2 with only a couple bash
3 features like =~), because toybox's shell should be a proper bash
replacement, and toybox building itself is an obvious smoketest.

That said, there's a bootstrapping problem on weird systems. If I could
carve out the toysh.c and sed.c standalone builds so they can be run on
systems that haven't got acceptable versions of those commands, I'd
increase the portability of toybox a lot. (It still mucks about in /proc
and /sys looking for stuff, and calls some linux-only syscalls and
ioctls, but everybody and their dog has a linux emulation layer these
days. Large chunks of posix are still stuck in the 1970's, and they
always chickened out about standardizing things like "mount" or "init"
so you can't _boot_ a system that doesn't go beyond posix.)

> It looks like POSIX shell, and I'm not sure about the Makefile -- is it
> just GNU make or something more restrictive?  I like how you put most
> stuff in scripts/make.sh -- that's also how I like to do it.

In theory make is only there to provide the expected API. In practice,
the kconfig subdirectory was copied from Linux 2.6.12 and I need to
write a new one from scratch. (It's on the todo list! Note we only use
the generated .config file which is produced from our Config.in source,
so washing data through that plumbing doesn't affect the copyright and
thus license of the resulting binary. But it's an ugliness that really
should go bye-bye, and now that I've broken open the
lib/interestingtimes.c and lib/linestack.c can of worms... It's on the
todo list.)

> What about C?  Clang is flagging a lot of warnings that GCC doesn't,
> mainly -Wuninitialized.

The Android guys build with clang against bionic. I need to set up a
local clang toolchain, but my netbook is still ubuntu 12.04 and AOSP's
moved on to 14.04. It's on the todo list.

That said, gcc produces buckets of _spurious_ "may be used uninitialized
but never actually is" warnings, which I sometimes silence with "int
a=a;" in the declarations. (Generates no code but shuts up the warning.)

Are these _real_ uninitialized warnings? I'm very interested in those,
but find wading through large quantities of false positives tiresome.
(That's why I'm not a big fan of static analysis either. False positives
as far as the eye can see.)

>> But putting that in the test suite, I need to come up with a set of test
>> files (the source changes each commit, source changes shouldn't cause
>> test case regressions). I've done a start of tests/files with some utf8
>> code in there, but it hasn't got nearly enough complexity yet, and
>> there's "standard test load that doesn't change" vs "I thought of a new
>> utf8 torture test and added it, but that broke the ls -lR test."
> 
> Some code coverage stats might help?  I can probably set that up as
> it's similar to making an ASAN build.  (Perhaps something like this
> HTML http://llvm.org/docs/CoverageMappingFormat.html)

Ooh, that sounds interesting.

> The build patch I sent yesterday will help with that as well since you
> need to set CFLAGS.

I lost it in the noise, I need to do a pass over the mailing list web
archive again today and see what's fallen through the cracks...

>> Or with testing "top", the output is based on the current system load.
>> Even in a controlled environment, it's butterfly effects all the way
>> down. I can look at the source files under /proc I calculated the values
>> from, but A) hugely complex, B) giant race condition, C) is implementing
>> two parallel code paths that do the same thing a valid test? If I'm
>> calculating the wrong value because I didn't understand what that field
>> should mean, my test would also be wrong...
>>
>> In theory testing "ps" is easier, but in theory "ps" with no arguments
>> is the same as "ps -o pid,tty,time,cmd". But if you run it twice, the
>> pid of the "ps" binary changes, and the "TIME" of the shell might tick
>> over to the next second. You can't "head -n 2" it because it's
>> sorted by pid, which wraps, so if your ps pid is lower than your bash
>> pid it would come first. Oh, and there's no guarantee the shell you're
>> running is "bash" unless you're in a controlled environment... That's
>> just testing the output with no arguments.)
> 
> Those are definitely hard ones... I agree with the strategy of
> classifying the tests, and then we can see how many of the hard cases
> there are.  I think detecting trivial breakages will be an easy first step,
> and it should allow others to contribute more easily.

Initially I was only adding tests that either passed or showed something
interesting I needed to fix. This left large holes in the test suite
that I didn't know how to fill in yet, and when other people filled them
in I don't necessarily know how to fix them yet.

I'm glad somebody's taking a look. :)

> thanks,
> Andy

No, thank _you_,

Rob
