[Toybox] Toybox test image / fuzzing

enh enh at google.com
Sun Mar 13 11:06:26 PDT 2016


#include <cwhyyoushouldbedoingunittestinginstead>

Only having integration tests is why it's so hard to test toybox ps,
and why it's going to be hard to fuzz the code: we're missing the
boundaries that would let us test individual pieces. It's one of the
major problems with the toybox design/coding style. Sure, it's
something all the existing competition in this space gets wrong too,
but it's the most obvious argument for the creation of the _next_
generation tool...

On Sun, Mar 13, 2016 at 12:34 AM, Andy Chu <andychup at gmail.com> wrote:
>> Unfortunately, the test suite needs as much work as the command
>> implementations do. :(
>>
>> Ok, backstory!
>
> OK, thanks a lot for all the information!  That helps.  I will work on
> this.  I think a good initial goal is just to triage the tests that
> pass and make sure they don't regress (i.e. make it easy to run the
> tests, keep them green, and perhaps have a simple buildbot).  For
> example, the factor bug is trivial but it's a lot easier to fix if you
> get feedback in an hour or so rather than a month later, when you have
> to load it back into your head.
>
>> Really, I need a tests/pending. :(
>
> Yeah I have some ideas about this.  I will try them out and send a
> patch.  I think there does need to be more than 2 categories as you
> say though, and perhaps more than one kind of categorization.
>
>> Building scripts to test each individual input is what the test suite is
>> all about. Figuring out what those inputs should _be_ (and the results
>> to expect) is, alas, work.
>
> Right, it is work that the fuzzing should be able to piggyback on...
> so I was trying to find a way to leverage the existing test cases,
> pretty much like this:
>
> http://lcamtuf.blogspot.com/2015/04/finding-bugs-in-sqlite-easy-way.html
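The piggyback idea could start by harvesting the command scripts already embedded in the test files as an AFL seed corpus. A minimal sketch, using a canned stand-in for the `tests/sed.test` format (the real files vary in quoting, so actual extraction would need more care; all paths here are assumptions):

```shell
#!/bin/sh
# Sketch: harvest sed programs from a toybox-style test file into an
# AFL seed corpus. The testing() line format below is an assumed
# stand-in modeled on tests/sed.test, not the real file.
mkdir -p corpus-sed

cat > sample.test <<'EOF'
testing "s basic" "sed -e 's/x/y/'" "y\n" "" "x\n"
testing "delete" "sed -e '/foo/d'" "" "" "foo\n"
EOF

i=0
# Pull out every single-quoted 'sed -e' argument; each one becomes a
# separate seed file for the fuzzer to mutate.
grep -o "sed -e '[^']*'" sample.test | while read -r line; do
  i=$((i+1))
  printf '%s\n' "$line" | sed "s/^sed -e '//; s/'$//" > "corpus-sed/seed-$i"
done

ls corpus-sed
```

Scripts the project's own tests already exercise make much better AFL starting points than random bytes, which is the whole point of the sqlite write-up above.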
>
> But the difference is that unlike sqlite, fuzzing toybox could do
> arbitrarily bad things to your system, so it really needs to be
> sandboxed.  The fuzzer generates really nasty inputs -- I wouldn't
> be surprised if it could crash the kernel too.
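A sandboxed launch might look like the sketch below. It only writes the command out to a helper script rather than running it (afl-fuzz may not be installed, and the corpus path and toybox binary location are assumptions):

```shell
#!/bin/sh
# Sketch of a sandboxed fuzzing run for toybox sed. The @@ token is
# how afl-fuzz substitutes the mutated input file path; here the
# fuzzed file is the sed script itself (-f @@).
cat > run-fuzz-sed.sh <<'EOF'
#!/bin/sh
# unshare -r -n: new user namespace (fake root) plus a disconnected
# network namespace, so a misbehaving sed can't reach the outside.
# A throwaway chroot or container would be needed to protect the
# filesystem as well.
exec unshare -r -n \
  afl-fuzz -i corpus-sed -o findings -- ./toybox sed -f @@ /dev/null
EOF
chmod +x run-fuzz-sed.sh
echo "wrote run-fuzz-sed.sh"
```

Running the whole thing inside a disposable VM or container is still the safer default given how creative AFL's inputs get.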
>
> Parsers in C are definitely the most likely successful targets for a
> fuzzer, and sed seems like the most complex parser in toybox so far.
> The regex parsing seems to be handled by libraries, and I don't think
> those are instrumented (because they are in a shared library not
> compiled with afl-gcc).  I'm sure we can find a few more bugs though.
>
>> There's also the fact that either the correct output or the input to use
>> is non-obvious. It's really easy for me to test things like grep by
>> going "grep -r xopen toys/pending". There's a lot of data for it to bite
>> on, and I can test ubuntu's version vs mine trivially and see where they
>> diverge.
>
> Yeah there are definitely a lot of inputs besides the argv values, like
> the file system state and kernel state.  Those are harder to test, but
> I like that you are testing with Aboriginal Linux and LFS.  That is
> already a great torture test.
>
> FWIW I think the test harness is missing a few concepts:
>
> - exit code
> - stderr
> - file system state -- the current method of putting setup at the
> beginning of foo.test *might* be good enough for some commands, but
> probably not all
>
> But this doesn't need to be addressed initially.
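A harness helper covering those three gaps could be sketched like this. The function name and argument order are made up for illustration, not the real API of the toybox test scripts:

```shell
#!/bin/sh
# Hypothetical helper extending a toybox-style testing() function to
# also check exit status and stderr, not just stdout. All names here
# are invented, not the real scripts/ test harness interface.
check_cmd()
{
  name=$1 cmd=$2 want_out=$3 want_err=$4 want_status=$5

  got_out=$(eval "$cmd" 2>err.txt)
  got_status=$?
  got_err=$(cat err.txt); rm -f err.txt

  if [ "$got_out" = "$want_out" ] && [ "$got_err" = "$want_err" ] &&
     [ "$got_status" = "$want_status" ]
  then
    echo "PASS: $name"
  else
    echo "FAIL: $name (status=$got_status)"
  fi
}

# Usage: name, command, expected stdout, expected stderr, expected status.
check_cmd "true exits 0"  "true"    ""   "" 0
check_cmd "false exits 1" "false"   ""   "" 1
check_cmd "echo stdout"   "echo hi" "hi" "" 0
```

File system state is harder to fold into a single helper; per-test setup/teardown hooks in each foo.test are probably still the right place for that.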
>
> By the way, is there a target language/style for shell and make?  It
> looks like POSIX shell, and I'm not sure about the Makefile -- is it
> just GNU make or something more restrictive?  I like how you put most
> stuff in scripts/make.sh -- that's also how I like to do it.
>
> What about C?  Clang is flagging a lot of warnings that GCC doesn't,
> mainly -Wuninitialized.
>
>> But putting that in the test suite, I need to come up with a set of test
>> files (the source changes each commit, source changes shouldn't cause
>> test case regressions). I've done a start of tests/files with some utf8
>> code in there, but it hasn't got nearly enough complexity yet, and
>> there's "standard test load that doesn't change" vs "I thought of a new
>> utf8 torture test and added it, but that broke the ls -lR test."
>
> Some code coverage stats might help?  I can probably set that up as
> it's similar to making an ASAN build.  (Perhaps with HTML output along
> the lines of http://llvm.org/docs/CoverageMappingFormat.html)
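A clang source-based coverage run could go roughly as follows. This writes the recipe to a script instead of executing it (it needs clang and the toybox tree), and whether the toybox build passes CC/CFLAGS through untouched is an assumption:

```shell
#!/bin/sh
# Sketch: emit a clang source-based coverage recipe for toybox.
# The build-variable pass-through and source paths are assumptions.
cat > coverage.sh <<'EOF'
#!/bin/sh
make clean
CC=clang CFLAGS="-fprofile-instr-generate -fcoverage-mapping" make
# Each instrumented run drops a raw profile named by LLVM_PROFILE_FILE.
LLVM_PROFILE_FILE=toybox.profraw ./toybox sed -e 's/a/b/' /dev/null
llvm-profdata merge -o toybox.profdata toybox.profraw
llvm-cov show ./toybox -instr-profile=toybox.profdata toys/*/sed.c
EOF
chmod +x coverage.sh
echo "wrote coverage.sh"
```

Running the whole test suite (rather than one sed invocation) before merging profiles would show which commands the existing tests don't touch at all.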
>
> The build patch I sent yesterday will help with that as well since you
> need to set CFLAGS.
>
>
>> Or with testing "top", the output is based on the current system load.
>> Even in a controlled environment, it's butterfly effects all the way
>> down. I can look at the source files under /proc that I calculated the
>> from, but A) hugely complex, B) giant race condition, C) is implementing
>> two parallel code paths that do the same thing a valid test? If I'm
>> calculating the wrong value because I didn't understand what that field
>> should mean, my test would also be wrong...
>>
>> In theory testing "ps" is easier, but in theory "ps" with no arguments
>> is the same as "ps -o pid,tty,time,cmd". But if you run it twice, the
>> pid of the "ps" binary changes, and the "TIME" of the shell might tick
>> over to the next second. You can't "head -n 2" it because it's
>> sorted by pid, which wraps, so if your ps pid is lower than your bash
>> pid it would come first. Oh, and there's no guarantee the shell you're
>> running is "bash" unless you're in a controlled environment... That's
>> just testing the output with no arguments.
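One way to make those flaky fields comparable is to blank them out before diffing two ps runs. A sketch, assuming the default PID/TTY/TIME/CMD column order and demonstrated on canned output so the result is stable:

```shell
#!/bin/sh
# Sketch: normalize the volatile columns of ps output (PID, TIME) so
# two runs -- e.g. toybox ps vs the system ps -- can be diffed.
# Assumes the default four-column layout: PID TTY TIME CMD.
normalize_ps()
{
  awk 'NR==1 {print; next} {$1="PID"; $3="TIME"; print}'
}

# Demo on canned output rather than a live ps, so it's reproducible.
printf '  PID TTY          TIME CMD\n 1234 pts/0    00:00:01 bash\n 5678 pts/0    00:00:00 ps\n' |
  normalize_ps
```

Sorting both normalized outputs by the CMD column would also sidestep the pid-wraparound ordering problem, at the cost of no longer checking the sort itself.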
>
> Those are definitely hard ones... I agree with the strategy of
> classifying the tests, and then we can see how many of the hard cases
> there are.  I think detecting trivial breakages will be an easy first
> step, and it should allow others to contribute more easily.
> and it should allow others to contribute more easily.
>
> thanks,
> Andy
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.
