[Toybox] Toybox test image / fuzzing

Samuel Holland samuel at sholland.org
Sun Mar 13 15:04:07 PDT 2016


On 03/13/2016 03:32 PM, Rob Landley wrote:
> On 03/13/2016 02:13 PM, Samuel Holland wrote:
>> On 03/13/2016 01:18 PM, Rob Landley wrote:
>>> On 03/13/2016 03:34 AM, Andy Chu wrote:
>>>> FWIW I think the test harness is missing a few concepts:
>>>>
>>>> - exit code
>>>
>>> blah; echo $?
>>>
>>>> - stderr
>>>
>>> 2>&1
>>
>> I think the idea here was the importance of differentiating between
>> stdout and stderr, and between text output and return code.
>
> You can do this now. "2>&1 > /dev/null" gives you only stdout, or
> 2>file... There are many options.
>
>> simple as having a separate output variable for each type of
>> output.
>>
>> Granted, it will usually be unambiguous as to the correctness of
>> the program,
>
> Yes, adding complexity to every test that isn't usually needed.
>
>> but having the return code in the output string can be confusing
>> to the human looking at the test case. Plus, why would you not want
>> to verify the exit code for every test?
>
> Because science is about reducing variables and isolating to test
> specific things?

If you want to reduce variables, see the suggestion about unit testing.

> Because "we can so clearly we should" is the gnu approach to things?
> Because you're arguing for adding complexity to the test suite to do
> things it can already do, and in many cases is already doing?

find tests -name '*.test' -print0 | xargs -0 grep 'echo.*yes' | wc -l
182

Considering how many times this pattern is already used, I don't see it
adding much complexity. It's trading an ad hoc pattern used in ~17% of
the tests for something more consistent and well-defined. Not to
mention that using &&/|| only tells you whether the command succeeded
or failed, not _what_ the exit code actually was.

I have also seen a couple of tests that expect failure and therefore
pass, even though the command is actually failing for the wrong reason.
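
For instance (a made-up illustration in the style of the existing tests,
not a real toybox test, with the argument order as I understand it:
name, command, expected stdout, file contents, stdin):

  testing "mkdir on existing dir fails" \
    "mkdir dir && mkdir dir 2>/dev/null || echo yes" "yes\n" "" ""

This passes when the second mkdir fails because the directory already
exists, but it also passes if the _first_ mkdir failed for some
unrelated reason. Comparing the actual exit code (and ideally the
stderr text) against an expected value would distinguish the two.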

> Because we've already got 5 required arguments for each test and
> you're proposing adding a couple more, and clearly that's going to
> make the test suite easier to maintain and encourage additional
> contributions?

Personally, yes, I think so. If everything is explicit, there is less
"hmmm, how do I test that? I guess I could throw something together
using the shell. Now I have to sift through the existing tests to see if
there is precedent, and what that is." Instead, you put the inputs here,
the outputs there, and you're done.

I admit that filesystem operations are a whole new can of worms, and I
do not have a good answer for them.

> I'm not saying it's _not_ a good extension, but you did ask.
> Complexity is a cost, spend it wisely. You're saying each test
> should test more things, which means potential false positives and
> more investigation about why a test failed (and/or more complex
> reporting format).

On the other hand, splitting up the outputs you check gives you _more_
information to help you investigate the problem. Instead of
"FAIL: foobar" you get something like "FAIL (stdout mismatch): foobar".

(As a side note, the test harness I've written recently even gives you a
diff of the expected and actual outputs when the test fails.)
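
Roughly along these lines (a sketch of the idea, not the actual harness
code; NAME, CMD, and EXPECTED are placeholders):

  printf '%b' "$EXPECTED" > expected.tmp
  eval "$CMD" > actual.tmp
  if ! cmp -s expected.tmp actual.tmp
  then
    echo "FAIL (stdout mismatch): $NAME"
    diff -u expected.tmp actual.tmp
  fi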

> Also, "the return code" implies none of the tests are pipelines, or
> multi-stage "do thing && examine thing" (which _already_ fails if do
>  thing returned failure, and with the error_msg() stuff would have
> said why to stderr already). Yesterday I was poking at mv tests
> which have a lot of "mv one two && [ -e two ] && [ ! -e one ] && echo
> yes" sort of constructs. What is "the exit code" from that?

Well, if we are testing mv, then the exit code is the exit code of mv.

The rest is just checking filesystem state after the fact. I claimed
above not to have a good answer for that, but rather than punt on the
question, here's one off the top of my head (even though I know you
aren't going to like it much):

Add an argument to the testing command that contains a predicate to eval
after running the test. If the predicate returns true, the test
succeeded; if the predicate returns false, the test failed. That way,
the only command that ever appears in the "command" argument is the toy
under test.
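
A rough sketch of what I mean (the function name and argument order are
invented for illustration, not a proposal for the exact interface):

  # Run only the toy under test, then eval a predicate that inspects
  # the exit code and the filesystem state.
  testcmd_check()
  {
    NAME="$1" CMD="$2" PREDICATE="$3"

    eval "$CMD"
    RC=$?

    if eval "$PREDICATE"
    then
      echo "PASS: $NAME"
    else
      echo "FAIL: $NAME"
    fi
  }

  # The "command" argument is just the toy; the checking that currently
  # lives in the command string moves into the predicate.
  testcmd_check "mv file" "mv one two" \
    '[ "$RC" -eq 0 ] && [ -e two ] && [ ! -e one ]'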

>> It's a lot of duplication to write "echo $?" in all of the test
>> cases.
>
> I don't. Sometimes I go "blah && yes" or "blah || yes", when the
> return code is specifically what I'm testing, and sometimes checking
> the return code and checking the output are two separate tests.

Okay, so it's a lot of duplication to write "&& yes" all over the place. :)

> Keep in mind that error_msg() and friends produce output, and the
> tests don't catch stderr by default but pass it through. If we catch
> stderr by default and a test DOESN'T check it, then it's ignored
> instead of visible to the caller.

I'm not sure how you could _not_ check stderr. The test case has a
string, the command generates a string, you compare the strings. If you
want to pass it through, nothing prevents that.
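
Something along these lines would capture stderr separately and still
let the harness pass it through when a test doesn't declare an
expectation (again just a sketch; EXPECTED_ERR is a placeholder):

  eval "$CMD" > stdout.tmp 2> stderr.tmp
  RC=$?

  if [ -z "$EXPECTED_ERR" ]
  then
    # No stderr expectation declared: pass it through as today.
    cat stderr.tmp >&2
  elif [ "$(cat stderr.tmp)" != "$(printf '%b' "$EXPECTED_ERR")" ]
  then
    echo "FAIL (stderr mismatch): $NAME"
  fi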

> Also, keep in mind I want the host version to pass most of these
> tests too, and if there are gratuitous differences in behavior I
> don't WANT the test to fail based on something I don't care about
> and wasn't trying to test. You're arguing for a tighter sieve with
> smaller holes when I've already received a bunch of failing tests
> that were written against gnu and never _tried_ against the toybox
> version, and failed for reasons that aren't real failures.

If you want to do that, then yes, you definitely need a looser sieve. I
get 60 more failures here with TEST_HOST than I do with allyesconfig. I
agree that comparing stderr between toybox and busybox or GNU is going
to be impossible because the error messages differ. Possible solutions
include not checking stderr by default (only formalizing the exit code
check), or simply not checking stderr when TEST_HOST=1.
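
The latter could be as simple as (continuing the sketch above):

  # Skip the stderr comparison when testing the host tools, since GNU
  # and BSD error messages won't match toybox's.
  if [ -z "$TEST_HOST" ] && \
     [ "$(cat stderr.tmp)" != "$(printf '%b' "$EXPECTED_ERR")" ]
  then
    echo "FAIL (stderr mismatch): $NAME"
  fi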

>> As for stdout/stderr, it helps make sure diagnostic messages are
>> going to the right stream when not using the helper functions.
>
> Right now diagnostic messages are visible in the output when running
> the test. There shouldn't be any by default, when there is it's
> pretty obvious because those lines aren't colored.
>
> I'm all for improving the test suite, but "what I think the test
> suite should be trying to do differs from what you think the test
> suite should be trying to do, therefore I am right" is missing some
> steps.

Then I guess it's not exactly clear to me what you are trying to do with
the test suite. My interpretation of its purpose was to verify
correctness (of arithmetic, string transformations, etc.) and standards
compliance. In that sense, for each set of inputs to a command
(arguments, stdin, filesystem state), there is exactly one set of
correct outputs (exit code, stdout/stderr, filesystem state), and
therefore the goal of the test suite is to compare the actual output to
the correct output and ensure they match. If you don't check the exit
code, you are missing part of the output.



I won't tell you that you have to do it any one way. It's your project.
Of course, complexity is to some extent a value judgment. If you think
it adds too much complexity/strictness for your taste, that's fine. I
was just trying to explain the reasoning behind the suggestion, and why
I think it's reasonable.

> Rob

P.S. Your other reply came in just as I had finished typing. Sorry if 
some of this is already addressed.

--
Regards,
Samuel Holland <samuel at sholland.org>
