[Toybox] Kernighan's awk repo

Thu Jul 21 14:27:58 PDT 2016

On 07/21/2016 11:58 AM, Andy Chu wrote:
> On Thu, Jul 21, 2016 at 9:04 AM, enh <enh at google.com> wrote:
>> On Wed, Jul 20, 2016 at 9:20 PM, Andy Chu <andychup at gmail.com> wrote:
>>>> yeah, i'd like to see asan support in toybox's makefile; i'm starting
>>>> to get a small pile of bug reports from the folks running asan Android
>>>> system images, but it's going to be inconvenient to have to deal with
>>>> them on the device rather than just on the host.
>>>
>>> How are they generating test input?
>>
>> this isn't from fuzzing.
>>
>>> When you say "folks running", is
>>> that an automated or manual process?
>>
>> for toybox, afaik, it's just people noticing toybox problems while
>> investigating other things.
> 
> OK, so basically there are humans walking around with Android phones
> where every system binary is instrumented with ASAN?  It's cool that
> it's fast enough for that.  As I recall the ASAN slowdown is supposed
> to be within 2x, so I can see that.

Because no bugs were ever found by anything but ASAN in the history of
computing. Grace Hopper used it to tape that moth into her logbook.

>>> In case it got lost, these patches I sent out in March added ASAN
>>> support to Toybox's Makefile, as well as scripts for running tests
>>> under the ASAN-instrumented binaries (and other LLVM sanitizers)
>>>
>>> http://lists.landley.net/pipermail/toybox-landley.net/2016-March/008147.html
>>
>> no, i remember that and my plan is to give it a try as soon as i have
>> time. which is why the longer version of my plan is "mention it on the
>> list knowing i won't get round to it for a while, and maybe it'll be
>> easier by the time i get there anyway" :-)

I have it slightly lower priority because I have hundreds of hours of
work to do on the test suite that I already know how to do and which I'm
highly confident will increase its coverage and quality, and which I
believe need to be done _anyway_.

> OK, the patches no longer apply since Rob started rewriting parts of
> the related build stuff, but didn't get anywhere close to ASAN afaik.  But
> it shouldn't be too hard to checkout a commit as of March and apply
> them.
> 
> There are some more instructions here, from when I reproduced the expr
> bug that was introduced:
> 
> http://lists.landley.net/pipermail/toybox-landley.net/2016-April/008214.html
> 
> And Rob you clearly never ran that, because the ONLY bug it flagged
> was YOUR bug.  You're simply being ignorant by writing that it's a
> false positive generator.  Please try things before writing long
> messages full of nonsense.

No, I'm saying I've poked at or been poked by a large number of
debugging tools over the years, from valgrind to tinycc's -b bounds
checker (heck, everybody implements their own version of "electric fence"
style malloc/free wrappers about 3 years into using C; mine was under
OS/2 to try to keep the _5_ different allocation contexts in Workplace
Shell code straight).

And having done that, I've found the best use of time to be:

A) thorough code inspection

B) collecting every test I ever run during development (against a
reference version AND against the new one, even little things to
check if what I've just implemented does what I expect) into a
regression test suite

C) running as much real world data through the code as possible and
following up any behavior deviations. (I've lost weeks at a time to
"Autoconf said no here when the other one said yes! It still builds and
seems to work but _why_ did it do that...")

Everybody has a tool they consider indispensable, which other people
manage to avoid ever encountering and yet somehow survive. You did not
approach this new addition as "here's a cool thing that might help", you
approached it as "clearly you will do what I say because you'd be insane
not to".

This does not get your thing put high on my todo list. Yes, you found a
bug. There's no shortage of those. Every time I sit down to expand a
nontrivial command's test suite I find 2 or 3. Working on grep --color
I've found several (mentioned here recently). Yesterday I got sent
https://github.com/landley/toybox/issues/36 which I have a window open
for, which wouldn't even have been mentioned here on the list otherwise.
I'm working on it.

(Far and away the biggest design flaw in stuff like grep and sed is that
readline() allocates an unbounded amount of memory controlled by the
input. That's BAD, I've complained about it here, and I dunno how to fix
it at the design level other than introduce limitations into the tool so
its maximum line length is 1 meg or some such. Triggering the OOM killer
is kind of a big deal.)
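
(To make the "introduce a limit" option concrete, here is a minimal
sketch. It is not toybox's actual code: bounded_getline() and MAX_LINE
are hypothetical names, and silently truncating over-long lines is only
one possible policy; erroring out is another.)

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_LINE (1 << 20)  /* hypothetical 1 MiB cap, per "1 meg or some such" */

/*
 * Read one line from fp into a heap buffer holding at most max bytes
 * (not counting the terminating NUL). The newline is not stored.
 * Returns the buffer (caller frees), or NULL at EOF or on allocation
 * failure. If the line is longer than max, the excess up to the next
 * newline is discarded and *truncated is set to 1.
 */
char *bounded_getline(FILE *fp, size_t max, int *truncated)
{
  size_t len = 0, cap = 64;
  char *buf, *tmp;
  int c;

  *truncated = 0;
  if (cap > max) cap = max;
  if (!(buf = malloc(cap + 1))) return NULL;

  while ((c = getc(fp)) != EOF && c != '\n') {
    if (len == max) {
      /* Over the cap: stop growing and swallow the rest of the line. */
      *truncated = 1;
      while ((c = getc(fp)) != EOF && c != '\n');
      break;
    }
    if (len == cap) {
      /* Double the buffer, but never past the cap. */
      cap = (cap > max / 2) ? max : cap * 2;
      if (!(tmp = realloc(buf, cap + 1))) {
        free(buf);
        return NULL;
      }
      buf = tmp;
    }
    buf[len++] = c;
  }

  /* Nothing read at all and we hit EOF: no more lines. */
  if (c == EOF && !len && !*truncated) {
    free(buf);
    return NULL;
  }
  buf[len] = 0;
  return buf;
}
```

A caller would loop on bounded_getline(fp, MAX_LINE, &cut) and decide
what to do when cut is set. Memory per line then stays at roughly max
bytes no matter what the input looks like.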

> And you're the one who asked for the bwk repo with tests, so I
> published it, so you should actually run it.

I downloaded it, looked at the README to see it was still unchanged, ran
./run.sh and it produced no output, ran "./run.sh help" and got the
bash help output, and put it in ~/toybox/pending so that when I get
around to implementing awk I can get back to it.

Note that ~/toybox/pending is not the same as ~/toybox/toybox/pending.
My project directories tend to be two layers deep, the outer one of
which is where I keep clutter that shouldn't go in the source directory.
(It DOES wind up in the source directory but it gives me somewhere to
put it when I periodically clean it out.) ~/toybox/pending is "here's
stuff that's helpful when I get around to implementing more commands in
future".

> I'm not promising to make the patches apply again, but given that I
> showed examples of adding it to toybox, and adding it to bwk, it
> should be straightforward to follow that model.

My sad little netbook is currently building llvm from source. (I tend to
do that with projects that aren't quite done cooking yet, because the
ubuntu packaged versions are stale and weirdly configured. I had llvm
installed as an nvc prerequisite, but it turns out clang didn't get
enabled right that time, so I have to build it again; the Beyond Linux
From Scratch instructions are my starting point.) Once that's installed
I can try getting asan working from the command line with the extra
CFLAGS. I should probably have something like ALL_CFLAGS= to override
the default ones and just have CFLAGS itself add to it. (See "expected
development user interface".)

I can then either add a "make asan" target that builds toybox with asan,
or add an ASAN=1 that enables it, or have a scripts/asan.sh wrapper
that calls scripts/make.sh with the extra asan variables, or...

(The problem is I don't wanna add asan_sed and asan_test_sed and so on
ad infinitum. This is a shift key, not its own button. And really, I
don't see how this is more special than libmudflap and the other 37
possible address sanitizers and emulation environments and so on. I
remember finding out busybox had a config option for something like
"dmalloc" that was just adding a linker option to LDFLAGS and I went
"why?" Why is THIS one built in but all the others aren't? No, adding
dozens of things I'll never regression test isn't an improvement.)

> Andy

Rob