[Toybox] awk (Re: ps down, top to go)

Andy Chu andychup at gmail.com
Sun May 8 11:06:33 PDT 2016


On Fri, May 6, 2016 at 9:11 PM, Rob Landley <rob at landley.net> wrote:
> (The end in sight for _busybox_ in my own use cases is next up on my
> todo list. Really not looking forward to implementing awk, but it's
> gotta be done...)


I'm curious what research you've done on awk?

From my research, it seems like a significantly easier problem than the
shell.  Even compared to a shell without interactive parsing and a
completion system, awk is probably 2-3x simpler, and once you account
for those it's probably 5x simpler.

One thing that I didn't realize is that Ubuntu and Debian use mawk
instead of gawk as their default awk.  So I assume all their package
building scripts run with mawk?  That's good because mawk is a lot
smaller than gawk.

And I think Aboriginal Linux runs with busybox awk?  That's also good
because busybox awk is much smaller than mawk!

I took a peek at 4 implementations:

- gawk - GPLv3 - 66K lines + 14K lines of extensions.  Yacc grammar.
(This has a C extension interface, profiler and debugger, a somewhat
ugly built-in networking library, etc.)

- mawk (updated 2015) - GPLv2 - 21K lines.  Yacc grammar.  (It's
supposed to be fast because it's based on a byte-code interpreter
rather than walking a tree?)

- busybox awk - GPLv2 - ~3300 lines in editors/awk.c, though it's not
clear to me how much library code is used.  It includes xregex.h,
although it also uses libc regexec().  Hand-written parser.

- Kernighan Awk (updated 2012) - 8K lines.  Lucent BSD? license.  Yacc grammar.

(Some of the line counts may be a bit off because I didn't really
tease out the source parse.y file vs the generated .c and .h files)
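
To make the byte-code vs. tree-walking distinction in the mawk entry
concrete, here is a minimal sketch of the two strategies on a toy
arithmetic expression.  It is not taken from any of the four
implementations above; the tuple-based tree, the instruction names and
the functions eval_tree, compile_tree and run_bytecode are all
hypothetical, chosen only for illustration.

```python
# Hypothetical illustration only; not code from gawk, mawk, busybox awk,
# or Kernighan's awk.

# Expression tree for (2 + 3) * 4: ("num", value) or (op, left, right).
TREE = ("mul", ("add", ("num", 2), ("num", 3)), ("num", 4))


def eval_tree(node):
    """Tree-walking evaluation: recurse over the tree on every run."""
    op = node[0]
    if op == "num":
        return node[1]
    left, right = eval_tree(node[1]), eval_tree(node[2])
    return left + right if op == "add" else left * right


def compile_tree(node, code=None):
    """Flatten the tree once into a linear list of stack instructions."""
    if code is None:
        code = []
    if node[0] == "num":
        code.append(("push", node[1]))
    else:
        compile_tree(node[1], code)   # operands first (postfix order)
        compile_tree(node[2], code)
        code.append((node[0],))       # then the operator
    return code


def run_bytecode(code):
    """Byte-code style evaluation: a flat loop with an explicit stack."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instr[0] == "add" else left * right)
    return stack.pop()


if __name__ == "__main__":
    program = compile_tree(TREE)
    print(program)
    print(eval_tree(TREE), run_bytecode(program))
```

Both paths compute the same value.  The difference is that the compiled
form is traversed with a simple loop instead of recursive calls, which
is the usual argument for why a byte-code design can be faster when the
same program runs once per input record.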

All of them use Yacc except busybox, which isn't that surprising
because I heard Kernighan say that Yacc was foundational in developing
awk.  They designed the language with it.

Busybox awk is impressively small.  I thought you said there was a lot
of hairy awk in binutils or something, so I'm guessing that all runs
under busybox awk?

I'm guessing it's not possible for toybox to borrow code from it
because of the license, but I wonder about the Lucent license.  The
lexer is 582 lines of clean-looking C code (it's Kernighan, so I guess
we all know his style :) ), which is not insignificant!

Andy


