[Toybox] awk (Re: ps down, top to go)

enh enh at google.com
Mon May 23 11:28:52 PDT 2016


On Sun, May 22, 2016 at 11:23 AM, Rob Landley <rob at landley.net> wrote:
> On 05/11/2016 01:41 AM, Andy Chu wrote:
>>> Oh I was quite impressed with Lua, but all programming languages operate
>>> within a framework and Lua intentionally doesn't provide a usable
>>> standard framework.
>>
>> The way I think of it is that Lua doesn't provide the program with any
>> "capabilities" by default (in the security sense).  You have to
>> explicitly grant capabilities by providing hooks to your application.
>
> Providing write() but not printf(), or + - * / but no math library with
> trig functions, has nothing to do with security.
>
> The X11 problem was always "Here's a window and line drawing primitives.
> Creating a toolkit for buttons and sliders and pulldown menus and such
> is left as an exercise to the user, there's no standard one provided and
> 12 non-standard ones which all suck".
>
> Hence qt vs gtk. They don't let you do anything you couldn't without
> them, they just save you writing giant piles of code yourself.
>
>> This is actually one of the things that attracted me to it, since
>> having a secure environment opens up some interesting possibilities
>> with executing remote code (like JavaScript).
>
> The most secure system is powered off, ground into a fine powder, mixed
> with acid, encased in concrete, and dropped into a deep sea trench.
> Ideally in a way that the acid will eat through the concrete and
> dissolve the whole mess into the ocean near the bottom. (And that's
> assuming you haven't got the budget to fire it into the sun and closely
> monitor its entire trip there.)
>
>> Tcl has a similar embedded language design philosophy, but it happened
>> to come with GUI libraries and such which made it popular for a while.
>>
>> I don't think Lua "refused" to provide a standard library... people
>> were mostly using it for games and embedded applications, and there
>> just wasn't a strong enough community running it on POSIX or whatever.
>>
>> It was just 1 or 2 academics who wrote all the code -- they never had
>> a public source repo or accepted patches.
>
> I was under the impression it had a vigorous community doing stuff for a
> decade before anybody who spoke English noticed, because they were doing
> it in Portuguese.
>
> Practical result's the same either way.
>
>>>> busybox awk looks like a pretty straightforward interpreter
>>>> architecture from what I can tell -- lex, parse, walk a tree to
>>>> execute, and runtime support with hash tables and so forth.
>>>
>>> Possibly awk and sh can share parser infrastructure. Not sure yet.
>>
>> One thing to note is that they use opposite parsing algorithms:
>>
>> * sh: All implementations except bash use a hand-written recursive
>> descent parser, i.e. top down parsing; whereas bash uses yacc, i.e.
>> bottom up parsing.  And bash regrets the choice.
>
> I wasn't planning to use yacc.
>
>> * awk: All implementations except busybox awk use yacc (bottom up).
>
> I wasn't planning to use yacc here either.
>
>> It's not entirely clear to me what algorithm busybox awk is using; I
>> think it is a hand-written bottom up parser.  Doesn't look like
>> recursive descent for sure.
>
> My limiting factor with awk is I need to collect a large corpus of awk
> test scripts so I know what success looks like.

(coincidentally i was joking to someone last week that you could
probably replace awk with a binary that only understood "{ print $2;
}" and it would be years before anyone would notice. it's been a long
time since i saw anything else in real life...)

>> The difference arises from the language itself.  The main sh language
>> has no expressions and hence no left recursion; it's essentially LL(1)
>> (except for looking ahead to find the ( in a function def).
>
> You can recurse, you can throw stuff on a stack. Not a big deal either way.
>
> No man page for ll or LL. When I type "ll" Ubuntu has it as an alias for
> ls -l (so no prompt for a package to install). And LL says command not
> found (again, no prompt for a package to install).

(https://en.wikipedia.org/wiki/LL_grammar)

>> Awk has TWO expression languages -- the conditions can be combined
>> with boolean logic (e.g. $1 == "foo" && $2 == "bar"), and the
>> procedural action language has arithmetic.  So bottom up parsing works
>> better here.
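
A small sketch of the two expression languages described above: the pattern uses boolean logic to select a record, and the action does arithmetic on a field. The input lines and the helper name double_matching are made up for illustration, and this assumes some awk is on PATH.

```shell
# double_matching is a hypothetical helper name used only for this sketch.
# Pattern: boolean expression over fields. Action: arithmetic on field 3.
double_matching() {
  awk '$1 == "foo" && $2 == "bar" { print $3 * 2 }'
}

# Made-up input: only the first line satisfies both conditions.
printf 'foo bar 3\nfoo baz 4\n' | double_matching
```

With this input only the first record matches, so the action runs once.
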
>
> Don't care.
>
>>> What is and isn't a bug is... It took me a while to figure out why this
>>> works:
>>>
>>>   for i in a b c; do echo $i; done
>>>
>>> But this is a syntax error even though I can put a newline after the do:
>>>
>>>   for i in a b c; do; echo $i; done
>>
>> The shell syntax is definitely weird at first, but this distinction
>> follows directly from the POSIX grammar -- which I mentioned is
>> accurate in the sense that all the implementations I tested are very
>> conformant.  (The exception is bash which doesn't allow unbraced
>> single command function definitions.  Try "func() ls /; func" in bash
>> and dash; according to the grammar, dash is correct.)
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_10
>
> No.
>
> Any time "bash is wrong but dash is correct", posix is wrong. Posix is
> saying that the de-facto Linux shell got this wrong for almost 20 years
> and nobody noticed, then a shell that I could trivially segfault when
> Ubuntu first swapped /bin/sh for it, and which "sleep 100 &" and then
> ctrl-c at the prompt would kill the backgrounded sleep... That was doing
> it "right".
>
> No. No it wasn't. Posix was at _best_ irrelevant.
>
>> The relevant productions are:
>>
>> ... For name linebreak in wordlist sequential_sep do_group
>>
>> do_group         : Do compound_list Done           /* Apply rule 6 */
>>
>> compound_list    :              term
>>                  | newline_list term
>>                  |              term separator
>>                  | newline_list term separator
>>
>>
>> compound_list can start with a list of newlines, but it can't start
>> with semicolons.  That's why you can have newlines after "do" but not
>> a semicolon.
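
The grammar point above can be checked by feeding each form to a child shell and seeing whether it parses. This is only a sketch: try is a hypothetical helper, and it assumes the sh on PATH is something like bash or dash, which follow the POSIX grammar here. A shell that chooses to allow the semicolon would report the third form as accepted too.

```shell
# try is a hypothetical helper: run a snippet in a child sh and report
# whether that shell accepted it (exit status 0) or not.
try() {
  if sh -c "$1" >/dev/null 2>&1; then echo accepted; else echo rejected; fi
}

try 'for i in a b c; do echo $i; done'    # command right after "do"
try 'for i in a b c; do
echo $i; done'                            # newline after "do"
try 'for i in a b c; do; echo $i; done'   # semicolon after "do"
```

In bash and dash the first two forms are accepted and the third is rejected as a syntax error.
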
>
> I think I'm allowing the semicolon in mine, because there's no obvious
> reason not to.
>
>>> 1) is basically ( as a command (it's a context shift command like if or
>>> while, but it's a command, same block definition as above; see also {
>>> and } blocks).
>>>
>>> 2) happens during environment variable parsing (the _fun_ bit is the
>>> quoting in "$(echo "$(ls -l)")")
>>
>> In my parser, there's nothing special about a command sub surrounded
>> by double quotes surrounded by a command sub surrounded by double
>> quotes.  That's all handled straightforwardly by the recursion (ditto
>> for evaluating the expression).  However, detecting the ) that matches
>> a command sub is not so straightforward, since there are 4 uses of ).
>> It does involve a stack in the lexer; it's debatable whether "context
>> stack" describes it.
>>
>>> Oh, speaking of { } blocks, you can do this on the command line:
>>>
>>>   { echo -e "one\ntwo\nthree"
>>>   } | tac
>>>
>>> But if you don't have the line break in there the } is considered an
>>> argument to echo and you get a prompt for continuation until you feed it
>>> } on the start of a line. You can use a ; instead of a newline though,
>>> that's "start of a line" enough.
>>
>> Right this is because { and } are "reserved words", while ( and ) are
>> operators.  A reserved word has to be delimited by space, whereas an
>> operator delimits itself.  Reserved words are only special if they are
>> the FIRST word, so echo } doesn't need to be quoted, but echo ) does.
>
> I know.
>
>> (echo hi)   # valid without spaces
>> {echo hi}   # not what you think
>> { echo hi }  # not what you think either
>> { echo hi; }  # correct because ; is an operator, and } is the first
>> word in the next command
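
The four cases above, plus the "echo }" case, can be run in a child shell to see the reserved word versus operator difference. A minimal sketch, assuming a POSIX-style sh on PATH; run is a hypothetical helper that prints the snippet's output followed by its exit status.

```shell
# run is a hypothetical helper: execute a snippet in a child sh, hide its
# error messages, and print the exit status on a line of its own.
run() {
  sh -c "$1" 2>/dev/null
  echo "status=$?"
}

run '(echo hi)'      # ( and ) are operators, no spaces needed
run '{echo hi}'      # looks up a command literally named "{echo"
run '{ echo hi }'    # } is just an argument to echo, the block never closes
run '{ echo hi; }'   # ; ends the command, so } is the first word of the next
run 'echo }'         # } is not special when it is not the first word
```

The second case fails with "command not found" (status 127) and the third fails as a syntax error, while the others print their output and exit 0.
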
>
> You're explaining back at me what I said.
>
>>>> There is a similar problem with ${} overloading --
>>>> it's used for anonymous blocks and brace expansion, in addition to var
>>>> expansion.  I found bash bugs here too.
>>>
>>> Such as...?
>>
>> The test case I came up with is:
>>
>> $ echo ${foo:-$({ which ls; })}
>> -bash: syntax error near unexpected token `)'
>>
>> $ dash
>> $ echo ${foo:-$({ which ls; })}
>> /bin/ls
>
> You said they said they regret using yacc as their parser. :)
>
>> This is a command sub with a braced block inside it, as the default
>> value inside ${}.  Bash gets confused about the matching }.  Something
>> like ${foo:-${bar}} should work fine though.
>
> I just checked that echo ${blah:-"$({ ls; })"} works, which isn't hugely
> surprising.
>
>>> Context stack? That was my way. Lots of this parsing needs to nest
>>> arbitrarily deep, and it can cross lines:
>>>
>>>   $ echo ${hello:-
>>>   > there}
>>>   there
>>
>> Right, this is the PS2 problem.  When you hit enter, do you execute
>> the command, or print > and continue parsing?
>
> Eh, not that big a deal. My question was more whether
>
>  $ ls ; echo ${hello:-
>
> Should run the ls before prompting for the rest of the echo.
>
>> Actually this case is broken in dash -- try "echo ${ <newline>" in
>> bash and dash.  (Although I'm sure nobody really cares.)
>
> I don't really care what dash does. It is defective and annoying, says
> so right in the acronym.
>
>>> And if you put a double quote before the $ and after the } you get a
>>> newline before there. If you don't, command line argument parsing and
>>> reblocking strips it.
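
A sketch of the quoted versus unquoted difference described above, written as a script rather than typed at a prompt. It assumes hello is unset and IFS has its default value; the variable names unquoted and quoted are made up for this example.

```shell
unset hello

# Unquoted: the default value is a newline followed by "there"; splitting
# on whitespace drops the leading newline before echo sees the argument.
unquoted=$(echo ${hello:-
there})

# Quoted: the newline is kept as part of the single argument.
quoted=$(echo "${hello:-
there}")

printf '[%s]\n' "$unquoted" "$quoted"
```

The first bracketed value is just "there"; the second starts with a newline.
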
>>>
>>> What do I mean by reblocking? I mean this:
>>>
>>>   $ printf "one %s three %s\n" ${hello:-two four}
>>>   one two three four
>>
>> I don't see anything special about this; it's a straightforward
>> consequence of word splitting.
>
> Is that what the standard calls it? It's been years since I read through
> the thing from start to finish, terminology gets a bit fuzzy.
>
>> Because there are no quotes around
>> ${hello...}, its value is subject to word splitting, so there are two
>> arguments to printf.
>
> Yes, I know why it does it.
>
>> Quotes change the behavior as you would expect;
>
> You keep thinking I would expect things, but "$@"
>
>> now there is one argument to printf:
>>
>> printf "one %s three %s\n" "${hello:-two four}"
>> one two four three
>>
>> (with the last %s expanding to empty)
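
Both printf cases from this exchange side by side, as a sketch. It assumes hello is unset and IFS has its default value; split and whole are names made up for this example.

```shell
unset hello

# Unquoted: the default value is split into two words, so printf gets
# two arguments after the format string.
split=$(printf 'one %s three %s\n' ${hello:-two four})

# Quoted: the default value stays one word, so the second %s has no
# argument and expands to the empty string.
whole=$(printf 'one %s three %s\n' "${hello:-two four}")

printf '[%s]\n' "$split" "$whole"
```

The brackets make the trailing space in the quoted case visible.
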
>>
>>>> The bash aosabook chapter which I've referred to several times talks
>>>> about how they had to duplicate a lot of the parser to handle this,
>>>> and it still isn't right:
>>>
>>> I'm not looking at bash's implementation, I'm looking at the spec and
>>> what it does when I feed it lots of test cases (what inputs produce what
>>> outputs).
>>
>> You apparently have a love-hate relationship with bash.
>
> It's GNU code widely used by Linux. So yeah.
>
>> You explicitly said you want to write bash and not just sh, yet you don't
>> want to look at how it implements anything :)
>
> I never look at FSF code. On general principles. But the behavior of the
> standard Linux command line is what Linux developers (and the build
> systems they write) expect.
>
>>> Years ago I was trying to get it to preserve NUL bytes in the output of
>>
>>> Toybox doesn't use libc getopt(), we use lib/args.c (which does not use
>>> libc getopt), so what you decide to do in your shell and what it makes
>>> sense for toysh to do may not be related to each other here.
>>
>> Sure, I'm just describing what it does.  I agree getopts is an awkward
>> interface in sh, but if you want a POSIX shell, much less a bash
>> clone, you need it.
>
> Yeah but I might be able to use lib/args.c syntax instead of getopt
> syntax, since my stuff is mostly a superset of their stuff. Haven't dug
> into that todo item yet. Not hugely worried about it either way.
>
>>> Keep in mind, over the years people have written a dozen different
>>> shells. It's really not that big a deal, I just want to do it _right_ so
>>> I'm trying to reserve a large block of time so that once I start I can
>>> finish rather than getting repeatedly interrupted. And that means
>>> knocking down a bunch of smaller todo items first.
>>
>> I definitely agree that you want a big block of uninterrupted time.
>> (I've been off work since March so I've got that going for me.)
>>
>> It's not clear to me that any reasonably popular shell was started
>> later than 1990 or so (is zsh the latest?).  I think the BSDs are
>> using code started 40+ years ago.  I don't know when mksh is from, but
>> I think it must be that old too.
>
> This is why I want a bash replacement. Large existing userbase should be
> able to move over as painlessly as possible. I'm not trying to invent
> significant new syntax here.
>
> A shell is fairly central to the idea of unix, and the default shell of
> Linux has always been bash. (Ubuntu's insanity notwithstanding: the way
> ubuntu admitted its mistake was to make /bin/bash the default _login_
> shell, so it was in all the /etc/passwd entries despite #!/bin/sh
> pointing to something political and useless.)
>
>> As I mentioned, my goal isn't to simply implement sh, because that's
>> been done.  It seems to me that 25 years is a good interval to have
>> some innovation in the shell.  I'm just starting with sh so it's a
>> superset of what is known to work, and so people actually have a
>> migration path.
>
> The same way C is decades old therefore Objective C and C++ and so on
> _must_ be an improvement?
>
> I have seen lots, and lots, and LOTS of new languages fork off of
> existing stuff over the years. Back when I was on fidonet in the 90's
> somebody had collected a list of TWO THOUSAND programming languages,
> which seemed kind of excessive. (I don't still have this list and it
> would be 20+ years out of date anyway, but I remember there was more
> than one language named "oberon".)
>
> At $DAYJOB one of the programmers wrote an openoffice spreadsheet to
> VHDL translation layer in something called leingen, which is a dialect
> of scheme (which is a dialect of lithp) using java virtual machine
> features. This did not seem advisable to me, and yet it exists and
> nobody's had time to rewrite it yet.
>
> Good luck with your project, it is a can of worms I have _zero_ interest in.
>
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.


