[Toybox] Would someone please explain what bash is doing here?

Wed May 27 12:31:16 PDT 2020

On Sat, May 23, 2020 at 8:19 PM Rob Landley <rob at landley.net> wrote:
>
> On 5/23/20 5:51 PM, Chet Ramey wrote:
> > On 5/23/20 1:11 PM, Rob Landley wrote:
> >> Starting to open the job control can of worms, and:
> >>
> >>   $ while true; do readlink /proc/self | cat - $$; done
> >>   24658
> >>   cat: 20032: No such file or directory
> >>   24660
> >>   cat: 20032: No such file or directory
> >>   24662
> >>
> >> Is calling readlink and cat each time through the loop (true is a builtin), so
> >> the pid advances by 2 and the pipeline is NOT a subshell.
> >
> > Correct. Each element of a pipeline is executed in a subshell.
>
> Hmmm.
>
> >> But:
> >>
> >>   $ echo hello | read i; echo $i
> >>
> >> The read isn't saved because it's happening in a subshell context (so it sets an
> >> i that is discarded)?
> >
> > Correct. Since the read is executed in a subshell, it can't affect its
> > parent's environment.
>
> It's actually easier for me _not_ to do that (because nommu support), but oh well.
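
(For anyone following along: a minimal sketch of the difference described above, assuming a bash with the `lastpipe` option, which I believe means 4.2 or later. Each case runs in its own non-interactive `bash -c`, where job control is off, since `lastpipe` only takes effect in that situation. The names default_result and lastpipe_result are mine, purely for illustration.)

```shell
# Default behaviour: read runs in a subshell, so the assignment to i is lost.
default_result=$(bash -c 'unset i; echo hello | read i; echo "i=[$i]"')

# With lastpipe (and job control off), read runs in the current shell.
lastpipe_result=$(bash -c 'unset i; shopt -s lastpipe; echo hello | read i; echo "i=[$i]"')

echo "$default_result"    # i=[]
echo "$lastpipe_result"   # i=[hello]
```
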
>
> >> And then:
> >>
> >>   $ while true; do continue | readlink /proc/self; done
> >>   28555
> >>   28557
> >>   28559
> >>   28561
> >>
> >> Is advancing the pid by 2 each time, because the _continue_ is in its own process?
> >
> > Each element of a pipeline is run in a subshell. That's how you can set its
> > process group and get it to respond to job control signals sent to the
> > terminal's process group.
>
>   $ read i
>   ^Z^Z^Z^Z^Z^Z^Z^Z^Z
>
> Why are pipelines different?
>
> > POSIX says you can run any element of a pipeline
> > in the current shell context, but in practice nobody does that for any one
> > but the last, and bash only does it if `lastpipe' is set.
> >
> > It's truly a huge PITA to run the last element of the pipeline in the
> > current shell context when job control is enabled, keeping track of
> > process groups, handling signals like SIGTSTP, and forking at the right
> > time so you can suspend yourself if you need to. I've never been tempted.
> > I don't know how much trouble it was for Korn, but the zsh guys literally
> > fought bugs in that code for years.
>
> Do they have a regression test suite? I'd love to harvest test cases...
>
> Yes they do, and it has a README. Hmmm...

(remember that mksh has a pretty extensive test suite too.)

> >>   $ while true; do continue | cat; echo hello; done
> >>   hello
> >>   hello
> >>   hello
> >>
> >>   $ while true; do break | cat; echo hello; done
> >>   hello
> >>   hello
> >>   hello
> >>
> >> continue and break are silently NOP in a pipe?
> >
> > What are they supposed to do? They can't affect the parent. All they can
> > do is complain, which would be annoying.
>
> This does:
>
>   for i in a b c d e & do echo $i; done
>
> *shrug* I just expected it to be consistent.
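
(A small sketch of the silent NOP, for reference. It only assumes a POSIX-ish shell; the counter n is mine. The break runs in the pipeline's subshell, so the loop in the parent keeps going until its own condition fails.)

```shell
n=0
while [ "$n" -lt 3 ]; do
  n=$((n + 1))
  break | cat    # break runs in a subshell and cannot stop this loop
done
echo "iterations: $n"   # iterations: 3
```
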
>
> >> Also, just confirming: $$ only shows the PID of the top level bash process, and
> >> there's no variable that shows the PID of (subshells) even though the point of a
> >> subshell is to spawn a new process?
> >
> > There is $BASHPID.
>
> Huh, I grepped for declare -p output with a pid in range...
>
>   $ declare -p | grep BASHPID
>   declare -ir BASHPID
>
> Ah, that would explain why.
>
>   $ echo $BASHPID
>   25545
>   $ (declare -p | grep BASHPID)
>   declare -ir BASHPID="25545"
>
> It's another one of those magic variables that's assigned to by resolving it,
> and then keeps its last value.
>
>   $ declare -p SECONDS
>   declare -i SECONDS="115"
>   $ declare -p SECONDS
>   declare -i SECONDS="117"
>   $ declare -p SECONDS
>   declare -i SECONDS="118"
>
> Anyway, good to know. Thanks.
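
(A quick sketch of $$ versus $BASHPID, assuming bash 4.0 or later. In the parenthesised subshell $$ still reports the top-level shell while $BASHPID reports the forked process; at top level the two agree. The variable out is just a scratch name of mine.)

```shell
# First line comes from a ( ) subshell, second from the top-level bash -c shell.
out=$(bash -c '( echo "sub $$ $BASHPID" ); echo "top $$ $BASHPID"')
echo "$out"
```

The actual pids vary per run, so only the equal/not-equal relationship within each line is meaningful.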
>
> >> P.S. this is old, but:
> >>
> >>   $ for i in a b c & do echo $i; done
> >>   bash: syntax error near unexpected token `&'
> >>
> >> But break & is fine? What does that even _mean_?
> >
> > Come on. You at least have to implement the difference between a
> > `wordlist', which is a list of shell WORDs, and a command list, which
> > is terminated by the *operator* `&'.
>
> ...no?
>
> I read lines from input. (Haven't implemented interactive command history
> editing yet, it's a todo.)
>
> There's parse_word() which finds the end of the current word, and figures out
> when you need to ask for continuations due to unterminated quoting, which
> includes $() and friends. And yes it handles "$("echo $('ls') )")" and so on.
>
> Then there's a parse_line() which figures out when to ask for line continuations
> due to flow control. (if/fi do/while which includes () and {}, and also trailing
> flow control ala && || and also HERE documents... The function returns
> hit/pass/bust to the caller in sh_main(), which does the $PS1 prompting and
> feeds it more lines, or calls run_function().)
>
> When it's got a complete thought, it calls run_function() on the parsed block
> structure returned by parse_line(), and run_function() traverses the flow
> control and calls run_command() which calls expand_redir() to get an argc/argv[]
> pair with all the variables expanded and all the redirections performed (with an
> unredir list you traverse to put them _back_, the original filehandles are duped
> up above 10 where {blah}<abc and friends meddle anyway, and that's so nommu
> doesn't handle any file access error cases after vfork()...)
>
> I should write up a walkthrough of all this when it's done, but it's still in
> flux a bit as I hit each new "no, that doesn't work" and reshuffle stuff.
>
> By the way, did I already ask why {var}<file only works on block context and not
> on a command?
>
>   $ export abc=potato
>   $ env {abc}</dev/null | grep abc
>   abc=potato
>
> I mean, it's CHECKING the file:
>
>   $ env {abc}</missing | grep abc
>   bash: /missing: No such file or directory
>
> But it's closing the filehandle without doing anything with it?
>
>   $ env {abc}>/dev/null
>   # abc=potato in here but it's a long list
>   $ ls /proc/self/fd
>   0  1  2  3
>
> (I just made mine work in both contexts, I think? It was easier...)
>
> Huh:
>
>   $ exec {abc}>/dev/null
>   $ echo $abc
>   10
>   $ ls /proc/self/fd
>   0  1  10  2  3
>
> Ok, THAT works. The redirect neither sets the variable for a command, nor keeps
> the redirect after the command... I guess exec and block end redirect logic are
> a similar codepath?
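
(Sketch of the case that does work, the exec form, assuming bash 4.1 or later. It only shows the allocate/use/close cycle and does not try to explain the per-command behaviour asked about above. The names fd and fd_num are mine.)

```shell
fd_num=$(bash -c '
  exec {fd}>/dev/null      # bash picks a free descriptor (10 or above) and stores it in $fd
  echo "$fd"
  echo "written via the variable" >&"$fd"
  exec {fd}>&-             # close the descriptor again
')
echo "allocated fd: $fd_num"
```
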
>
> > Unquoted `&' is always an operator, it is never a WORD, and so it can't
> > appear in a list of WORDs, which is what follows `in'.
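
(To make the WORD/operator distinction concrete, a small sketch assuming bash is on the PATH. Quoting the & turns it into an ordinary word; leaving it unquoted gets the syntax error quoted earlier in the thread. The variables words and unquoted are mine.)

```shell
# Quoted, & is an ordinary WORD and is iterated over like any other.
words=$(bash -c 'for i in a b c "&"; do printf "%s " "$i"; done')
echo "$words"

# Unquoted, & is an operator, so bash rejects the whole command as a syntax error.
if bash -c 'for i in a b c & do echo "$i"; done' 2>/dev/null; then
  unquoted=accepted
else
  unquoted=rejected
fi
echo "$unquoted"
```
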
>
> Every command is terminated with either end of line or one of:
>
>       // Flow control characters that end pipeline segments
>       s = end + anystart(end, (char *[]){";;&", ";;", ";&", ";", "||",
>         "|&", "|", "&&", "&", "(", ")", 0});
>
> The word parsing logic returns either the end of next word within the string, or
> a null * to mean "unterminated quote" (which includes \ at the end of line). The
> loop calling parse_word() in parse_line() figures out how to assemble those
> words into blocks (and can return to its own caller asking for additional
> continuations because of unfinished blocks and here documents and trailing flow
> control characters).
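
(The operator list quoted above puts each longer operator ahead of its own prefixes: ";;&" before ";;" before ";", "&&" before "&", and so on. My guess is that a first-match scan needs that ordering, but that is an assumption on my part and not a description of how toybox's anystart() works. A sketch of the same idea in shell, with a made-up helper name leading_op:)

```shell
# Hypothetical helper: print the operator that begins "$1", or nothing.
# Patterns are tried top to bottom, so longer operators must come first.
leading_op() {
  case $1 in
    ';;&'*) echo ';;&' ;;
    ';;'*)  echo ';;' ;;
    ';&'*)  echo ';&' ;;
    ';'*)   echo ';' ;;
    '||'*)  echo '||' ;;
    '|&'*)  echo '|&' ;;
    '|'*)   echo '|' ;;
    '&&'*)  echo '&&' ;;
    '&'*)   echo '&' ;;
    '('*)   echo '(' ;;
    ')'*)   echo ')' ;;
  esac
}

leading_op ';;& next'   # ;;&
leading_op '&& true'    # &&
leading_op '& do'       # &
```

If ";" were tested before ";;&", the first call would stop at ";" and leave ";&" behind for the next pass.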
>
> parse_line() has an "expect" stack which is the word (at start of statement)
> that terminates the current block. The parsing also knows that ( and ) start a new
> line, I.E. they're commands _and_ flow control characters, and yes it has to
> check for (( and )) but getting this right:
>
>   ((echo hello) | cat)
>   $((echo hello) | cat)
>
> took some doing and it has to retroactively break the (( ...
>
> (I cheated slightly: I use a 4k buffer to store the parentheses stack, and if
> that overflows with 4096 nested parentheses it's "syntax error: tilt".)
>
> Anyway, the parse_line plumbing records the flow control character that
> terminated the line in arg->v[arg->c] (with newline or semicolon being saved as
> NULL there), but some places (such as the end of the in list) can't have a
> non-null terminator because that's a syntax error. So those places check for that.
>
> parse_line() checks for a bunch of syntax errors: << with no label,
> "function(" without ")" or next word isn't a { ... and yes _word_:
>
>   $ function(){echo potato;}
>   bash: syntax error near unexpected token `('
>
> What else... ;; outside case, flow control without a statement, and a bunch of
> "for" cases (for on its own line, for i X where the X isn't in, ((, or do, more
> than one line after "in" without a do...) and so on.
>
> Afterwards run_function() mostly assumes the syntax of whatever it's dealing
> with is correct and doesn't re-check it, but "continue" and "break" are special.
> (They're normal type 0 commands but they modify flow control local to
> run_function() instead of going through run_command()...)
>
> Anyway, from _my_ perspective "continue | thingy" and "if true | then" seem
> equally weird, but I guess not to yacc/bison. One is a syntax error, the other a
> silent NOP.
>
> > Chet
>
> Rob