[Toybox] Would someone please explain what bash is doing here?

Chet Ramey chet.ramey at case.edu
Mon May 11 13:55:37 PDT 2020


On 5/10/20 7:24 PM, Rob Landley wrote:

>>>   $ echo \
>>>   > $LINENO
>>>   2
>>>
>>>   $ echo $LINENO \
>>>   $LINENO
>>>   1 1
>>
>> Let's look at these two. This is one of the consequences of using a parser
>> generator, which builds commands from the bottom up (This Is The House That
>> yacc Built).
> 
> I've never understood the point of yacc. I maintained a tinycc fork for 3 years
> because it was a compiler built in a way that made sense to me.

It saves you having to write a lot of code if you have a good grammar to
work from. One of the greatest achievements of the original POSIX working
group was to create a grammar for the shell language that was close to
being implementable with a generator.

>> You set $LINENO at execution time based on a saved line number in the
>> command struct, and that line number gets set when the parser knows that
>> it's parsing a simple command and begins to construct a struct describing
>> it for later execution.
>>
>> In all cases, the shell reads a line at a time from wherever it's reading
>> input. In the first case, it reads
>>
>> "echo \"
>>
>> and starts handing tokens to the parser. After getting `echo', the parser
>> doesn't have enough input to determine whether or not the word begins a
>> simple command or a function definition,
> 
> The tricky bit is "echo hello; if" does not print hello before prompting for the
> next line, 

Yes. You're parsing a command list, because that's what the `;' introduces,
and the rhs of that list isn't complete. A "complete_command" can only be
terminated by a newline list or EOF.
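This is easy to reproduce non-interactively: since a list only executes once it parses as a complete_command, a parse failure on the `if' means the `echo' never runs at all.

```shell
# "echo hello" is parsed but never executed: the "if" needs more
# input, -c has none to give, and bash reports a syntax error
# (exit status 2) without running any part of the list.
bash -c 'echo hello; if' 2>/dev/null
echo "exit status: $?"
```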

> and "while read i; echo $i; done" resolves $i differently every way,

You mean every time through the loop?
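That is, the loop body is parsed exactly once, but its expansions are performed fresh on each execution of the body:

```shell
# $i is expanded each time the body runs, so each input line
# appears in turn even though the body was parsed only once.
printf 'a\nb\nc\n' | while read i; do echo "got $i"; done
# → got a
# → got b
# → got c
```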

> which said to _me_ that the parsing order of operations is
> 
> A) keep parsing lines of data until you do NOT need another line.
> 
> B) then do what the lines say to do.

Roughly, if you mean "complete commands have been resolved with a proper
terminator" and "execute the commands you just parsed."

> I have parse_word() that finds the end of the next word (returning NULL if we
> need another line to finish a quote, with trailing \ counting as a quote), and
> parse_line() that adds words to struct sh_arg {int c; char **v;} in a linked
> list of struct sh_pipeline in a struct sh_function, and when parse_line() does
> NOT return a request for another line, the caller can run_function() and then
> free_function().

Have you found that structure enough to handle, say, if-then-elif-else-fi
and the various flavors of the `for' command?


>> and goes back for more input. The
>> lexer sees there are no tokens left on its current input line, notes that
>> line ends in backslash and reads another line, incrementing the line
>> number, throws away the newline because the previous line ended in
>> backslash,
> 
> I _was_ throwing away the newline, but I stopped because of this. Now I'm
> keeping it but treating it as whitespace like spaces and tabs, but that's wrong:

It is wrong; it needs to be removed completely.
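The difference is observable with a continuation in the middle of a word:

```shell
# The backslash-newline pair is removed outright, so the two
# halves join into a single word; treating the newline as
# whitespace would yield two words ("ab cd") instead.
bash -c 'echo ab\
cd'
# → abcd
```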


>> and returns $LINENO. The parser finally has enough input to
>> reduce to a simple command, and builds one, with the line number set to 2.
> Ok, so line number is _ending_ line, not starting line. (Continuations increment
> the LINENO counter _before_ recording it for the whole span.)

Not necessarily the ending line; a simple command can span an arbitrary
number of lines, but $LINENO gets set from whatever line the lexer was on
when the parser recognized what it had as a simple command. It can then
continue reading words in that simple command until it gets the unescaped
newline (or `;', or `&', or any of the command separators) to terminate it.

If you want to look ahead far enough, you can save the line number if you
don't read a reserved word, peek at the next few characters to see if you
get '()', and build a simple command using the saved line number. Yacc/
Bison don't let you look that far ahead.
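Fed as a two-line script on stdin, the transcripts from the top of the thread show the same parse-time assignment:

```shell
# The parser only recognizes a simple command after reading the
# continuation line, so the saved line number is 2.
printf 'echo \\\n$LINENO\n' | bash
# → 2

# Here the parser commits to a simple command while still on
# line 1, so both expansions report 1, even the one on line 2.
printf 'echo $LINENO \\\n$LINENO\n' | bash
# → 1 1
```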


>>>>>>> I currently have no IDEA what "sh --help" should look like when I'm done, 
>>>>>>
>>>>>> I'm pretty sure bash --help complies with whatever GNU coding standards
>>>>>> cover that option.
>>>>>
>>>>> Currently 2/3 of bash --help lists the longopts, one per line, without saying
>>>>> what they do. So yeah, that sounds like the GNU coding standards.
>>
>> Oh, please. It doesn't describe what each single-character option does,
>> either. That's a job for a man page or texinfo manual.
> 
> Then why are they one per line?

Because it's not worth the effort to space them across the screen.

>>> Except you've got some parsing subtlety in there I don't, namely:
>>>
>>>   $ bash -hc 'echo $0' --norc
>>>   --norc
>>>
>>>   $ bash -h --norc -c 'echo $0'
>>>   bash: --: invalid option
>>
>> "Bash also  interprets  a number of multi-character options.  These op-
>>  tions must appear on the command line before the  single-character  op-
>>  tions to be recognized."
>>
>> Bash has always behaved this way, back to the pre-release alpha and beta
>> versions, and I've never been inclined to change it.
> 
> Indeed. Unfortunately for _my_ code to do that it would have to get
> significantly bigger, because I'd need to stop using the generic command line
> option parsing and pass them through to sh_main() to do it myself there. (Or add
> intrusive special cases to the generic parsing codepath.)

You can probably get away with it as long as that option parsing code stops
at the first word that doesn't begin with `-'.

> 
> Documenting this as a deviance from <strike>posix</strike> the bash man page
> seems the better call in this instance. 

Documenting what as a deviation? POSIX doesn't do long options; you can do
whatever you like with them.


>>>   $ bash -cs 'echo $0'
>>>   bash
>>
>> This is ambiguous, but not in the way you expect. The thing that differs
>> between shells is whether or not they read input from stdin (because of
>> the -s option) after executing the `echo $0'. POSIX specifies them as
>> separate cases, so nobody should expect anything in particular when they
>> are combined. The ash-derived shells start reading from standard input,
>> bash and the ksh-like shells exit after executing the echo, and yash
>> rejects the option combination entirely.
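For bash at least, the behavior described above is observable with input waiting on stdin (other shells will differ, per the same passage):

```shell
# bash exits after running the -c command; the pending line on
# stdin is never read as shell input despite -s.
printf 'echo from-stdin\n' | bash -cs 'echo from-c'
# → from-c
```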
> 
> Wheee.
> 
> In my case "how options work in all the other toybox commands" puts a heavy
> weight on one side of the scales. (Not insurmountable, but even the exceptions
> should have patterns.)

The Bourne shell option parsing long predates modernities like getopt(), so
the basic rule is "scan for words starting with `-' or `+', parse them as
binary flag options, handling `--' in some reasonable way to end option
parsing, then grab what you need from the argument list (the command for
-c), and use everything else to set the positional parameters. Oh, and use
the same code for `set', so you have to reject the options that are only
valid at invocation.
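That rule can be sketched in a few lines of shell (the function name and output format are made up for illustration; the real code also has to collect option arguments such as -c's command string):

```shell
# Hypothetical sketch of the Bourne-style scan: gather leading
# -/+ words as flag bundles, stop at "--" or the first word that
# isn't an option, and leave the rest as positional parameters.
parse_invocation() {
  flags=
  while [ $# -gt 0 ]; do
    case $1 in
      --) shift; break ;;            # explicit end of options
      -?*|+?*) flags="$flags $1"; shift ;;
      *) break ;;                    # first non-option word
    esac
  done
  echo "flags:$flags"
  echo "args: $*"
}

parse_invocation -e +x script.sh arg1
# → flags: -e +x
# → args: script.sh arg1
```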

> 
>>> But again, you have to conform to the gnu style guidelines, which I thought
>>> means you'd have a texinfo page instead of a man page?
>>
>> I have both.
> 
> sed and sort had both but treated the man page as an afterthought. Many of their
> gnu extensions were ONLY documented in the info page when I was writing new
> implementations for busybox back in the day. (No idea about now, haven't looked
> recently. 

OK. The bash man page and texinfo manual have the same content.

> These days I handle that sort of thing by waiting for somebody to
> complain. That way I only add missing features somebody somewhere actually _uses_.)

It has to be a lot more than one person.

> 
> For toysh, I've taken a hybrid approach. I'm _reading_ every man page corner
> case and trying to evaluate it: for example /dev/fd can be a filesystem symlink
> to /proc/self/fd so isn't toysh's problem, but I'm making <(blah) resolve to
> /proc/self/fd/%d so it doesn't _require_ you to have the symlink. 

Yeah, you only have to worry about linux.

>> I abandoned the -o namespace to POSIX a
>> long time ago, and there is still an effort to standardize pipefail as
>> `-o pipefail', so I'm leaving it there. I originally made it a -o option so
>> we could try and standardize it, and that reasoning is still relevant.
> 
> It's a pity posix is moribund.

It's not dead, just slow.

https://www.austingroupbugs.net/view.php?id=789

So we started talking about this in some official proposed way in 2013,
continued sporadically until 2018, decided on some official text to add
to the standard in September, 2018, and it will be in the next major
revision of Posix, issue 8.

> I mentioned the fact pipefail had been picked up
> by multiple other shells in the toybox talk I did 7 years ago:

Bash wasn't the first shell to have it. I didn't add it until 2003, after
a discussion with David Korn (the aforementioned Posix standardization
effort).
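For reference, the flag's effect in bash (and the other shells that adopted it):

```shell
# By default a pipeline's exit status is its last command's.
bash -c 'false | true; echo $?'
# → 0

# With pipefail, a failure in any stage fails the whole pipeline.
bash -c 'set -o pipefail; false | true; echo $?'
# → 1
```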

>>> ----------
>>> Usage: sh [--LONG] [-ilrsD] [-abefhkmnptuvxBCHP] [-c CMD] [-O OPT] [SCRIPT] ...

You're missing `-o option'.


>>> Do you really need to document --help in the --help text? 
>>
>> Why not? It's one of the valid long options.
> 
> Lack of space. I was trying to squeeze one less line out of the output. :)

Believe me, if it were not there, someone would complain about its absence.
There are a few things that remain undocumented, and I get crap about them
as regular as clockwork.


>>> The bash man page does
>>> not include the string "--debug" (it has --debugger but not --debug),
>>
>> It's just shorthand for the benefit of bashdb.
> 
>   $ help bashdb
>   bash: help: no help topics match `bashdb'.  Try `help help' or `man -k bashdb'
>   or `info bashdb'.
>   $ man -k bashdb
>   bashdb: nothing appropriate.
>   $ man bash | grep bashdb
>   $
> 
> google... huh, it's a sourceforge package.

It's the source of most of the bash debugging support. Rocky started out
distributing patches, but I folded most of the features into the mainline
source. It's a nifty little bit of work.

> 
> I'm not sure how I'd have divined the existence of the sourceforge package from
> the --debug option in the help output (which didn't make an obvious behavior
> difference when I tried it), but I often miss things...

The debugging support exists independently of bashdb, and can be used
without it. Bashdb is just the biggest customer, and the origin of the
features. The `debugger profile' in the documentation is the bashdb
driver script. Try running `bash --debugger' sometime, you might like it.
Assuming, of course, your vendor has installed bashdb in the right place.

> -D	Display all the $"translatable" strings in a script.
> 
> Oh right, I remember reading about $"" existing and going "that's weird, out of
> scope" and moving on. Because I did _not_ understand how:
> 
>        A double-quoted string preceded by a dollar sign ($"string") will cause
>        the string to be translated according to the current  locale.   If  the
>        current  locale  is  C  or  POSIX,  the dollar sign is ignored.  If the
>        string is translated and replaced, the replacement is double-quoted.
> 
> was supposed to work. (HOW is it translated? Bash calls out to
> translate.google.com to convert english to japanese? Is there a thing humans can
> do to supply an external translation file? Is it just converting dates and
> currency markers and number grouping commas?)

You have a message catalog, install the right files, and use the gnu
gettext infrastructure to get translated versions of the strings you
mark. It's a shell script way of doing what bash does internally for its
own messages. It's very little used.
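A minimal sketch of the mechanism (the domain name here is made up; the catalog under TEXTDOMAINDIR has to be compiled and installed separately with the gettext tools):

```shell
# $"..." strings are looked up through gettext in the catalog
# named by TEXTDOMAIN, searched for under TEXTDOMAINDIR. With no
# catalog installed (or in the C/POSIX locale), the literal
# string is used unchanged.
export TEXTDOMAIN=myscript           # hypothetical catalog name
export TEXTDOMAINDIR=/usr/share/locale
bash -c 'echo $"Hello, world"'
# → Hello, world   (untranslated fallback)
```

Running `bash -D' on a script, as quoted above, lists every such $"..." string, which is how the source strings for the catalog get extracted in the first place.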


> Ah, gettext. That would explain why I don't know about it. I always used
> http://penma.de/code/gettext-stub/ in my automated Linux From Scratch test
> builds because it's one of those gnu-isms like info and libtool.

It will be in Posix one day.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet at case.edu    http://tiswww.cwru.edu/~chet/


