[Toybox] Would someone please explain what bash is doing here?

Chet Ramey chet.ramey at case.edu
Sun Mar 8 12:57:21 PDT 2020


On 3/8/20 2:55 PM, Rob Landley wrote:
> On 3/8/20 11:44 AM, Chet Ramey wrote:
>> On 3/8/20 10:53 AM, Rob Landley wrote:
>>>
>>> I read through the posix shell bits long enough ago it was probably SUSv3 rather
>>> than v4, but at the moment I'm taking bash as my standard and just doing
>>> whatever that does. 
>>
>> Well, I appreciate that, but there just might be one or two places (or,
>> depending on who you talk to, one or two hundred) where bash diverges
>> from the standard. That might be because of bugs, or backwards
>> compatibility, or the standard having made a dumb decision.
> 
> Sure. There are a couple places where "bash does a thing" and what I decide to
> do is different, most recent one I hit was:
> 
>   $ for i
>   > in one two three
>   > do echo $i;
>   > done
>   one
>   two
>   three
>   $ for i; in one two three; do echo $i; done
>   bash: syntax error near unexpected token `in'

Yep. This is one place where the grammar as given in the standard makes
newlines (`linebreak') different from semicolons (`sequential_sep'). The
standard requires the bash behavior, for whatever that's worth.
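
A minimal sketch of the difference, not from the original exchange. It
assumes the `sh' on your PATH is a POSIX-style shell (bash or dash both
qualify) and runs each form in a child shell so the syntax error cannot
abort the calling script:

```shell
# Newline (`linebreak') between the name and `in': accepted.
ok=$(sh -c 'for i
in one two three
do echo "$i"
done')

# Semicolon (`sequential_sep') between the name and `in': syntax error,
# because after "for i;" the grammar expects a do_group, not `in'.
bad_status=0
sh -c 'for i; in one two three; do echo "$i"; done' 2>/dev/null || bad_status=$?

printf '%s\n' "$ok"
echo "semicolon form exit status: $bad_status"
```

The exact non-zero status is shell-specific, so the sketch only records it.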

> 
> But in general, if the bash userbase hasn't noticed/minded a posix discrepancy
> (or outright bug) over the past 20 years, I'm not sure why I should care?

It's not necessarily the end users, but rather the vendors who notice and
complain, say, when they try to pass POSIX conformance test suites.

> 
>> And you can sometimes get into trouble for following the standard *too*
>> closely;
> 
> Linux has "echo -e", posix does not. Guess which one toybox implements?

Yeah, POSIX has to deal with the force of history and competing demands on
backwards compatibility. The committee punted on echo entirely, and
recommends printf.
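
A short sketch of why printf is the portable choice (the variable names
are illustrative, not from the thread): whether escapes are interpreted
is decided by the format string, not by the implementation's history.

```shell
s='a\tb'

# %s prints the argument literally: the backslash and the t survive.
literal=$(printf '%s\n' "$s")

# %b asks for backslash-escape processing: \t becomes a real tab.
expanded=$(printf '%b\n' "$s")

printf '%s\n' "$literal"
printf '%s\n' "$expanded"
```

With echo, which of those two results you get depends on the shell and
its options, which is the ambiguity the standard declined to resolve.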

> Meanwhile, all toybox commands support "--", including toybox echo, regardless
> of what the host Debian one does; consistency won out there.

Historical inertia and backwards compatibility won in the bash case.

> Ah, here was that specific thread:
> 
>   http://lists.landley.net/pipermail/toybox-landley.net/2018-October/009796.html
> 
> As you can see posix _is_ referenced, but it's not the last word.

Yeah, it always ends up being a priorities question.

> 
>> cf. the issues with bash-5.0 treating an unquoted backslash as
>> subject to being removed by pathname expansion. The heated, lengthy
>> discussion that ensued eventually concluded that the plain text of the
>> standard -- which all agreed was what bash-5.0 implemented -- did not
>> reflect shell implementations or the original intent of the standard
>> developers, and that bash-4.4 implements the right way to do it.
> 
> I haven't implemented pathname expansion yet. IFS corner cases took a couple
> weeks longer than I expected, and I'm still slogging my way through the 8
> gazillion ${stuff} variants.
> 
>> That was not the first occurrence of that phenomenon.
> 
> I'm still subscribed to the posix list, I just don't read it as closely as I
> used to and basically never reply.

Remember the brouhaha (this was at least 15 years ago) about the standard
saying that `set -e' only applied to simple commands and bash having the
audacity to implement what the standard said? Good times.

> 
>>> I should do another pass reading posix afterwards, but after
>>> https://landley.net/notes-2016.html#11-03-2016 I've been much less interested in
>>> interacting with the posix committee due to the risk of another Schilling, and
>>> have pretty much backed up to
>>> https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/ in much the same
>>> way Debian backed up to LSB 4.1 ala https://lwn.net/Articles/658809/
>>
>> I gently recommend that you use the 2018 version of the standard; the group
>> did a lot of good work in those intervening years. That's the version I
>> shoot for.
> 
> I used SUSv2. I upgraded to SUSv3. I upgraded to SUSv4. I'd happily evaluate
> SUSv5, but there isn't one because posix stopped having releases. Instead they
> randomly replace the existing data at the same URL, so if I point people at that
> website I have no idea what'll be there when they look.

There will be an issue 8 at some point, presumably reachable at the same
URL with the same number. I'm not sure about the criteria for updating
the SUS version number, but what the Austin Group is calling "issue 8"
may be a significant enough change to warrant an SUSv5. So far, the
changes since the last major revision (2008) are consciously limited and
avoid significant behavior changes.


>> I understand about Jorg. I'd like to be able to tell you to just ignore him
>> and listen to other voices, but I get that it's emotionally taxing and his
>> voice is loud enough to drown out others.
> 
> Most projects have... certain individuals. There's talks on that too
> (https://www.youtube.com/watch?v=Q52kFL8zVoM) and I'm told Linus Torvalds
> himself spent a couple months in therapy last year.
> 
> But in this case, in a public thread, nobody else spoke up with a different
> view. His voice was the ONLY voice. So I stopped listening.

I -- speaking only for myself -- think everyone views that as Jorg just
being Jorg, as it were, and tunes that out.

> 
>>> I still _sort_ of care about newer posix, but I got {bracket,expansion} working
>>> last year 
>>
>> The group has discussed brace expansion. It's more or less a valid
>> extension not described by the standard.
> 
> The failure mode of posix is the absence of stuff, to the point you can't boot a
> posix-only system (no init, no mount, I always assumed microsoft and IBM's need
> to pass FIPS 151-2 back in the day led to signing large checks to open holes big
> enough to drive NT and OS/360 through).

I think that's an explicit choice for exactly the reason you note. POSIX
is defined in such a way to make it possible to write scripts and assume a
certain behavior from the utilities it chooses to standardize. There was
never an inclination to specify a full system like that, since it would
have precluded implementing POSIX on systems that are not Unix.


>>> and last month taught my $IFS splitting to understand utf8 characters
>>
>> ? I don't think there's anything in POSIX that restricts IFS to single-
>> byte characters, since everywhere it refers to a "character" it's supposed
>> to be understood that a character can consist of multiple bytes. The
>> standard defines the term that way.
> 
> The bash man page defines "IFS whitespace" as different from Unicode whitespace.
> (Space, tab, and newline only. Mine will in theory take the non-blank Ogham
> whitespace, although I haven't added that to tests/sh.test yet. :)

Bash will, too. If you want to put a non-breaking space into $IFS, it will
be happy to split words on it. The business about "IFS whitespace" being
space/tab/newline is to reconcile differences between historical behaviors
that date back to an ASCII-only world. You have to live with those.
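
A rough sketch of what the "IFS whitespace" distinction changes in
practice. The helper name count_fields is made up for this example; it
runs in a subshell so the IFS change does not leak:

```shell
# count_fields IFS_VALUE STRING: print how many fields STRING splits into.
count_fields() (
  IFS=$1
  set -f          # disable pathname expansion while splitting
  set -- $2       # unquoted on purpose: this is the field split
  echo "$#"
)

# Runs of IFS whitespace collapse into a single separator.
count_fields ' ' 'a  b'     # two fields: a, b

# Each non-whitespace IFS character delimits a field, so adjacent
# delimiters produce an empty field between them.
count_fields ':' 'a::b'     # three fields: a, (empty), b
```

Any character in $IFS other than space, tab, and newline falls under the
second rule.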


> Posix is in there, but what the linux command line in my host distro does is at
> least as important. 

How do you reconcile the differences when bash and dash (as /bin/sh) do
different things? Dash is most definitely a posix-and-little-else shell.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet at case.edu    http://tiswww.cwru.edu/~chet/


