[Toybox] Would someone please explain what bash is doing here?

Rob Landley rob at landley.net
Sun Mar 8 11:55:17 PDT 2020


On 3/8/20 11:44 AM, Chet Ramey wrote:
> On 3/8/20 10:53 AM, Rob Landley wrote:
>>
>> I read through the posix shell bits long enough ago it was probably SUSv3 rather
>> than v4, but at the moment I'm taking bash as my standard and just doing
>> whatever that does. 
> 
> Well, I appreciate that, but there just might be one or two places (or,
> depending on who you talk to, one or two hundred) where bash diverges
> from the standard. That might be because of bugs, or backwards
> compatibility, or the standard having made a dumb decision.

Sure. There are a couple places where "bash does a thing" and what I decide to
do is different. The most recent one I hit was:

  $ for i
  > in one two three
  > do echo $i;
  > done
  one
  two
  three
  $ for i; in one two three; do echo $i; done
  bash: syntax error near unexpected token `in'

But in general, if the bash userbase hasn't noticed/minded a posix discrepancy
(or outright bug) over the past 20 years, I'm not sure why I should care?

> And you can sometimes get into trouble for following the standard *too*
> closely;

Linux has "echo -e", posix does not. Guess which one toybox implements?

Meanwhile, all toybox commands support "--", including toybox echo, regardless
of what the host debian one does: consistency won out there. Although I
compromised by doing the xargs-style behavior where option parsing ends with the
first non-option argument, so "echo -- hello" prints hello but "echo hello --"
prints "hello --". And yes, when Elliott found out you can do "ls hello -l" and
the whole can of worms about "rm *" expanding to -r and such, he suggested all
toybox commands should stop at the first non-option argument like that, but I
stayed with the more-standard behavior of letting you "ls subdir -l". That
discussion's on the toybox list somewhere. We have threads about this sort of
corner case all the time, ala:

  http://lists.landley.net/pipermail/toybox-landley.net/2017-March/008888.html

Ah, here was that specific thread:

  http://lists.landley.net/pipermail/toybox-landley.net/2018-October/009796.html

As you can see posix _is_ referenced, but it's not the last word.
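
To make the xargs-style rule concrete, here is a minimal shell sketch of option
parsing that stops at the first non-option argument. The operands function is a
hypothetical helper for illustration, not toybox code, and it only skips
option-looking arguments instead of acting on them:

```shell
# Hypothetical helper (not toybox code): print whatever is left once
# option parsing has stopped.
operands() {
  while [ $# -gt 0 ]; do
    case $1 in
      --)  shift; break ;;  # explicit end of options: consume it and stop
      -?*) shift ;;         # looks like an option: a real command would act on it
      *)   break ;;         # first non-option: stop, keep everything after it
    esac
  done
  # printf rather than echo, so a leftover "-n" is printed instead of parsed
  printf '%s\n' "$*"
}
```

With this, "operands -- hello" prints hello and "operands hello --" prints
"hello --", matching the echo behavior described above. A permuting parser (the
"ls subdir -l" case) would keep scanning past hello instead of breaking out.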

> cf. the issues with bash-5.0 treating an unquoted backslash as
> subject to being removed by pathname expansion. The heated, lengthy
> discussion that ensued eventually concluded that the plain text of the
> standard -- which all agreed was what bash-5.0 implemented -- did not
> reflect shell implementations or the original intent of the standard
> developers, and that bash-4.4 implements the right way to do it.

I haven't implemented pathname expansion yet. IFS corner cases took a couple
weeks longer than I expected, and I'm still slogging my way through the 8
gazillion ${stuff} variants.

> That was not the first occurrence of that phenomenon.

I'm still subscribed to the posix list, I just don't read it as closely as I
used to and basically never reply.

>> I should do another pass reading posix afterwards, but after
>> https://landley.net/notes-2016.html#11-03-2016 I've been much less interested in
>> interacting with the posix committee due to the risk of another Schilling, and
>> have pretty much backed up to
>> https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/ in much the same
>> way Debian backed up to LSB 4.1 ala https://lwn.net/Articles/658809/
> 
> I gently recommend that you use the 2018 version of the standard; the group
> did a lot of good work in those intervening years. That's the version I
> shoot for.

I used SUSv2. I upgraded to SUSv3. I upgraded to SUSv4. I'd happily evaluate
SUSv5, but there isn't one because posix stopped having releases. Instead they
randomly replace the existing data at the same URL, so if I point people at that
website I have no idea what'll be there when they look.

Why toybox does NOT do that kind of "continuous integration" without releases is
one of the few toybox FAQ entries I actually got written up and posted to the
website:

  https://landley.net/toybox/faq.html#releases

(I was going to say "finished" there, but another argument in favor of releases
I didn't mention there is the "heartbeat" role. Just re-certifying that what we
have is still current and still maintained is valuable, even if the changes are
just a couple typos. Bumping the release schedule down to something less
frequent makes sense for a less-active project, but SUSv4 came out a full 10
years ago and what you have up is still "issue 7" at the same URL, despite
having replaced it at least twice.)

Previous posix releases had different URLs. If the posix list decided
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/c99.html is too old
and, instead of going back to the "cc" everybody actually uses, renumbered it
to the standard du jour (c18 apparently), nobody would even know c99
had _been_ in SUSv4 unless they knew the magic stable URL. (Thank you for having
one, by the way.) And no, that's not an academic concern: in the case of "tar"
and "cpio", being able to pull up the old standard you dropped as a frame of
reference for things was nice, and both commands cite the relevant old posix
spec in the comments at the top. (Nobody uses "pax", and cpio -H newc is the
basis for RPM and initramfs. We've been discussing teaching it about xattrs on
the kernel list on and off for years now.)

Let me know if posix ever cuts a new release.

> I understand about Jorg. I'd like to be able to tell you to just ignore him
> and listen to other voices, but I get that it's emotionally taxing and his
> voice is loud enough to drown out others.

Most projects have... certain individuals. There are talks on that too
(https://www.youtube.com/watch?v=Q52kFL8zVoM) and I'm told Linus Torvalds
himself spent a couple months in therapy last year.

But in this case, in a public thread, nobody else spoke up with a different
view. His voice was the ONLY voice. So I stopped listening.

>> I still _sort_ of care about newer posix, but I got {bracket,expansion} working
>> last year 
> 
> The group has discussed brace expansion. It's more or less a valid
> extension not described by the standard.

The failure mode of posix is the absence of stuff, to the point you can't boot a
posix-only system (no init, no mount; I always assumed Microsoft and IBM's need
to pass FIPS 151-2 back in the day led to signing large checks to open holes big
enough to drive NT and OS/360 through).

I view it as a frame of reference to diverge from, and that's fine. It's still
more useful than LSB. (Possibly less so than man7.org. Yes he has releases,
they're at https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/Archive/)

>> and last month taught my $IFS splitting to understand utf8 characters
> 
> ? I don't think there's anything in POSIX that restricts IFS to single-
> byte characters, since everywhere it refers to a "character" it's supposed
> to be understood that a character can consist of multiple bytes. The
> standard defines the term that way.

The bash man page defines "IFS whitespace" as different from unicode whitespace.
(Space, tab, and newline only. Mine will in theory take the non-blank ogham
whitespace, although I haven't added that to tests/sh.test yet. :)
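
A minimal sketch of why the distinction matters, assuming a POSIX-style shell
such as bash: runs of IFS whitespace collapse into one delimiter, while any
other IFS character delimits a field every time it occurs. count_fields is a
hypothetical helper, and this only exercises ASCII delimiters, not the utf8
handling mentioned above:

```shell
# Hypothetical helper: count the fields produced by splitting string $2
# with IFS set to $1. The ( ) body runs in a subshell, so the IFS and
# set -f changes do not leak into the caller.
count_fields() (
  IFS=$1
  set -f        # disable globbing so only field splitting happens
  set -- $2     # unquoted expansion: split on the current IFS
  echo "$#"
)
```

Here "count_fields ' ' 'one   two'" prints 2 because the three spaces act as a
single delimiter, while "count_fields ':' 'one:::two'" prints 4 because each
colon ends a field and leaves two empty ones in the middle.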

No idea what posix says about it, the last time I read the whole posix shell
section end to end was... my blog says 2007. (I've triaged the command line
utilities at length a lot more recently, for
http://landley.net/toybox/roadmap.html#susv4 and
https://landley.net/toybox/status.html . Including checking the 2013 version to
see if anything interesting seemed to have changed, in the before-Jorg times.)

I am scrutinizing All The Behavioral Corner Cases in the world, but then I
always do when I write a new command, just like
https://landley.net/notes-2012.html#15-05-2012 and
https://landley.net/notes-2012.html#13-04-2012 from forever ago.

Posix is in there, but what the linux command line in my host distro does is at
least as important. Half the time I look at posix it's because I'm trying to
figure out what I might be able to get away with _excluding_. When I _first_
started thinking about doing a proper shell (back when I was maintaining busybox
and it had 4 shells), I started by printing the bash man page into a three ring
binder:

  https://landley.net/notes-2006.html#24-08-2006

>> (and have a TODO item that if IFS is an array it should understand strings), and
>> I honestly don't expect to live long enough for either NOT to be a divergence
>> from Posix.
> 
> I don't see POSIX ever standardizing arrays, and no conforming application
> will ever expect IFS to be an array, so as long as you DTRT when IFS is a
> string variable, you should be free to do whatever you like.

I often try to document "deviations from posix" at the top of each command. I
have a todo item as part of the eventual 1.0 release cleanup to go through and
update all those sections as part of updating the test suite for Full Coverage.
But that's a year's worth of work all by itself, and I don't get to work on
toybox full time...

Rob


