[Toybox] Would someone please explain what bash is doing here?
Rob Landley
rob at landley.net
Wed Mar 11 20:55:13 PDT 2020
On 3/11/20 9:41 PM, James McMechan wrote:
>
>
> On Sun, Mar 8, 2020, 12:29 Rob Landley <rob at landley.net
> <mailto:rob at landley.net>> wrote:
>
> On 3/8/20 11:44 AM, Chet Ramey wrote:
> > On 3/8/20 10:53 AM, Rob Landley wrote:
> >>
> >> I read through the posix shell bits long enough ago it was probably SUSv3
> rather
> >> than v4, but at the moment I'm taking bash as my standard and just doing
> >> whatever that does.
> >
> > Well, I appreciate that, but there just might be one or two places (or,
> > depending on who you talk to, one or two hundred) where bash diverges
> > from the standard. That might be because of bugs, or backwards
> > compatibility, or the standard having made a dumb decision.
>
> Sure. There are a couple places where "bash does a thing" and what I decide to
> do is different, most recent one I hit was:
>
> $ for i
> > in one two three
> > do echo $i;
> > done
> one
> two
> three
> $ for i; in one two three; do echo $i; done
> bash: syntax error near unexpected token `in'
>
> But in general, if the bash userbase hasn't noticed/minded a posix discrepancy
> (or outright bug) over the past 20 years, I'm not sure why I should care?
>
> > And you can sometimes get into trouble for following the standard *too*
> > closely;
>
> Linux has "echo -e", posix does not. Guess which one toybox implements?
>
> Meanwhile, all toybox commands support "--", including toybox echo, regardless
> of what the host debian one does; consistency won out there. Although I
> compromised by doing the xargs-style behavior where option parsing ends with the
> first non-option argument so "echo -- hello" prints hello but "echo hello --"
> prints "hello --". And yes when Elliot found out you can do "ls hello -l" and
> the whole can of worms about "rm *" expanding to -r and such, he suggested all
> toybox commands should do that, but I stayed with the more-standard behavior of
> letting you "ls subdir -l". That discussion's on the toybox list somewhere. We
> have threads about this sort of corner case all the time, ala:
>
> http://lists.landley.net/pipermail/toybox-landley.net/2017-March/008888.html
>
>
> Well some of that is the glob() function or maybe wordexp(). My thought was to
> make it so that when glob() hit a file named "-rf" to expand it would expand it
> to "./-rf" to prevent people from being "too clever by half" also making it so
> the simple ".*" would not expand to either "." or ".." I don't recall a glob in
> toybox but it is usually part of the C library or shell.
It is part of the C library, but I could wrap it in lib.
The bash man page says:
There are seven kinds of expansion performed: brace expansion,
tilde expansion, parameter and variable expansion, command substitu‐
tion, arithmetic expansion, word splitting, and pathname expansion.
Except in reality there's sort of eight (quote removal, which << EOF does by
itself, but here is done as part of variable expansion).
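
One visible consequence of that ordering, as a small sketch: the result of a
variable expansion is not re-scanned for tilde expansion, but it does still go
through pathname expansion. The scratch directory, the file names and the
variables t and g are made up for the demo.

```shell
# Work in a throwaway directory so the glob result is predictable.
dir=$(mktemp -d) && cd "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT
touch alpha beta

t='~'
g='*'
echo $t      # the ~ stays literal: expansion results get no tilde expansion
echo $g      # pathname expansion runs on the result: lists alpha beta
echo "$g"    # quoting suppresses pathname expansion: prints *
```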
I've implemented MOST of brace expansion (need to go back and do {1..2..3}),
tilde expansion, about half of variable expansion (next up the ${blah} can of
worms with ${blah/a/b} and ${#blah} and ${blah:-x} which is SLIGHTLY DIFFERENT
than ${blah-x}...). I've done command substitution but need more debugging
scrutiny on it (it worked, then I broke it, then it worked again...), haven't
started arithmetic expansion yet (gotta write one of them operand stack +
operator stack thingies with priorities, which is fun because it writes to
variables that don't exist yet AND recursively resolves stuff):
$ fruit=basket
$ potato=fruit
$ basket=42
$ echo $((potato))
42
And yes:
$ x=x
$ echo $((x))
bash: x: expression recursion level exceeded (error token is "x")
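
A rough sketch of that name-chasing with a depth cap, to show the shape of the
problem. This is not how bash or toybox does it: resolve is a made-up helper,
the limit of 100 is an arbitrary assumption, and it only follows plain variable
names down to a decimal integer instead of evaluating real expressions.

```shell
# Hypothetical helper: follow a chain of variable names until a number turns up.
# The body runs in a subshell so its scratch variables don't leak out.
resolve() (
  _name=$1
  _depth=0
  while [ "$_depth" -lt 100 ]; do        # arbitrary cap, assumed for the sketch
    case $_name in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        echo "resolve: '$_name' is not a variable name" >&2
        exit 1 ;;
    esac
    eval "_value=\${$_name-}"            # safe: _name was validated just above
    case $_value in
      '')       echo 0; exit 0 ;;        # unset or empty counts as 0
      *[!0-9]*) _name=$_value ;;         # not a number: treat it as another name
      *)        echo "$_value"; exit 0 ;;
    esac
    _depth=$((_depth + 1))
  done
  echo "resolve: recursion level exceeded" >&2
  exit 1
)

fruit=basket; potato=fruit; basket=42
resolve potato                 # follows potato -> fruit -> basket
x=x
resolve x || echo "gave up"    # the self-reference hits the cap
```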
Word splitting isn't really a separate step, exactly? A literal "echo one two three"
doesn't care about $IFS, but the output of any of the OTHER substitutions does:
$ IFS=3; echo $((1224+10))
12 4
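
Counting arguments makes the same point a bit more explicitly. A small sketch;
count is a made-up helper that just reports how many arguments it received.

```shell
count() { echo "$#"; }    # hypothetical helper: print the argument count

old_ifs=$IFS
IFS=3
count 1234                # literal word, never split: 1
count $((1224+10))        # expansion result 1234 is split at the 3: 2
count "$((1224+10))"      # quoted expansion is not split: 1
v=1234
count $v                  # variable expansion is split the same way: 2
IFS=$old_ifs
```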
And then pathname expansion. Haven't started it yet. I have dirtree and friends,
and I wrote my own globbing logic from scratch once back in the 90's. Not too
worried about it, just... a bit to chew between here and there.
> A few years back David Wheeler proposed limiting the characters in filenames
> https://lwn.net/Articles/686789/ in attempt to fix the issue by whitelisting
> valid first, middle, and ending characters.
Oh please no. Linus was right:
https://yarchive.net/comp/linux/utf8.html
I'm aware case sensitivity filters went in at some point and it was for the
Samba guys dealing with NT filesystem semantics on directories that are ALSO
accessed locally (such as the same dir exported via SMB and NFS) which is
unavoidably racy without kernel support.
But seriously, "I plug in a USB stick with a file that does NOT conform to your
filter, and now I can't access it" is a thing. "And once you start down the dark
path, forever will it dominate your destiny, consume you it will. Still cannot
we get George editing the old movies to stop, worse and worse he makes them, and
to Disney YEARS ago the franchise he sold, honestly, needs to stop he does. You
start, do not."
(All, of course, read in Yoda's voice.)
> I thought that fixing glob() so it
> is less surprising was a better answer.
Or wrapping it. But really, it's "fed back into command line and re-parsed" that
causes problems. Something like dirtree isn't going to care.
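
For what it's worth, a wrapper along those lines might look something like the
sketch below. run_safe is a made-up name, not anything in toybox or libc, and
it only makes sense when every argument after the command is a filename
(typically a glob result), since it can't tell an intentional option from a
file that happens to start with a dash.

```shell
# Hypothetical wrapper: run "$1" on the remaining arguments after dropping
# "." and ".." and prefixing "./" to anything that starts with "-".
# The body runs in a subshell so the scratch variables don't leak out.
run_safe() (
  cmd=$1; shift
  n=$#
  while [ "$n" -gt 0 ]; do
    arg=$1; shift
    case $arg in
      .|..) ;;                         # skip the directory self/parent entries
      -*)   set -- "$@" "./$arg" ;;    # can no longer be parsed as an option
      *)    set -- "$@" "$arg" ;;
    esac
    n=$((n - 1))
  done
  "$cmd" "$@"
)

run_safe printf '%s\n' -rf . .. plain    # prints ./-rf and plain
```

Something like "run_safe rm *" then hands rm "./-rf" rather than "-rf".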
> I have had stupid characters in file names and if you have not used the "./-"
> construct it is hard to get rid of them.
Yes, but that's why that construct exists. (And again, toybox supports --
everywhere.)
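
Both constructs side by side, as a small sketch run in a throwaway directory
(the directory and file names are made up for the demo):

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT

touch ./-rf ./-l     # create files literally named "-rf" and "-l"
rm ./-rf             # the path form can't be mistaken for an option
rm -- -l             # "--" ends option parsing, so -l is taken as a filename
ls -A                # prints nothing: both files are gone
```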
> Also <Tab key> expansion of arguments should use the same or similar logic... so
> "rm -<Tab>" -> "rm ./-" if there is a file "./-" would help.
Command line editing and job control remain as-yet-unopened cans of worms. I
have plans, but Not Going There Yet.
Rob