[Toybox] bash continues to confuse me.

Rob Landley rob at landley.net
Fri Jul 3 00:15:09 PDT 2020


On 7/1/20 3:50 PM, Chet Ramey wrote:
> On 7/1/20 7:28 AM, Rob Landley wrote:
> 
>>> Then the `why' as `because that's a choice Bourne made in 1978' should
>>> suffice, right?
>>
>> I'm not really trying to figure out why the behavior was chosen, that seems
>> arbitrary and historic with archaeological layers.
>>
>> I'm trying to figure out the minimum set of rules to capture the necessary
>> behavior, and whether outliers from those rules are important or coincidental
>> and possibly ignorable.
> 
> It seems like a clear rule is that a word expansion in braces beginning
> with `#' finds the close brace and uses everything between the `#' and the
> close brace as a parameter name, and expands to its length.

Only if you define "word expansion" as having a nonzero amount of variable name
after the #:

  $ echo ${#/0/zero}
  zero

Which is odd, because if it _does_ find at least one character of possible
variable name it eats all the remaining characters, leaving no suffix:

  $ echo ${#PATH/0/zero}
  bash: ${#PATH/0/zero}: bad substitution

But it's defining "variable name" to include the special variable names:

  landley@driftwood:~/www$ echo $$
  13781
  $ echo ${#$}
  5

So it has to know when it has a variable name and when it doesn't (including
special variable names), but _not_ know when that variable name ends, so it can
process leftover chars a la ${#@@Q}, thus quoting '0' because #@ undergoes @Q.

> POSIX says pretty much exactly that.

Backslash is funky:

  $ echo ${#PATH\}
  >

The line continuation parsing logic knows about \ within ${} but only SOME of
the ${} contents parsing logic (contextually) does.

My current understanding is that the ${XXX} plumbing parses the XXX in a
specific sequence: it's basically breaking the line down into what I've been
calling "prefix", "name", and "slice" sections.

1) Grab prefix characters (# or !); if a prefix is found, parse the string with
custom code for the prefix type to resolve a value. None of the prefix parsing
logic understands \escapes, but it _can_ leave a slice section for later, and
the slice logic that eventually handles that section may.

2) Else grab the variable name (including special names a la $$ and $*) and
resolve it, record where the variable name ends, and what's left over is the
slice operator. This is an "else" because if a prefix was found it skips this
part. The variable name cannot have a \escape in it because the \ terminates
variable name processing:

  $ echo ${\!}
  bash: ${\!}: bad substitution
  $ echo ${\a}
  bash: ${\a}: bad substitution

Both 1) and 2) above resolve what they found to a value, which can be either a
string or an array of strings. (Currently $@ $* and ${!abc@} are giving me
arrays of values; I haven't implemented array variable types yet because I've
never used them, but I have a bunch of TODO notes where I need to go back and
slot them in.)

It then falls through to the $IFS handling logic on the newly resolved data, and
if there's a leftover "slice" that transformation is done to the resolved data
before the $IFS stuff happens. This is a loop because it can glue together
arrays (or operate on multiple array elements), but for non-array strings the
loop exits after the first iteration.

If the slice operator isn't NULL or "}" then there's an if/else staircase of
operator types, starting with the ":-" vs "-" dance to figure out if this value
counts as "NULL" and thus should be ignored and we should loop early.

Said dance was slightly awkward for me to code up: "-?=+" can come after a :
which says "" or unset counts as NULL (without it only unset is NULL), but
"-?=" trigger on NULL while "+" triggers on NOT null, and of course : can
have none of the above after it and do something else entirely. So these tests
are kind of incestuous, and not _repeating_ them took some staring, but I think
I've got it now?

Except even then, what counts as NULL is non-obvious to me:

  $ xx() { echo "${*-abc}";}; xx one two "" four
  one two  four
  $ xx() { echo "${*:-abc}";}; xx one two "" four
  one two  four
  $ xx() { echo "${@:-abc}";}; xx one two "" four
  one two  four

I honestly thought the second two of those would stick abc between two and four
in the output, but it seems only:

  $ xx() { echo "${*-abc}";}; xx ""

  $ xx() { echo "${@:-abc}";}; xx ""
  abc
  $ xx() { echo "${*:-abc}";}; xx ""
  abc

Does. So the "is this null" test is done on the whole result, except:

  $ xx() { echo "${*/vel cro/abc}";}; xx vel cro
  vel cro

doesn't, so is the operator working on the whole output or is the operator
working on each segment individually? Different results for different operators!
Sigh. ANYWAY: pop one tangent off the stack and:

The if/else staircase figuring out which slice operator we've got doesn't
understand \ either (treats it as an invalid character of the "always error even
if variable name undefined" variety):

  $ echo ${xyz\:1:2}
  bash: ${xyz\:1:2}: bad substitution

The math parsing logic doesn't understand \ either:

  $ echo ${PATH:\1:2}
  bash: PATH: \1: syntax error: operand expected (error token is "\1")
  $ echo ${PATH:1\ :2}
  bash: PATH: 1\ : syntax error: invalid arithmetic operator (error token is "\ ")

(Even though the parsing that _got_ us here understood \ well enough to know
where the terminating } was, for line continuation purposes...)

However, most slice payloads _do_ understand \ when they're assembling a string:

  $ ABC=abcdefg; echo ${ABC/c\de/123}
  ab123fg
  $ echo ${zyx?\}\\}
  bash: zyx: }\

So I need a "malloc a de-escaped copy of this string" function, but I should
only use it in specific places. That makes it more awkward: if I could just use
it on "slice" it could reliably end at "}", but ${a//} needs to know when
I've escaped the second "/", so I have to pass in the terminating character
I'm looking for and have it tell me how much of the input string it consumed...
eh, I think I'd have to do that anyway?

So yes, the # operator in bash eats the whole string. It does not do the "grab
variable name and leave trailing slice operator" thing, nor does it parse the
string understanding backslash escapes. But it ONLY triggers as a prefix when it
recognizes a potential variable name after the #, and otherwise falls through to
# being a normal non-prefix variable name a la ${#:1:2}, except it recognizes
the "special" variable names as well, even when said special variable name is
also an operator:

  $ echo ${##}
  1
  $ echo ${#-}
  6


But it doesn't use the "stop at end of recognized potential variable name"
logic to do so, because when it _does_ have a variable name it eats trailing
slice operators and barfs on them. It has to have extra code here NOT to stop at
the end of the variable name and leave the slice alone, because the common
functions I've written _would_ stop and leave the slice.

And if I _did_ use the parse variable name logic and fall through to the "leave
trailing slice operator" logic so:

  $ echo ${#/0/zero}
  zero
  $ echo ${#PATH/0/zero}
  bash: ${#PATH/0/zero}: bad substitution

worked the same way... would that break anything? It would be shorter for me
(better code re-use). But then there's the fact that the # prefix only
recognizes special variables that could also be operators when they DON'T have
anything after them:

  $ echo ${###}
  0
  $ echo ${#-\}}
  0

Even though

  $ a=b; echo ${a-} ${a#}
  b b

Which all boils down to "I need to add sooooooooo many tests to tests/sh.test"...

>> (I have recently been reminded by another shell's maintainer that I'm not smart
>> enough to have a chance of ever actually doing this, but I've never let that
>> stop me before.)
> 
> I probably know a couple of maintainers who would say that, but it's rude.

Well, they're not exactly _wrong_? I'm not the best person to do this, I'd love
for somebody ELSE to do this, but they leave the work undone and it needs doing,
so...

Someone making a heartfelt case that I have no idea what I'm doing and should
just stop has been an annual event ever since Al Viro's email telling me to quit
Linux development in 2000. (If I ever met him in person I was going to print
that out and get him to sign it, but it didn't happen.)

Every once in a while it turns into Public Drama:

  https://lwn.net/Articles/202106/

  https://lwn.net/Articles/478308/

Usually because I'm trying to convince somebody of something. (Or refuse to _be_
convinced by them.)

Sometimes the correction goes down the old "don't ask questions, post errors"
route and the "no you're wrong" is immediately productive, with a result like
"Linus changes his mind about using source control":

  https://www.zdnet.com/article/row-brewing-over-linux-patches/

My Annual Ego Deflation takes all sorts of forms. I remember back when I wrote:


https://www.fool.com/archive/portfolios/rulemaker/2000/02/17/why-microsofts-stock-options-scare-me.aspx

Microsoft's investor relations department called my editor to express their
displeasure, although their actual objections were things like "don't call it
Income, call it Cash Inflow" (really!). My editor kinda freaked out at me on
the phone, then wrote an outright rebuttal to my piece the following day:


https://www.fool.com/archive/portfolios/rulemaker/2000/02/17/stock-option-rebuttal.aspx

Personally I think history was on my side in that one given Sarbanes-Oxley and
all. :)

But in that case I _did_ lose interest and wander away to do something else a
few months later. You can't just be immune to criticism and soldier on, I
honestly _am_ wrong on a fairly regular basis and need learning experiences. (Or
"this isn't what I want to do with my time" redirects, anyway.) For example I
was a huge GPL defender for years, and then it turned toxic and I had to come up
with 0BSD. (That wasn't even an "oh, I was wrong"; I'd been right, then
stopped _being_ right.) I keep meaning to do a proper talk on that. I put about
1/3 of the material into my talk last year, with my own history with GPL
starting at https://www.youtube.com/watch?v=MkJkyMuBm3g#t=5m and then why I
moved _away_ from it and came up with a replacement starting around 25:30.

But as for this recent one, I already had this year's existential crisis with
https://landley.net/notes-2020.html#16-04-2020 (which was as much trumpocalypse
stress as anything), and I also had my own "your approach is wrong, you should
look at MY project" moment saying that to the crossgcc mailing list back in 2008
so I can't exactly blame other people for doing it to me. (See "learning
experiences", above.)

>>>>> Yes, variable indirection is an exception.
>>>>
>>>> One exception, yes. In bash, ${!!} doesn't indirect $!,
>>>
>>> It's true. That's never going to be useful, so bash just doesn't implement
>>> `!' as one of the special parameters for which indirection is valid. But
>>> you're right, it's inconsistent to not just accept it and expand to nothing.
>>>
>>>  ${#!} doesn't print the
>>>> length of $!,
>>>
>>> Sure it does.
>>
>>   $ echo ${#!}
>>   bash: !}: event not found
>>   $ echo "${#!}"
>>   bash: !}: event not found
> 
> Ah, history expansion. Well, turn it off, I suppose. I use scripts or `bash
> -c' to test this stuff.

  $ bash -c 'echo ${#!}'
  0
  $ bash -c 'echo ${!#}'
  bash

Cool! Thanks.

  $ bash -c 'echo ${!#x}'
  bash: ${!#x}: bad substitution
  $ bash -c 'echo ${!x#}'

... which is remove matching prefix, and it doesn't seem to mind me feeding it
an empty string.

  $ bash -c 'true& echo ${!!}'
  bash: ${!!}: bad substitution
  $ bash -c 'true& echo ${!!@}'
  bash: ${!!@}: bad substitution

You mentioned ! not recursing to accept ! above, but that kind of implies every
option is hand coded at every place it might occur? (How can generic code _not_
handle this?)

>> The context sensitive parsing doesn't do it in this case, but does
>> for ${!} and ${!@} which is why I thought it would.
> 
> Yeah, the history expansion code doesn't know very much shell syntax. It's
> part of readline. It originally didn't know any at all.

Whereas now line continuations get grouped so cursor up gives you the whole
thing. (Which is nice but requires rather a _lot_ of shell syntax knowledge to
get that right. You have to understand nesting keywords with multiple else
clauses. Took me months and a half-dozen rewrites to get that sorted.)

>>>> I _think_ if bash says "bad substitution" and mine instead Does The Thing, that
>>>> can't introduce an incompatibility in existing scripts? I think?
>>>
>>> Correct. Unless some script is, for whatever bizarre reason, counting on
>>> the error.
>>
>> Counting on the error is unlikely. Counting on a _lack_ of error for something
>> broken that never triggers: plausible.
> 
> Fragile and unreliable. I don't have a lot of sympathy.

I have to work with what's out there.

I usually wait for people to complain when it's dubious, but asking Android to
add a /bin/bash symlink pointing to toysh (leaving mksh as the default shell,
which they're not moving off of quickly) makes me err on the side of wanting
stuff to work.

That said, the surface area here is ridiculous and I'm pretty sure I'm not going
to get all of it even if I try.

>>>>> The debatable part is whether or not to short-circuit
>>>>> when the variable transformation code sees that the value it's being
>>>>> asked to transform is NULL, but that's what mksh does and one of the
>>>>> things I picked up when I experimented with the feature.
>>>>
>>>> Does that produce a different result? Seems like it's just an optimization? (Do
>>>> you have a test case demonstrating the difference?)
>>>
>>> It's whether or not to flag an unrecognized transformation operator as an
>>> error or just short-circuit before checking that because the value to be
>>> transformed is NULL.
>>
>> Yeah, I've been wondering about that, which definitely _can_ break scripts.
>>
>> But implementing it seems tricky: ${x;%} reliably errors whether or not x is
>> set, ${x~#%} never does (I can't find what ~ is supposed to do here in the man
>> page,
> 
> Of course the first one errors -- `x;' is not a valid parameter name

Neither is "x:" but it copes with that.

  $ echo ${x:2@v}

> (and we won't even talk about the missing pattern after `%').

  $ echo ${PATH%}
  /usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games

Doesn't cause a problem? None of the "parse backslashed string" reads seem to
care about the string being zero length, modulo the "did ${#} trigger as a
prefix or not" above. Which I _think_ has to be a special case like ${!x@}. :(

> I don't think there's any argument there.

When my code recognizes a variable name it tells me where the end of it is,
meaning the caller can tell when there's anything left afterwards that would
have to be an operation to do _on_ the variable. It defers looking at said
operation until it's time to perform the operation. Certain NULL processing
early exits can short circuit the processing so it doesn't wind up trying to
modify when there's nothing to modify, so the operation never gets performed, so
it doesn't register as an error.

The bash code instead seems to be figuring out what all possible operations it
might perform are before it tries to perform them, which means it needs the
check for them to be implemented in more than one place, which means those
checks need to not just be repeated but _agree_ and stay in sync. (Unless you
factored it out into a function, but if you did why does the behavior vary in
places?)

There was this nice writeup a few years ago about "single point of truth" that
recommended trying to have just ONE place in your code that cares about each
fact, which means an entire category of basically design-level cache coherency
issues never come up. (The person who wears one watch knows what time it is. The
person who wears two is never sure.)

Unfortunately, it looks like the "agile waterfall method object oriented
antipattern" C++ buzzword brigade got ahold of it and renamed it "single source
of truth" which sounds way too much like a religion. But then those guys need
endless rules and advice because they're building on sand and their response is
to keep constructing higher floors to distance themselves from the problem. (And
like painting over dry rot, it sometimes works for a little while.)

> The second case is clearly a bug. It should produce an error.

Alas, it was not clear to me. I can't tell bug from intention without asking,
because I'm still trying to figure out "why"...

For example, I do NOT understand what is and isn't an error here:

  $ echo ${PATH:99}

  $ echo ${PATH: -60}

  $ echo ${PATH: -50}
  ocal/bin:/usr/bin:/bin:/usr/local/games:/usr/games
  $ echo ${PATH: -50:-51}
  bash: -51: substring expression < 0
  $ echo ${PATH: -50:-49}
  o
  $ echo ${PATH:0:-60}
  bash: -60: substring expression < 0
  $ echo ${PATH: -60:20}

  $ echo ${PATH: 50:20}
  /games
  $ xx(){ echo "${*:2:-2}";}; xx 1 2 3 4 5 6 7 8 9
  bash: -2: substring expression < 0
  $ xx(){ echo "${*: -2}";}; xx 1 2 3 4 5 6 7 8 9
  8 9
  $ echo "${PATH:2:-2}"
  sr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/gam


Ok, with a string, the start being off the left or right edges resolves to empty
string, but end before start is an error. Length being zero is fine. A window
including the end of the string gives the right edge of the string, a window
including the left edge of the string gives nothing?

With an array, the first argument being negative is the usual "from end", but
the second argument being negative is an ERROR? (What?) And the same arguments
off the left edge produce no output even when the window advances into range
with stuff in it, but arguments off the right edge clip to show you what they
can.

*shrug* To me, this is a pile of arbitrary behavior. What I _would_ implement is
"none of this is an error, dest<src has no matches, left and right edges clip to
show available entries the same way". That would be the simple coherent rules to
me, and would let array and string share code.

But that's not what bash is _doing_...

>> At a certain point I'm going to have to try scripts that break, and get bug
>> reports from people who expected something to work and are angry at me.
>> (Apparently it is vitally important that I care about a bash reimplementation of
>> readline, which somehow manages to be both implemented in bash and to have a
>> makefile. I've put it on the todo list.)
> 
> Why? The bash version of readline and the standalone version of readline
> are identical. Identical consisting of the same source.

Ask https://github.com/akinomyoga/ble.sh ?

>>> Sure. That's why declare reports `?' as not found. It's not a variable.
>>
>> Which is why I needed to factor out the second function, which DOES know about
>> it. (Sometimes it's a variable, sometimes it isn't...)
> 
> Well, if you want to split hairs, it's never a variable. Sometimes it's a
> parameter.

I'm pretty sure you're not calling the :1:2 part of ${x:1:2} the "slice". My
nomenclature and yours are way off from each other. But that's not one of the
things I'm trying to get right. :)

>>>>   $ echo ${*-yetthisislive}
>>>>   yetthisislive
>>>
>>> Defined by posix.
>>
>> When I get to the end of the bash man page (well, loop and do a pass with no
>> hits), I intend to do a pass over posix-2008 to see what I missed. (I did read
>> the whole thing once upon a time, it's just been a while.)
>>
>> Until then I wince at every mention of it because when the _only_ reason for
>> something is "posix"... ("Yes but why?" "Posix!")
> 
> I mean, that's not the only reason it behaves the way it does. POSIX, for
> the most part, did a decent job of codifying ksh88 and the SVR4 sh, and
> that expansion has been in there since Bourne wrote his shell.

Indeed, but that was back before Linux 0.0.1 came out in 1991. Posix did a very
good job of documenting the state of play from over 30 years ago. People who
weren't born then have kids in high school.

Don't get me wrong, computer history's a hobby of mine (ala
https://landley.net/history/mirror) and I'm _interested_, but half the reason I
want to know "how we got here" is to figure out "where should we be". For
example old "usr/bin split" rant went viral a few years back:

  http://landley.net/writing/unixpaths.html

Because when you root cause it, /bin and /usr/bin being separate is a
historical accident due to hardware limitations on a specific development
machine in 1973, with retroactive justifications that were obsoleted by new
developments like "shared libraries" and "initial ramdisks" _decades_ ago.

In this instance, I'm going with a 7 year support horizon on toybox:

  https://landley.net/toybox/faq.html#support_horizon

So while my initial plan was "implement what bash 2.05b did", I'd already hit
gentoo's portage using =~ from bash 3 back in 2011:

  https://landley.net/notes-2011.html#26-12-2011

And there was also a behavior change in how quotes were resolved, and... The
interesting question is "what scripts am I trying to get to run under the new
thing", and my old test loads were things like "gentoo's portage infrastructure"
back when I was trying to bootstrap gentoo under aboriginal linux, but distro
bootstrapping turned out to be a GIANT CAN OF WORMS:

  https://landley.net/aboriginal/about.html#hairball

Anyway, ontogeny recapitulating phylogeny still winds up with vestigial
features. Good frame of reference, but not an end result.

>> So this isn't "because posix", this one I understand the rule for, it's an
>> operator category, although the category is fuzzed a little by ${:+} which
>> shares the "maybe :" logic but triggers in the else case of that test.
> 
> Yes, `+' is the opposite of `-'; I feel like Bourne was being clever when
> he decided on that syntax.

Nice from a usage perspective, less help from an implementation perspective. :)

>> and possibly consume your way down the tree, which is why I had to bother the
>> posix guys to put the precedence BACK when they broke it in their html rendering
>> of the expr command years ago because when I sat down to try it there they'd
>> broken the spec...
> 
> That sounds ... complicated.

When I say "I break everything", yes that includes posix's html rendering.

Or if you mean the parser implementation, nah, the result was pretty small. The
hard part of doing it in C is having a single operator list with both the
precedence and behavior attached (without resorting to something horrible like
function pointers and 8 gazillion tiny functions).

I suspect I'll wind up with something like
https://github.com/landley/toybox/blob/master/toys/posix/find.c#L206 again,
where one if/else staircase gets called twice to serve two masters awkwardly.
That function has way way way too many if (check) tests but I need the else goto
error at the end of the staircase to tell me when it wasn't a thing.

(Sigh. I should probably have all that variable state be in a structure and have
a scratch instance of that structure I can switch to at the start where it can
always perform the action but discard the results instead of a zillion little
tests. That might simplify the code a bit...)

Anyway, right now I have a mathinate(str, len) stub function I'm calling that's
just a strtoll() wrapper, and I need to sit down and try to come up with a clean
implementation of the proper logic to Do The Thing. And then rename it to a
non-stupid name, but that's a persistent issue for me:

  https://github.com/landley/toybox/commit/df6a96d3ed0a

  https://github.com/landley/toybox/commit/371dfd41efca

(Yes, the ps display fields were stored in "struct strawberry", and the instance
the for loops iterated over was named "ever". That did not help people
understand the resulting code, so I changed it.)

> Chet

Rob

