[Toybox] bash continues to confuse me.

Sun Jul 5 15:54:24 PDT 2020

On 7/3/20 3:15 AM, Rob Landley wrote:

>> It seems like a clear rule is that a word expansion in braces beginning
>> with `#' finds the close brace and uses everything between the `#' and the
>> close brace as a parameter name, and expands to its length.
> 
> Only if you define "word expansion" as having a nonzero amount of variable name
> after the #:

For a length expression, sure, that's fine.

> 
>   $ echo ${#/0/zero}
>   zero

Yes, that's true. So you're back to finding the end of the parameter (or
the first character after the end of the parameter) and determining what
to do based on that. In this case, the parameter name is `#', the first
character after the name is `/', which happens to be an expansion operator,
the parameter name is valid, and its expansion is used by the operator.

If you want, you can scan forward until you find an operator or the
closing brace, then deal with what ends up as the parameter name at that
point. That leads to a little bit of ad-hoc code.
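
A minimal sketch of the `#'-as-parameter case, assuming three arbitrary positional arguments so that $# is 3. Each expansion is run through `bash -c` only so the result does not depend on the calling shell; the variable names are made up for illustration:

```shell
# $# is 3 here, so the parameter `#' expands to "3" before the operator runs.
count=$(bash -c 'set -- a b c; echo "${#}"')
replaced=$(bash -c 'set -- a b c; echo "${#/3/three}"')
echo "$count $replaced"
```

The second expansion has the same shape as ${#/0/zero}: the name is `#', the next character is `/', and the substitution works on the value of $#.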

> 
> Which is odd because if it _does_ find one character of possible variable name
> it eats all the characters leaving no suffix:
> 
>   $ echo ${#PATH/0/zero}
>   bash: ${#PATH/0/zero}: bad substitution

Kind of. It's a shortcut. If you have a `#' followed by a character that
can start a variable name, you can immediately commit to a length
expression, since it's either a valid length expression or an error. You
just consume everything until a valid close brace and treat it as a
variable name. If it's not a valid variable name, it's an error.
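
A small sketch of that shortcut, using a throwaway variable `v' (an assumption for illustration). The failing case is wrapped in an `if' so the example keeps running after the error:

```shell
# ${#v} is a length expression: `#', then a name, then the close brace.
len=$(bash -c 'v=hello; echo "${#v}"')

# Anything between the name and the close brace is consumed as part of the
# "name", which is then invalid, so bash reports a bad substitution.
if bash -c 'v=hello; echo "${#v/l/L}"' >/dev/null 2>&1; then
    trailing=accepted
else
    trailing=rejected
fi
echo "$len $trailing"
```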

> 
> But it's defining "variable name" to include the special variable names:

It's a `parameter name', which includes the special parameters.

> 
>   landley@driftwood:~/www$ echo $$
>   13781
>   $ echo ${#$}
>   5
> 
> So it has to know when it has a variable name and when it doesn't (including
> special variable names), but _not_ know when that variable name ends so it can
> process leftover chars ala ${#@@Q} thus quoting '0' because #@ undergoes @Q.

You consume the parameter name, since you know what a valid parameter name
looks like, and determine what to do based on the following character.
You're right that you treat `#' as a parameter name if there's nothing
following it before you see the close brace or an operator. But if you see
a variable name (as above) or a special parameter that you can take the
length of, you treat it as a length expression. That's probably less
regular than you'd like.

Bash complicates things because it allows you to take the length of certain
special parameters that are the same as some of the operators (e.g., ${#-}
is defined and means the length of $-). That requires one more character of
lookahead.
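
The lookahead can be seen in a short sketch like the following. The argument lists and the word `unused' are arbitrary, and the length of $- is compared against a copy rather than a fixed number, since the flags vary between invocations:

```shell
# Ten positional parameters: $# is "10", so ${##} (the length of $#) is 2.
len_count=$(bash -c 'set -- a b c d e f g h i j; echo "${##}"')

# ${#-} is the length of $-, so it matches the length of a copy of $-.
flags_ok=$(bash -c 'f=$-; [ "${#-}" -eq "${#f}" ] && echo same')

# With a word after the `-', `#' is the parameter and `-' the operator:
# $# is set, so the default word is not used.
count_default=$(bash -c 'set -- a b c; echo "${#-unused}"')
echo "$len_count $flags_ok $count_default"
```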

> Backslash is funky:
> 
>   $ echo ${#PATH\}
>   >
> 
> The line continuation parsing logic knows about \ within ${} but only SOME of
> the ${} contents parsing logic (contextually) does.

The backslash means that you don't have the closing brace.

"Any '}' escaped by a <backslash> or within a quoted string, and characters
in embedded arithmetic expansions, command substitutions, and variable
expansions, shall not be examined in determining the matching '}'."

But the backslash is not discarded.
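
As a rough illustration (the variable `v' and the default word are invented for the example), an escaped brace inside the word does not end the expansion, and the backslash only disappears at quote removal:

```shell
# The escaped brace does not close the expansion; the unescaped one does.
# The word is `a\}b' until quote removal drops the backslash.
word=$(bash -c 'unset v; echo ${v-a\}b}')
echo "$word"
```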

> My current understanding is that the ${XXX} plumbing parses the XXX in a
> specific sequence: it's basically breaking the line down into what I've been
> calling "prefix", "name", and "slice" sections.

Kind of, I guess, if you're saying that `slice' means "the operator and
the characters following it."

> 
> 1) Grab prefix characters (# or !), if a prefix is found parse the string with
> custom code for the prefix type to resolve a value. None of the prefix parsing
> logic understands \escapes, but it _can_ leave a slice section to handle later,
> and that later slice logic handling that section may.

There's not a lot of `custom code' for this case, but the expansion code
does know what it allows to follow the `#' or, to a lesser extent, the `!',
and treats the rest of the contents of the braces accordingly.

I don't know what you mean by `resolve a value': do you mean finding the
parameter name that this prefix will operate on?

> 
> 2) Else grab variable name (including special names ala $$ and $*) and resolve
> it, record where variable name ends and what's left over is the slice operator.
> This is an "else" because if a prefix was found it skips this part. The variable
> name cannot have an \escape in it because the \ terminates the variable name
> processing:
> 
>   $ echo ${\!}
>   bash: ${\!}: bad substitution
>   $ echo ${\a}
>   bash: ${\a}: bad substitution

No. The backslashes are not discarded here. The parameter name is exactly
`\!' or `\a'. Since neither of those is valid, the substitution is invalid.

> Both 1) and 2) above resolve what they found to a value, which can be either a
> string or an array of strings.

Not really, since the expansion of the parameter is deferred until we find
out what the expansion is going to be.

But if you mean the parameter is expanded so the expansion code can figure
out what to do with it based on the operator, then that's true.

> It then falls through to the $IFS handling logic on the newly resolved data, and
> if there's a leftover "slice" that transformation is done to the resolved data
> before the $IFS stuff happens.

Is this a convoluted way of saying that word splitting happens on the
results of the expansion?
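
One way to see the ordering is a sketch like this one, which assumes a colon-separated value and sets IFS to `:' purely for the demonstration:

```shell
# The pattern substitution runs first, producing "a:x y:c"; only then is
# the result split on IFS (set to `:' here), giving three fields.
fields=$(bash -c 'v=a:b:c; IFS=:; set -- ${v/b/x y}; echo "$#|$2"')
echo "$fields"
```

The space introduced by the replacement stays inside the second field because it is not in IFS at the time the result is split.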

> If the slice operator isn't NULL or "}" then there's an if/else staircase of
> operator types, starting with the ":-" vs "-" dance to figure out if this value
> counts as "NULL" and thus should be ignored and we should loop early.

You just have to note whether or not it's null, since the different
operators treat null values differently, then go through the operator
cases.

> 
> Said dance was slightly awkward for me to code up because "-?=+" can be after :
> which says "" or unset is NULL (without which only unset is NULL), but "-?="
> trigger on NULL and "+" triggers on NOT null (like everything else), and of
> course : can have none of the above after it and do something else entirely, so
> these tests are kind of incestuous and not _repeating_ them took some staring
> but I think I've got it now?

The part you have to watch out for, as you discovered, is that `:' followed
by anything other than the POSIX standard operators ("-?=+") is an operator
on its own.
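
A compact sketch of those cases, with made-up variable names (`u' unset, `e' set but null). The substring line goes through `bash -c` because that operator is not in POSIX:

```shell
# Unset vs. null: without the colon only "unset" counts; with it, both do.
unset u; e=""
no_colon="${u-d}/${e-d}"            # e is set (to null), so its default is skipped
with_colon="${u:-d}/${e:-d}"        # the colon makes the null value count too
alternate="${e+set}/${e:+nonnull}"  # `+' triggers on the opposite condition

# A colon followed by anything else is the substring operator.
sub=$(bash -c 's=value; echo "${s:1:2}"')
echo "$no_colon $with_colon $alternate $sub"
```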

> 
> Except even then, what counts as NULL is non-obvious to me:
> 
>   $ xx() { echo "${*-abc}";}; xx one two "" four
>   one two  four
>   $ xx() { echo "${*:-abc}";}; xx one two "" four
>   one two  four
>   $ xx() { echo "${@:-abc}";}; xx one two "" four
>   one two  four
> 
> I honestly thought the second two of those would stick abc between two and four
> in the output,

POSIX doesn't treat the traditional Bourne shell operators as operating on
each of the positional parameters separately, since the Bourne shell did
not.

> but it seems only:
> 
>   $ xx() { echo "${*-abc}";}; xx ""
> 
>   $ xx() { echo "${@:-abc}";}; xx ""
>   abc
>   $ xx() { echo "${*:-abc}";}; xx ""
>   abc
> 
> Does. So the "is this null" test is done on the whole result, except:
> 
>   $ xx() { echo "${*/vel cro/abc}";}; xx vel cro
>   vel cro
> 
> doesn't, so is the operator working on the whole output or is the operator
> working on each segment individually? Different results for different operators!

Yes. The POSIX operators behave like POSIX specifies, and the ones bash or
ksh or mksh invented behave a little bit more sanely. In fact, POSIX leaves
the behavior of any other expansion except "-?=+" unspecified if the
parameter is `*' or `@', and its rules for expanding the `*' and `@' are
kind of magic.
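
The difference can be sketched as follows. The function name `f' and its arguments are placeholders, and the behavior shown is bash's, since POSIX leaves the non-POSIX operators unspecified for `*' and `@':

```shell
# The POSIX operator tests "$*" as a whole: one null argument among several
# does not trigger the default, but a lone null argument does (with `:').
whole=$(bash -c 'f() { echo "${*:-abc}"; }; f one "" four')
lone=$(bash -c 'f() { echo "${*:-abc}"; }; f ""')

# Pattern substitution is applied to each positional parameter in turn,
# so a pattern spanning two parameters never matches.
each=$(bash -c 'f() { echo "${*/o/0}"; }; f foo boo')
span=$(bash -c 'f() { echo "${*/vel cro/abc}"; }; f vel cro')
printf '%s\n' "$whole" "$lone" "$each" "$span"
```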

> The if/else staircase figuring out which slice operator we've got doesn't
> understand \ either (treats it as an invalid character of the "always error even
> if variable name undefined" variety):
> 
>   $ echo ${xyz\:1:2}
>   bash: ${xyz\:1:2}: bad substitution

What this means is that since backslashes can quote the operator and the
close brace, the parameter name is `xyz\:1', which is invalid.

> 
> The math parsing logic doesn't understand \ either:
> 
>   $ echo ${PATH:\1:2}
>   bash: PATH: \1: syntax error: operand expected (error token is "\1")
>   $ echo ${PATH:1\ :2}
>   bash: PATH: 1\ : syntax error: invalid arithmetic operator (error token is "\ ")

Correct. It's not a valid mathematical character, and you haven't performed
any kind of quote removal when you evaluate it.
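
For contrast, a sketch with an arbitrary string `s': the offset is evaluated as an arithmetic expression, so ordinary operators work there, while a backslash is still present at evaluation time and is rejected:

```shell
# Offset and length are arithmetic expressions, so 1+1 evaluates to 2.
math=$(bash -c 's=abcdefg; echo "${s:1+1:2}"')

# The backslash has not been removed when the expression is evaluated, and
# it is not a valid arithmetic token, so the expansion fails.
if bash -c 's=abcdefg; echo ${s:\1:2}' >/dev/null 2>&1; then
    escaped=accepted
else
    escaped=rejected
fi
echo "$math $escaped"
```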

> 
> (Even though the parsing that _got_ us here understood \ to know when the
> terminating } was for line continuations...)

How is that relevant? The close brace wasn't quoted. Remember the
backslashes aren't removed until quote removal, and there are specific
rules for determining the close brace. A `\1' doesn't fall into any of
them.

> 
> However, most slice payloads _do_ understand \ when they're assembling a string:
> 
>   $ ABC=abcdefg; echo ${ABC/c\de/123}
>   ab123fg

The stuff between the slashes is a pattern, and backslashes have meaning in
shell patterns.
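
A short sketch of what the backslash does inside a pattern. The second variable and its value are invented for illustration:

```shell
# In a pattern, a backslash quotes the next character: `\d' matches a plain
# `d', and `\*' matches a literal asterisk rather than any string.
quoted_d=$(bash -c 'ABC=abcdefg; echo ${ABC/c\de/123}')
quoted_star=$(bash -c 'v="a*b"; echo ${v/\*/star}')
echo "$quoted_d $quoted_star"
```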

>   $ echo ${zyx?\}\\}
>   bash: zyx: }\

The result of the expansion is `\}\\'. That gets expanded to `}\' due to
quote removal. That's not technically POSIX-conformant, but everyone does
it that way.

> So yes, the # operator in bash eats the whole string. It does not do the "grab
> variable name and leave trailing slice operator" thing, nor does it parse the
> string understanding backslash escapes. But it ONLY triggers as a prefix when it
> recognizes a potential variable name after the # and otherwise falls through to
> # being a normal non-prefix variable name ala ${#:1:2} except it recognizes the
> "special" variable names as well, even when said special variable name is also
> an operator:
> 
>   $ echo ${##}
>   1
>   $ echo ${#-}
>   6

Yes. The last is somewhat ad-hoc, but required.

> 
> 
> But it doesn't use the "stop at end of recognized potential variable name" logic
> to do so, because it eats trailing slice operators when it _does_ have a
> variable name and barfs on it. It has to have extra code here NOT to stop at the
> end of the variable and leave the slice alone, because the common functions I've
> written _would_ stop and leave the slice.

Yes, because length expressions cannot be part of any other expansion. They
are an expansion unto themselves, not a `parameter' in ${parameterOPword}.

> 
> And if I _did_ use the parse variable name logic and fall through to the "leave
> trailing slice operator" logic so:
> 
>   $ echo ${#/0/zero}
>   zero
>   $ echo ${#PATH/0/zero}
>   bash: ${#PATH/0/zero}: bad substitution
> 
> worked the same way... would that break anything?

Yes, that would break things. If the `#' appears in a context where it can
be a parameter name, you have to treat it as a parameter name.

> It would be shorter for me
> (better code re-use). But then there's the fact that the # prefix only
> recognizes special variables that could also be operators when they DON'T have
> anything after them:
> 
>   $ echo ${###}
>   0

Right. That's a parameter name of `#' followed by the `#' pattern removal
operator, with `#' as the pattern.

>   $ echo ${#-\}}
>   0

A parameter name of `#' followed by a valid operator and a word.

> 
> Even though
> 
>   $ a=b; echo ${a-} ${a#}
>   b b

Still valid operators.

> Well, they're not exactly _wrong_? I'm not the best person to do this, I'd love
> for somebody ELSE to do this, but they leave the work undone and it needs doing,
> so...

And that is what makes you the right person to do what you're doing.

>   $ bash -c 'echo ${!#x}'
>   bash: ${!#x}: bad substitution
>   $ bash -c 'echo ${!x#}'

Bash allows you to indirectly reference $# (when not in POSIX mode), and
it's a handy way to get the last positional parameter. The trailing `x'
makes it an invalid substitution.
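
For example, with three placeholder arguments (a sketch, run outside POSIX mode):

```shell
# ${!#} indirectly references $#: with three arguments it expands $3.
last=$(bash -c 'set -- a b c; echo "${!#}"')

# A trailing character after the `#' leaves no valid parameter to reference.
if bash -c 'set -- a b c; echo "${!#x}"' >/dev/null 2>&1; then
    extra=accepted
else
    extra=rejected
fi
echo "$last $extra"
```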

> 
> ... which is remove matching prefix, and it doesn't seem to mind me feeding it
> an empty string.
> 
>   $ bash -c 'true& echo ${!!}'
>   bash: ${!!}: bad substitution
>   $ bash -c 'true& echo ${!!@}'
>   bash: ${!!@}: bad substitution

But it does not allow you to indirectly reference $!. That's not going to
be useful.

> 
> You mentioned ! not recursing to accept ! above, but that kind of implies every
> option is hand coded at every place it might occur? (How can generic code _not_
> handle this?)

Bash knows which special parameters it allows to be indirectly referenced.
Anything else doesn't get treated as an indirect expansion, which can
either result in an operator or an error.

> 
>>> The context sensitive parsing doesn't do it in this case, but does
>>> for ${!} and ${!@} which is why I thought it would.
>>
>> Yeah, the history expansion code doesn't know very much shell syntax. It's
>> part of readline. It originally didn't know any at all.
> 
> Whereas now line continuations get grouped so cursor up gives you the whole
> thing. (Which is nice but requires rather a _lot_ of shell syntax knowledge to
> get that right. You have to understand nesting keywords with multiple else
> clauses. Took me months and a half-dozen rewrites to get that sorted.)

That's not part of readline. That's part of the shell's plumbing that
decides how to handle adding lines to the history. The history library
(and readline, since history is built as part of readline) provide
add_history() to add lines to the history but don't insist you use it
directly. You can use a mixture of add_history and replace_history_entry to
get commands -- which are necessarily application-specific -- added to the
history list. A python interpreter using readline could do that a different
way using the same general mechanism to get python commands as single
history entries.

> 
>>>>> I _think_ if bash says "bad substitution" and mine instead Does The Thing, that
>>>>> can't introduce an incompatibility in existing scripts? I think?
>>>>
>>>> Correct. Unless some script is, for whatever bizarre reason, counting on
>>>> the error.
>>>
>>> Counting on the error is unlikely. Counting on a _lack_ of error for something
>>> broken that never triggers: plausible.
>>
>> Fragile and unreliable. I don't have a lot of sympathy.
> 
> I have to work with what's out there.

Counting on whether or not a broken construct triggers an error? Potential
disaster. I have no sympathy if it breaks. I could, at any moment, come to
my senses and decide to add an error message for invalid syntax.

> 
> I usually wait for people to complain when it's dubious, but asking Android to
> add a /bin/bash symlink pointing to toysh (leaving mksh as the default shell,
> which they're not moving off of quickly) makes me err on the side of wanting
> stuff to work.

Report them as bugs. It's not unheard of.

>> Of course the first one errors -- `x;' is not a valid parameter name
> 
> Neither is "x:" but it copes with that.
> 
>   $ echo ${x:2@v}

That is a valid offset:length expansion, and you get an arithmetic
expression error. Needless to say, `;' isn't a valid operator, either.

> 
>> (and we won't even talk about the missing pattern after `%').
> 
>   $ echo ${PATH%}
>   /usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games

A null pattern is valid. For instance, it's fine to replace a match with
nothing in ${param/pat/rep}.
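
Both cases in a small sketch, using a throwaway variable `v':

```shell
# An empty pattern after `%' removes nothing; an empty replacement in
# ${param/pat/rep} deletes the first match.
kept=$(bash -c 'v=hello; echo "${v%}"')
cut=$(bash -c 'v=hello; echo "${v/l/}"')
echo "$kept $cut"
```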

> The bash code instead seems to be figuring out what all possible operations it
> might perform are before it tries to perform them, which means it needs the
> check for them to be implemented in more than one place, which means those
> checks need to not just be repeated but _agree_ and stay in sync. (Unless you
> factored it out into a function, but if you did why does the behavior vary in
> places?)

What the bash code does, at least in this case, is determine the parameter,
figure out what it's doing with it, consume the object of that operator,
and process the parameter accordingly.

I won't argue that there's code duplication. That's just the consequence
of 30+ years of development.

>> The second case is clearly a bug. It should produce an error.
> 
> Alas, it was not clear to me. I can't tell bug from intention without asking,
> because I'm still trying to figure out "why"...

I updated this answer. It's an undocumented case-modification operator.

> 
> For example, I do NOT understand what is and isn't an error here:
> 
>   $ echo ${PATH:99}

Your path must be less than 99 characters long. It's hard to tell what's
going on without $PATH. Keep in mind that an OFFSET < 0 takes a substring
starting N characters from the end of the string, and a LENGTH < 0 is
treated as an end point that many characters before the end of the string.
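
A sketch with a ten-character string (chosen arbitrarily) shows the three cases. Negative lengths on strings assume a reasonably recent bash:

```shell
# A negative offset counts back from the end (the space keeps `: -3' from
# being read as the `:-' operator); a negative length marks an end point
# that many characters before the end of the string.
tail3=$(bash -c 's=abcdefghij; echo "${s: -3}"')
middle=$(bash -c 's=abcdefghij; echo "${s:2:-2}"')
past=$(bash -c 's=abcdefghij; echo "[${s:99}]"')
echo "$tail3 $middle $past"
```

An offset past the end of the string is not an error; it expands to the empty string, shown here between brackets.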

>   $ xx(){ echo "${*:2:-2}";}; xx 1 2 3 4 5 6 7 8 9
>   bash: -2: substring expression < 0

Bash doesn't allow negative lengths to count back from the end of the
positional parameters. (Technically, this isn't defined for `*', but bash
treats it the same as `@'.)

>   $ xx(){ echo "${*: -2}";}; xx 1 2 3 4 5 6 7 8 9
>   8 9

But negative offsets work.

>   $ echo "${PATH:2:-2}"
>   sr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/gam
> 
> 
> Ok, with a string, the start being off the left or right edges resolves to empty
> string, but end before start is an error. Length being zero is fine. A window
> including the end of the string gives the right edge of the string, a window
> including the left edge of the string gives nothing?

> 
> With an array, the first argument being negative is the usual "from end", the
> second argument being negative is an ERROR? (What?) And the same arguments off
> left edge produces no output even when the window advances into range with stuff
> in it, but arguments off right edge clip to show you what it can.

Yes, negative lengths don't work. And no, offsets that take you back before
the start of the array aren't valid.
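
The positional-parameter cases can be sketched like this. The function name `f' is a placeholder, and the error case is wrapped in an `if' so the example continues past it:

```shell
# A negative offset works on the positional parameters, as does a window
# given by a positive offset and length...
last_two=$(bash -c 'f() { echo "${*: -2}"; }; f 1 2 3 4 5 6 7 8 9')
window=$(bash -c 'f() { echo "${*:2:3}"; }; f 1 2 3 4 5 6 7 8 9')

# ...but a negative length is rejected rather than counted from the end.
if bash -c 'f() { echo "${*:2:-2}"; }; f 1 2 3 4 5 6 7 8 9' >/dev/null 2>&1; then
    neg_length=accepted
else
    neg_length=rejected
fi
echo "$last_two|$window|$neg_length"
```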

>>> At a certain point I'm going to have to try scripts that break, and get bug
>>> reports from people who expected something to work and are angry at me.
>>> (Apparently it is vitally important that I care about a bash reimplementation of
>>> readline, which somehow manages to be both implemented in bash and to have a
>>> makefile. I've put it on the todo list.)
>>
>> Why? The bash version of readline and the standalone version of readline
>> are identical. Identical as in consisting of the same source.
> 
> Ask https://github.com/akinomyoga/ble.sh ?

Yeah, no.

> 
>>>> Sure. That's why declare reports `?' as not found. It's not a variable.
>>>
>>> Which is why I needed to factor out the second function, which DOES know about
>>> it. (Sometimes it's a variable, sometimes it isn't...)
>>
>> Well, if you want to split hairs, it's never a variable. Sometimes it's a
>> parameter.
> 
> I'm pretty sure you're not calling the :1:2 part of ${x:1:2} the "slice". My
> nomenclature and yours are way off from each other. But that's not one of the
> things I'm trying to get right. :)

Whatever model works for you. It just has to be comprehensive enough to
capture all of the bash behavior.

> 
>>>>>   $ echo ${*-yetthisislive}
>>>>>   yetthisislive
>>>>
>>>> Defined by posix.
>>>
>>> When I get to the end of the bash man page (well, loop and do a pass with no
>>> hits), I intend to do a pass over posix-2008 to see what I missed. (I did read
>>> the whole thing once upon a time, it's just been a while.)

Don't bother, unless you mean the 2018 version of issue 7.

> And there was also a behavior change in how quotes were resolved, and...

Keep in mind that bash has had bugs, too, and those bugs include "doing
things differently from how other shells have done them and how POSIX
specifies them to be done." Sometimes you have to break backwards
compatibility.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet at case.edu    http://tiswww.cwru.edu/~chet/