[Toybox] [PATCH] sh: pass "\" to the later app

Chet Ramey chet.ramey at case.edu
Mon Jun 19 16:32:54 PDT 2023


On 6/17/23 7:23 PM, Rob Landley wrote:
> On 6/12/23 19:40, Chet Ramey wrote:
>>> and they have a list of "special built-in utilities" that does NOT include cd
>>> (that's listed in normal utilities: how would one go about implementing that
>>> outside of the shell, do you think?)
>>
>> That's not what a special builtin means. alias, fg/bg/jobs, getopts, read,
>> and wait are all regular builtins, and they can't be implemented outside
>> the shell either.
>>
>> Special builtins are defined that way because of their effect:
>>
>> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_14
>>
>> It's really a useless concept, by the way.
> 
> It's not that simple: kill has to be built-in or it can't interface with job
> control...

That's not what a special builtin is. `kill' is a `regular builtin' anyway.


> Wait, assignments before these magic utilities are NOT prefix assignments
> limited to the duration of the command?

How many times.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_14

>    $ abc=123 true
>    $ echo $abc
>    $ abc=123 :
>    $ echo $abc
>    $ abc=123 eval 'echo $abc'
>    123
>    $ echo $abc
>    $
> 
> Nope, even bash doesn't do that. 

You should have tried it in posix mode. I said it was a useless concept,
there's no way bash is going to do that in default mode.
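
A minimal sketch of the difference, assuming a `bash` binary on PATH (the
variable names are only for illustration):

```shell
# Default mode: the assignment is temporary, even for a special builtin like ":".
default_mode=$(bash -c 'unset abc; abc=123 :; echo "[$abc]"')

# Posix mode: assignments preceding a special builtin persist afterwards.
posix_mode=$(bash --posix -c 'unset abc; abc=123 :; echo "[$abc]"')

# "true" is a regular builtin, so the assignment is temporary in both modes.
regular=$(bash --posix -c 'unset abc; abc=123 true; echo "[$abc]"')

printf '%s\n' "default: $default_mode" "posix: $posix_mode" "regular: $regular"
```

Only the posix-mode run with `:' should show 123 between the brackets.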

> (A prefix assignment... on continue? I can't
> even do a prefix assignment on "if", and I have _use_cases_ for that. I had that
> implemented and then backed it out again because it's an error in bash.

`if' is not a builtin.

> I
> remember I did make "continue&" work, but don't remember why...)

Why would that not work? It's just a no-op; no semantic meaning.

>> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_03
> 
> And I need parsing that eats \$ and \\n but leaves \x alone, great. (Which is a
> new status flag for expand_arg(), can't be handled in a preparsing pass nor is
> NO_QUOTE gonna do it right...)

More characters than those two.

> 
> Why is only the second of these an error in bash?
> 
>    $ unset ABC; echo ${ABC::'1+2'}
>    $ ABC=abcdef; echo ${ABC::'1+2'}
>    bash: ABC: '1+2': syntax error: operand expected (error token is "'1+2'")

Because if there's nothing to operate on, bash doesn't try to process the
rest of the word expansion (and if your first command is real, echo will
output a single newline).

This is consistent with POSIX:

"If word is not needed, it shall not be expanded."

even though the substring word expansion isn't POSIX.
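
The same rule can be observed with a standard expansion. This is only a
sketch: `count' is a hypothetical counter used to detect whether the word
was expanded.

```shell
count=0

# x is set, so the word after ":-" is not needed and is never expanded.
x=value
: "${x:-$((count = count + 1))}"
skipped=$count

# x is unset, so the word is expanded and the side effect happens.
unset x
: "${x:-$((count = count + 1))}"
expanded=$count

echo "after set case: $skipped, after unset case: $expanded"
```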

> I think when the EOF is quoted the HERE body has no processing, and when it's
> not quoted then $VARS \$ and \<newline> are the only special... Nope, \\ is too.

Yes, since the body is treated like it's in double quotes, and, as quoted
earlier, \ is one of the characters for which backslash retains its
behavior as a special character. The double quote is the only exception;
look at what these do:

cat <<EOF
echo "
EOF

cat <<EOF
echo \"
EOF
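
For comparison, a small sketch that puts the other backslash cases in one
body and captures the result with both an unquoted and a quoted delimiter
(the variable names are just for illustration):

```shell
# Unquoted delimiter: \$ and \\ are processed, \x and \" are left alone.
unquoted=$(cat <<EOF
\$HOME \\ \x \"
EOF
)

# Quoted delimiter: the body is taken literally.
quoted=$(cat <<'EOF'
\$HOME \\ \x \"
EOF
)

printf '%s\n' "$unquoted" "$quoted"
```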

>    https://github.com/landley/toybox/commit/32b3587af261

Ugh.


> When you create a new local variable it does so in the most recent named
> function context (or the root context if it reaches it), skipping unnamed
> function contexts. When you resolve or modify an existing variable (or unset it,
> which creates a whiteout entry) it iterates back through all existing function
> contexts to find a matching entry (then puts one in the root context if you were
> assigning without declaring it local).
> 
> So "local blah" won't bind to an anonymous function context, and errors out if
> it reaches the root context. I _think_ it works...

OK.

>> The real question is what value LINENO should have when using -c command,
>> even though it's only defined for a script or function.
> 
> We've gone over that one before. You decided you were going to initialize it to
> 1 instead of 0,

Yes.

> still matching the behavior in my devuan install. (Still devuan bronchitis,
> haven't updated to devuan cholera yet. Um, the web page says devuan B matches
> debian "buster" and devuan C matches "bullseye", if that helps.)

Not at all. But charming version names.

> I was naive enough to write the variable resolution logic with the design
> assumption that unbalanced quoting contexts had already been caught before the
> data was passed to us. Kinda biting me now, although I think I'm most of the way
> through it.

It was a pain to get that stuff right.

> It doesn't handle nested logical contexts, and "case" logic has unmatched
> ending parentheses that can end the $() span prematurely...)

Ha. I had ad-hoc parsing that parsed $(...) for years, and it got more and
more complex. I finally gave up on it for bash-5.2 and call the parser
recursively to find the closing right paren (and replaced hundreds of lines
of code with dozens). That's really the only way to do it correctly, but I
was stuck with some compatibility issues because of how bash had not done
it correctly in the past.
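
A sketch of the kind of input that trips up ad-hoc scanners, assuming a
reasonably recent shell (older bash versions may reject the first form):

```shell
# The ")" after the pattern must not be taken as the end of $(...).
bare=$(case x in x) echo matched;; esac)

# A leading "(" on the pattern keeps the parentheses balanced, which is
# friendlier to parsers that only count parens.
balanced=$(case x in (x) echo matched;; esac)

echo "$bare $balanced"
```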

> 
>>> Mostly I'm reading the bash man page, pondering many years of
>>> writing and editing bash scripts, and doing LOTS of tests...
>>
>> And pointing out places where the man page isn't clear or doesn't describe
>> the shell's behavior, which I appreciate.
> 
> Happy to help. At the same time, trying not to spam you too badly...

It hasn't been a problem so far.

> 
>>>> The current edition is from 2018.
>>>
>>> Except they said 2008 was the last feature release and everything since is
>>> bugfix-only, and nothing is supposed to introduce, deprecate, or significantly
>>> change anything's semantics.

When, by the way?


>>> That's why it's still "Issue 7". The new stuff is
>>> all queued up for Issue 8, which has been coming soon now since the early Obama
>>> administration.
>>
>> Oh, I was there.
> 
> I was lurking on the posix list since... 2006 I think?

So you know that test now has `<' and `>' binary string operators that use
the current locale, right? That's an example of what I'm talking about.

> The project isn't dead, but those are defined as bugfix releases. Adding new
> libc functions or command line options, or "can you please put cpio and tar back
> in the command list", are out of scope for them.

So wait for issue 8, I guess? It's going to start balloting this year.


> Ken or Dennis having a reason means a
> lot to me because those guys were really smart. The Programmers Workbench guys,
> not so much. "Bill Joy decided" is a coin flip at best...

They all had different, even competing, requirements and goals. Mashey and
the PWB guys were developing for a completely different user base than the
original room 127 group, and Joy and the BSD guys had different hardware
*and* users, and then the ARPA community for 4.2 BSD.

Maybe things would be slightly different if Reiser's VM system (the one Rob
Pike raves about) had been in 32/V and then eventually made it back to
Research in time for 8th edition, but that's not the way it worked out.

> Working on it. (Well in busybox somebody else had already written an awk, I just
> sent them bug reports and made puppy eyes. This time I have to learn how to use
> "awk". And I have to write a "make". And a shell, which is in progress... :)

Seems daunting.

>> I wish you were not so reluctant. Look at how many things you've discovered
>> that I decided were bugs based on our discussions.
> 
> But I'm taking up your valuable time.

I get to make that decision, don't I? I'm not shy -- I'll tell you if you
send something dumb. Don't gatekeep yourself.

> But since you asked, today's new question I wrestled with was "what's the error
> logic for slice fields"?

Let's assume `one' is unset.

> 
>    $ echo ${one:!}
>    bash: !}: event not found

History expansion, nothing to do with the question.

>    $ echo ${one:+}

This isn't what you think it is: `:+' is a completely different word
expansion, with different behavior. Since `one' is unset, this expands to
the null string. Even if it were set, the expansion would be null since
nothing follows the `+'.
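
A short sketch of the `:+' cases (the word `alt' is just a placeholder):

```shell
unset one
unset_result="[${one:+alt}]"      # unset: expands to nothing

one=
null_result="[${one:+alt}]"       # set but null: ":+" also yields nothing

one=1
set_result="[${one:+alt}]"        # set and non-null: the word is used
empty_word="[${one:+}]"           # set, but nothing follows the "+"

echo "$unset_result $null_result $set_result $empty_word"
```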

>    $ echo ${one:+:}

See above; bash doesn't do work it doesn't have to.

>    $ echo ${one:]} two
>    two

Again.

>    $ echo ${one:0/0}

And again.

>    $ echo ${PATH::1+2}
>    /ho

OK, you have a set variable, no mystery here. `offset' and `length' are
arithmetic expressions; a null arithmetic expression evaluates to 0, as
with

echo $(( ))
or
echo $(( $unsetvar ))

and described in ARITHMETIC EVALUATION. So you have three characters
starting at offset 0, the beginning of the string.
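
Something like this shows the pieces separately. It runs bash explicitly
because substring expansion is not POSIX; the string `abcdef' stands in
for PATH.

```shell
# A null arithmetic expression evaluates to 0.
empty=$(bash -c 'echo $(( ))')
unset_operand=$(bash -c 'unset unsetvar; echo $(( $unsetvar ))')

# Null offset (0) and length 1+2 (3): the first three characters.
slice=$(bash -c 's=abcdef; echo "${s::1+2}"')

echo "$empty $unset_operand $slice"
```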

>    $ echo ${PATH::0/0}
>    bash: PATH: 0/0: division by 0 (error token is "0")

When it has to perform the arithmetic evaluation it will, and evaluation
errors get reported as expansion errors.
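
A sketch contrasting the two cases, assuming `bash' is on PATH (the
variable names are hypothetical):

```shell
# Unset variable: nothing to slice, so "0/0" is never evaluated.
unset_case=$(bash -c 'unset one; echo "[${one:0/0}]"' 2>&1)

# Set variable: the offset has to be evaluated, so the error is reported
# and the command fails.
if bash -c 'one=abc; echo "[${one:0/0}]"' >/dev/null 2>&1; then
  set_case=ok
else
  set_case=error
fi

echo "unset: $unset_case, set: $set_case"
```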


> It's doing math, but only _sometimes_ even reporting division by zero as an error?

See above.

>>>> Single quotes: preserved. Double quotes: removed when special. For
>>>> instance, the double quotes around a command substitution don't make the
>>>> characters in the command substitution quoted.
>>>
>>> Quotes around $() retain whitespace that would otherwise get IFS'd.
>>
>> Correct, but that's behavior that affects how the output of the command
>> substitution is treated, not how the substitution itself is parsed or
>> executed.
> 
> They're the same thing for me: my parsing produces a result.

All parsing produces a result: either a valid command tree, in whatever
data structure you want to use to represent it, or an error. But surely
you make a distinction between the $(...) expansion and what it expands to.

The question is whether $VAR is quoted in

echo "$( for f in $VAR; do echo $f; done )"

If you treat this like $( for f in "$VAR"; do echo $f; done ), you're going
to have problems.
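
A sketch of the difference, assuming the default IFS and a sample value
for VAR:

```shell
VAR='a b c'

# $VAR is unquoted inside the substitution, so it is split into three words,
# even though the $(...) itself sits inside double quotes.
split="$( for f in $VAR; do echo "<$f>"; done )"

# Quoting $VAR inside changes the command itself: one iteration.
joined="$( for f in "$VAR"; do echo "<$f>"; done )"

printf '%s\n' "$split" "--" "$joined"
```

The outer double quotes only decide whether the three output lines survive
as newlines; they do not change how the loop inside sees $VAR.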


> $ echo ${PATH//":"/xxx}
> /home/landley/binxxx/usr/local/binxxx/usr/binxxx/binxxx/usr/local/gamesxxx/usr/games
> $ echo "${PATH//':'/xxx}"
> /home/landley/binxxx/usr/local/binxxx/usr/binxxx/binxxx/usr/local/gamesxxx/usr/games
> $ echo "${PATH//"/"/xxx}"
> xxxhomexxxlandleyxxxbin:xxxusrxxxlocalxxxbin:xxxusrxxxbin:xxxbin:xxxusrxxxlocalxxxgames:xxxusrxxxgames
> 
> Quoting contexts nest...

Well, the non-POSIX expansions get to do what they want, but yes, the inner
quotes are allowed and can quote special pattern characters.
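
For instance, with a sample string containing a pattern character (run
through bash explicitly, since pattern substitution is a bash extension):

```shell
# Unquoted "?" is a pattern matching any single character.
unquoted=$(bash -c 's="a?b"; echo "${s//?/-}"')

# Inner quotes make "?" literal, even inside the outer double quotes.
quoted=$(bash -c 's="a?b"; echo "${s//"?"/-}"')

echo "$unquoted $quoted"
```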


>>> (And "$@" is kind of array variable-ish already...)
>>
>> Kind of, but it's not sparse. Support for very large sparse arrays is one
>> thing that informs your implementation.
> 
> Oh goddess. (Adds note to sh.tests, which is my text file of cut and paste
> snippets to look at later. Yes, my todo lists nest.) Is sparse array a new type
> or are all arrays sparse?

All indexed arrays are sparse (the question is meaningless for associative
arrays). Indices that are set are set; indices that are not are unset.

declare -a intarray
intarray[12]=twelve

doesn't automatically set intarray[0..11] to anything.
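
A sketch that makes the sparseness visible by counting elements and listing
the indices that are actually set (the second assignment is added only for
illustration):

```shell
result=$(bash <<'EOF'
declare -a intarray
intarray[12]=twelve
intarray[3]=three
echo "count=${#intarray[@]}"        # number of set elements, not max index
echo "indices=${!intarray[@]}"      # only the indices that are set
echo "zero=${intarray[0]-unset}"    # index 0 was never assigned
EOF
)
echo "$result"
```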

> 
> The variable types I've currently got are:
> 
> // Assign one variable from malloced key=val string, returns var struct
> // TODO implement remaining types
> #define VAR_NOFREE    (1<<10)
> #define VAR_WHITEOUT  (1<<9)
> #define VAR_DICT      (1<<8)
> #define VAR_ARRAY     (1<<7)
> #define VAR_INT       (1<<6)
> #define VAR_TOLOWER   (1<<5)
> #define VAR_TOUPPER   (1<<4)
> #define VAR_NAMEREF   (1<<3)
> #define VAR_EXPORT    (1<<2)
> #define VAR_READONLY  (1<<1)
> #define VAR_MAGIC     (1<<0)

> WHITEOUT is when you unset a local variable so the
> enclosing scope may have an unchanged definition but variable resolution needs
> to stop there and get the ${x:=} vs ${x=} part right),

You don't need that one, really. You can use the same value and logic you
do when you have something like

declare -i foo
or
export foo

(unless you use WHITEOUT for this case as well).

`foo' exists as an unset variable, but when you assign a value to foo it
gets exported since the attribute was already there. You just have to be
really disciplined about how you treat this `exists but unset' state.
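
A sketch of that state as bash exposes it (the name `num' is hypothetical):

```shell
result=$(bash <<'EOF'
unset foo
export foo
echo "state=${foo-unset}"          # attribute is set, value is not
foo=bar
bash -c 'echo "child=$foo"'        # the value is exported once assigned

unset num
declare -i num
num=1+2                            # integer attribute applies on assignment
echo "num=$num"
EOF
)
echo "$result"
```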


> Anyway, that leaves VAR_ARRAY, and VAR_DICT (for associative arrays). I take it
> a sparse array is NOT a dict? (Are all VAR_ARRAY sparse...?)

The implementation doesn't matter. You have indexed arrays, where the
subscript is an arithmetic expression, and associative arrays, where the
subscript is an arbitrary string. You can make them all hash tables, if
you want, or linked lists, or whatever. You can even make them C arrays,
but that will really kill your associative array lookup time.

Asking whether an associative array is sparse doesn't make much sense;
what would the definition of `sparseness' be? For indexed arrays, where the
integer subscript imposes a bounded ordering, it makes sense.
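
The subscript difference is easy to see with the same text used both ways.
This sketch assumes bash 4.0 or later for `declare -A'; the array names are
hypothetical.

```shell
result=$(bash <<'EOF'
declare -a indexed
declare -A assoc
indexed[1+1]=two      # subscript is evaluated as arithmetic: index 2
assoc[1+1]=two        # subscript is the literal string "1+1"
echo "indexed=${!indexed[@]}"
echo "assoc=${!assoc[@]}"
EOF
)
echo "$result"
```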

> 
> Glancing at my notes for any obvious array todo bits, it's just things like "WHY
> does unsetting elements of BASH_ALIASES not remove the corresponding alias, does
> this require two representations of the same information? 

There's no good reason, I just haven't ever made that work.


> Spite: it keeps you going.)

Misanthropy works.

> 
>>> I remember being deeply confused by ${X@Q} when I was first trying to implement
>>> it, but it seems to have switched to a much cleaner $'' syntax since?
>>
>> The @Q transformation has preferred $'...' since I introduced the
>> parameter transformations in bash-4.4. I'm not sure when you were looking
>> at it?
> 
> I stuck with the last GPLv2 release for longer than Apple did:
> 
>    https://news.ycombinator.com/item?id=18852887

But that version doesn't have parameter transformations, so that part is
moot.

>>>> They're not options, per se, according to POSIX. It handles -n as an
>>>> initial operand that results in implementation-defined behavior. The next
>>>> edition extends that treatment to -e/-E.
>>>
>>> An "initial operand", not an argument.
>>
>> That's the same thing. There are no options to POSIX echo. Everything is
>> an operand. If a command has options, POSIX specifies them as options, and
>> it doesn't do that for echo.
> 
> Hence the side-eye. In general use, echo has arguments. But posix insists it
> does not have arguments. So, to save face, they've created an "argument that
> isn't an argument", and they want us to pretend that's not what they did.

Because the historical echo implementations were all incompatible -- and
worse, irreconcilable. The POSIX folks did the least worst thing. They all
exist just to make the behavior implementation-defined anyway.

> 
> "All options must come before non-option arguments" is a common use pattern,
> echo isn't special in this regard. "Unrecognized options are passed through" is
> another common pattern.

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html#tag_12_02

The latter is possible, but not encouraged.

> Heck, you want funky: "kill -stop" vs "kill -s top". It's passing through
> unrecognized arguments to a later processing pass, and retroactively declaring
> -s as unrecognized because -t isn't a thing.

Those are called out as special cases in the description of `kill' and
dependent on the system supporting the `XSI' option. I agree it's a special
case.

>>> Right. So they're going from "wrong" to "wrong" then:
>>>
>>>     $ echo -n -e 'hey\nthere'
>>>     hey
>>>     there$
>>
>> Yeah, echo is a lost cause. Too many incompatible implementations, too much
>> existing code. That's why everything non-trivial (like the above) is
>> implementation-defined. POSIX recommends that everyone use printf.
> 
>    $ printf abc\n
>    abcn$
> 
> Oh yeah, that'll happen.

What did you think would happen to the unquoted backslash?
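
A sketch of the two cases. The trailing `echo .' is only there so that
command substitution does not strip the newline being tested.

```shell
# Unquoted: the shell removes the backslash, so printf sees "abcn".
unquoted=$(printf abc\n; echo .)

# Quoted: printf sees the backslash and emits a newline.
quoted=$(printf 'abc\n'; echo .)

printf '%s\n' "$unquoted" "$quoted"
```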

> 
>>> Maybe posix should eventually break down and admit this is a thing? "ls . -l"
>>> has to work, but "ssh user@server -t ls -l" really really REALLY needs that
>>> second -l going to ls not ssh.
>>
>> Why do you think they don't acknowledge this today?
> 
>    https://landley.net/notes-2016.html#11-03-2016

I don't understand how the two connect? Jorg was truly abrasive, and
didn't endear himself to many people, but I don't see the connection to
argument ordering here.


> (Yes, I'm aware of recent changes. That's why I re-engaged with Posix, felt I
> owed it to them since the condition under which I said I'd come back
> unexpectedly happened. But having already written them off, my heart really
> wasn't in it. I _should_, but I'm juggling too many other balls...)
> 
>> Options only exist as
>> such if they come before the first non-option argument.
> 
>    $ cat <(echo hello) -E
>    hello$

Yeah, looks like a bug in cat to me:

$ cat <(echo hello) -E
hello
cat: -E: No such file or directory

The GNU utilities do all sorts of argument reordering, but that doesn't
mean you're going to get that in POSIX.


> 
>> Options have to
>> begin with `-'.
> 
>    tar tvzf blah.tgz
>    ps ax
>    ar t /usr/lib/libsupp.a

POSIX doesn't have `tar'.


> You can chain "ssh xargs strace nice unshare setsid timeout prlimit chroot"
> arbitrarily deep, and each command has its own arguments and then a command line
> it execs, which can itself contain arguments. That's usually WHY a command cares
> about argument position.

That's not inconsistent with the requirement that ssh options appear before
other arguments.


>> If you really want to go
>> hardcore, require that the application (user) supply a `--' before the
>> remote command and its arguments if you want to use it in this way.
> 
> But what's already there works, and has for decades.
> 
> A good standards body should document, not legislate.

Where do you think the utility syntax guidelines came from?



> And then I submitted a feature request to coreutils:
> 
>    https://lists.gnu.org/archive/html/coreutils/2022-01/msg00004.html
> 
> Which resulted in a lot of discussion, and an eventual decision to include it,
> and some patches were discussed:
> 
>    https://lists.gnu.org/archive/html/coreutils/2022-01/msg00048.html
> 
> And then when it wasn't in the next release, they said it was "still in
> development":
> 
>    https://lists.gnu.org/archive/html/coreutils/2022-04/msg00010.html
> 
> And then a year later it was still on their todo list:
> 
>    https://lists.gnu.org/archive/html/coreutils/2023-02/msg00012.html
> 
> This sort of thing consumes my "engaging with bureaucracy" meter.

You can't force volunteers to do anything. They're volunteers! It's not
bureaucracy, they just don't work for you!


> http://www.opengroup.org/testing/downloads.html says there's a no-fee license.
> Maybe closer to the 1.0 release I'll jump through the hoops to help me document
> my deviations?

Think carefully about doing that. It takes a lot of time, and I only did
the shell and builtins tests.


>>>> This is completely unspecified behavior.
>>>
>>> The standard is not complete, yes.
>>
>> A different interpretation. There's plenty of unspecified and
>> implementation-defined behavior.
> 
> Bash is an implementation, defining behavior. There may be version skew, but it
> does something specific. I just have to think of what questions to ask.

That's not the same thing. More useful for your purposes, maybe, but still
different.

> A thing I have done from time to time, but... an expensive thing, as far as
> spoons go:
> 
>    https://en.wikipedia.org/wiki/Spoon_theory

Yes, everyone has limited resources.




> Currently. Posix didn't always exist, the Linux Standard Base was making good
> progress until the Linux Foundation's accretion disk swallowed it, man7.org was
> decent until Michael Kerrisk retired and handed off to a guy who doesn't
> maintain a current web version...

If all you're interested in is Linux, then sure.


> For years the de-facto spreadsheet standard was Microsoft Excel and the word
> processing file format standard was Microsoft Word. They SUCKED, but had vastly
> dominant market share. And every weird corner case of their behavior was part
> of that standard.
> 
> Then Star Division cloned compatible versions that could read and write those
> files in Star Office,

Yes, I used Star Office when I ran FreeBSD on my desktop for a while.


> The point is, once you have two independent implementations, the subset both
> support becomes a lot more standard-shaped. This was the IETF way back in the
> day, "rough consensus and running code". The bake-offs were to get multiple
> interoperable implementations. You NEED two, or this doesn't work. :)

Sure. But when you get beyond two, that intersecting subset becomes a lot
smaller, and the number of parties with skin in the game gets a lot larger.
That's why you have so much implementation-defined behavior in the
standard. If you want to walk the road you did, and say "this
implementation is the standard one for me," then that's fine, but you're
not going to be successful getting other implementations to walk that same
road a lot of the time.



>> What are you using now?
> 
> $ bash --version
> GNU bash, version 5.0.3(1)-release (x86_64-pc-linux-gnu)

Jesus, your distro can't even be bothered to apply all the patches for a
single version?

$ ../bash-5.0-patched/bash --version
GNU bash, version 5.0.18(10)-release (x86_64-apple-darwin18.2.0)

This is what makes getting bug reports difficult.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet at case.edu    http://tiswww.cwru.edu/~chet/
