[Toybox] [PATCH] sh: pass "\" to the later app

Chet Ramey chet.ramey at case.edu
Thu Jul 6 18:09:02 PDT 2023


On 7/5/23 3:29 AM, Rob Landley wrote:


>>>> It's really a useless concept, by the way.
>>>
>>> It's not that simple: kill has to be built-in or it can't interface with job
>>> control...
>>
>> That's not what a special builtin is. `kill' is a `regular builtin' anyway.
> 
> I started down the "rereading that mess" path and it's turning into "reading all
> the posix shell stuff" which is not getting bugs fixed. And once again, this is
> a BAD STANDARD. Or at least badly organized. There's three groups here:

OK. This is a decision that was made, what, 45 years ago? These are the
Bourne shell special builtins -- at least as of SVR4. Korn added a couple,
but since the Bourne shell didn't have them, they were not added to the
list.

Special builtins will exit a non-interactive shell on an error, assignments
preceding them persist, and they're found before shell functions in the
command search order. That's pretty much it. It's not that the builtins
have to be implemented internally, but that these have other properties.

They're a POSIX concept, so bash conforms when in posix mode. In default
mode, every builtin is treated the same.

> 1) flow control commands: break, continue, dot, eval, exec, exit, trap, return.
> 
> 2) variable manipulation commands: export, readonly, set, shift, unset.
> 
> 3) random crap: colon, times.
> 
> Why group 1 doesn't include "wait" I dunno.

It's not a Bourne shell special builtin: errors in it don't exit the shell.


> Why group 2 has set but not read or
> alias/unalias in it I couldn't tell you, 

read isn't a Bourne shell special builtin; errors in it don't exit the
shell. The SVR4 shell doesn't have aliases (and aliases were originally
optional in POSIX, part of the UPE).

> and for that matter cd is defined to
> set $PWD.

cd is a weird one. The v7 Bourne shell exited the shell if the directory
argument didn't exist, and that didn't change until SVR4.2, but POSIX
declined to make it a special builtin.

> Distinguishing : from true seems deeply silly

true wasn't a special builtin in the Bourne shell.

> (especially when [ and
> test aren't)

Not part of the Bourne shell, only came in in System III, never a special
builtin.

> and "times" is job control 

It's not. It's a straightforward interface to the `times' library function
(originally system call in 7th edition).

> (it smells like a jobs flag, but
> they're not including bg/fg here either which are basically flow control group 1
> above).

Job control wasn't included until the SVR4.2 sh, and it was optional in
POSIX for a long time.

> 
> And having "command" _not_ be special is just silly:
> 
>    $ command() { echo hello; }
>    $ command ls -l
>    hello

It really can't be; one of the uses for command is to suppress the effects
of special builtins, so they won't exit the shell on error.
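
A minimal sketch of that use of command, assuming bash is on PATH and is
run with --posix. The path /nonexistent/file is made up for illustration
and is assumed not to exist; `.' is used as the failing special builtin.

```sh
# Hypothetical demo: a failing special builtin (.) in a non-interactive
# posix-mode shell, with and without the `command' prefix.
plain=$(bash --posix -c '. /nonexistent/file; echo survived' 2>/dev/null)
wrapped=$(bash --posix -c 'command . /nonexistent/file; echo survived' 2>/dev/null)

echo "plain:   [$plain]"     # the shell exits before reaching echo
echo "wrapped: [$wrapped]"   # command suppresses the exit, so echo runs
```

If command were itself special, there would be no way to get the second
behavior.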

> 
> There's only a few more commands like hash that CAN'T be implemented as child
> processes, but they don't bother to distinguish them. 

It's not the difference between special builtins and external commands,
it's the difference between regular builtins and special builtins.

> I know there's the "this
> may syntax error and exit the shell" distinction but don't ask me how set or
> true are supposed to do that. 

set exits the shell on an invalid option and was special in the Bourne
shell; true isn't a special builtin.

> (I _think_ they added set here because set -u can
> cause a shell error later? Maybe? But then why unset? 

Well, unset didn't exist in the 7th edition shell, but it's special
in the SVR4 shell. It can only fail if asked to unset a readonly
variable or one of the shell's non-unsettable variables. It takes no
options, does no argument checking for invalid identifiers, and unsetting
a variable that's not set isn't an error, but when asked to unset a
variable the shell says you can't, the shell exits.

I think POSIX made unset a special builtin because the SVR4 sh did and
so it would be found in the command search before a function. That gets
important when you're trying to write a secure script, especially when you
can inherit functions from the environment (bash) or run a startup file for
non-interactive shells.

> It doesn't seem to affect
> flow control:
> 
>    $ readonly potato=x; for i in one two three; do echo $i; unset potato; done
>    one
>    bash: unset: potato: cannot unset: readonly variable
>    two
>    bash: unset: potato: cannot unset: readonly variable
>    three
>    bash: unset: potato: cannot unset: readonly variable

If you were in posix mode it would exit the shell.

> I guess it's just the sh -c 'a=b set d=e; echo $a' nonsense which only dash
> seems to bother with, which is a good reason _not_ to do it if you ask me...

Everyone does it. Bash does it in posix mode.
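
A small sketch of the difference, assuming bash is installed and that no
variable named `a' is already exported in the calling environment. The
variable names posix_result and default_result are only for illustration.

```sh
# Same command line in posix mode and in default mode.
# set is a special builtin, so in posix mode the preceding a=b persists.
posix_result=$(bash --posix -c 'a=b set d=e; echo "a=[$a] 1=[$1]"')
default_result=$(bash -c 'a=b set d=e; echo "a=[$a] 1=[$1]"')

echo "posix mode:   $posix_result"
echo "default mode: $default_result"
```

In both cases d=e is just the new $1; only the fate of the a=b
assignment changes.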

> 
> In general, this whole "can exit on error" thing doesn't seem hugely honored
> even when posix says (implies) you can:
> 
>    $ declare -i potato=1/0
>    bash: declare: 1/0: division by 0 (error token is "0")
>    $ declare -i potato
>    $ set potato=1/0
>    $ echo $potato
>

I guess I don't understand these examples. declare isn't a special builtin,
and there's nothing wrong with setting $1 == "potato=1/0" even though set
is a special builtin.

>    $
>    $ (set -x; echo hello ) 2>/dev/full
>    hello

echo isn't a special builtin, either. But what if it were? It would exit
the subshell either way.

> Oh, by the way, I remember setting LINENO read only made the shell quite chatty,
> but when I tested it just now it was ignored instead?
> 
>    $ readonly LINENO
>    $ echo $LINENO
>    2
>    $ echo $LINENO
>    3
>    $ declare -p LINENO
>    declare -ir LINENO="4"
>    $ echo $LINENO

It would error if you actually tried to assign it a value, and, in posix
mode, exit the shell.

>>> I
>>> remember I did make "continue&" work, but don't remember why...)
>>
>> Why would that not work? It's just a no-op; no semantic meaning.
> 
> Not _quite_ a NOP:

I mean, it creates a child process which immediately exits, but it has
no effect on the shell other than to note that we created an asynchronous
child process (which sets $!) that exited. It certainly doesn't affect
flow control.

> 
>    $ for i in one two three; do echo a=$i; continue& b=$i; done
>    a=one
>    [1] 30698
>    a=two
>    [2] 30699
>    a=three
>    [3] 30700
>    [1]   Done                    continue
>    [2]-  Done                    continue
> 
> Notice the child processes and lack of b= lines.

Why would you expect a b= line? Even if the `continue&' were not there,
the `;' after the first echo command makes the b= line a separate simple
command. Who's going to echo `b=$i' and why would they? Maybe if you had
an `echo' in there instead.
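
A sketch of the loop with an echo added, so there is something to print
the b= text. The variable loop_out is only for illustration, and stderr
is discarded in case a shell prints a diagnostic from the background
child.

```sh
# `continue &' runs in a background child, so it cannot affect the
# loop in the parent shell: every iteration still reaches the second echo.
loop_out=$(
  for i in one two; do
    echo "a=$i"
    continue &
    echo "b=$i"
  done 2>/dev/null
  wait
)
printf '%s\n' "$loop_out"
```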

> No, if you want a NOP, put a flow control statement in a pipe:
> 
>    $ for i in one two three; do echo a=$i; continue | echo b=$i; echo c=$i; done

These are not equivalent commands.


> Backslash in double quote context leaves most characters alone but eats \ $ and
> newline, and unquoted HERE documents are in double quote context.

Yes, but:

> 
>    $ cat<<EOF
>    > \a \b \c \$ \\ \d \
>    > EOF
>    > EOF
>    \a \b \c $ \ \d EOF
> 
> As far as I can tell, it's NOT more than \$ \\ and \<newline> that get special
> treatment in this context? 

Plus double quote (in double quotes, but not here-documents) and
backquote.

> 
>    $ cat<<EOF
>    > <(echo hello)
>    > EOF
>    <(echo hello)
>    $ cat<<EOF
>    > <(echo $(echo potato))
>    > EOF
>    <(echo potato)
> 
> Yup, just the $ ones and those three \ ones?

Backslash escapes five characters in double quotes, four in here-documents
with unquoted delimiters.
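
A short sketch of that difference. The names show_heredoc, heredoc_out
and dquote_out are made up for the example; \a stands in for any
character that backslash does not escape.

```sh
# The same text in a here-document with an unquoted delimiter and in
# double quotes. Only the treatment of \" differs.
show_heredoc() {
cat <<EOF
\$ \\ \` \" \a
EOF
}
heredoc_out=$(show_heredoc)
dquote_out="\$ \\ \` \" \a"

printf 'here-document: %s\n' "$heredoc_out"   # \" keeps its backslash
printf 'double quotes: %s\n' "$dquote_out"    # \" becomes a plain "
```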


> And it's the short-circuit logic again:
> 
>    $ echo $((1?2:(1/0)))
>    2
>    $ echo $((1&&(1/0)))
>    bash: 1&&(1/0): division by 0 (error token is "0)")
>    $ echo $((1||(1/0)))
>    1

That's not the same thing; arithmetic expression evaluation follows the
C rules for suppressing evaluation.
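
A sketch of those rules, written for bash. The variable names are only
for illustration. The operand that C would skip is never evaluated, so
the division by zero only matters when the right-hand side is needed.

```sh
ternary=$(( 1 ? 2 : (1/0) ))    # false branch of ?: is not evaluated
or_short=$(( 1 || (1/0) ))      # left side is true, right side skipped
and_short=$(( 0 && (1/0) ))     # left side is false, right side skipped

# Here the right side must be evaluated, so the expansion fails.
# Run it in a subshell so the error does not stop this script.
( : $(( 1 && (1/0) )) ) 2>/dev/null
and_status=$?

echo "$ternary $or_short $and_short status=$and_status"
```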

> 
> I hadn't put an "echo" in there, but I'd noticed that \" is already not removed
> in HERE context. I'd _forgotten_ that it is in "abc" context.

Right.

> I have a vague todo item for that, but the problem is my data structures don't
> recurse like that so I don't have a good place to stick the parsed blockstack
> for $() and <() and so on, but it just seems wasteful to re-parse it multiple
> times and discard it?

It kind of is, but you need to keep the text around for something like

cat <<$(a)
x
$(a)

which POSIX says has to work. ash-derived shells like dash fall over dead
when presented with that, but it's rare enough that I guess it's a win.
Even bash isn't perfect about reconstituting the text.


> Yeah yeah, premature optimization. I'm fiddling with this stuff a bit anyway for
> function definitions, but when you define a function inside a function my code
> has a pass that goes back and chops the inner functions out and puts them in a
> reference counted list and replaces them with a reference:
> 
>    $ x() { y() { echo why; }; echo ecks; unset -f x; y; }; x; y; x
>    ecks
>    why
>    why
>    bash: x: command not found
> 
> I don't THINK I can do a local function, it's a global function namespace, they
> outlive the text block that defined them, and you can still be running a
> function that is no longer defined, so... reference counting. :P

Reference counting is ok. Bash just copies the parsed function body (x in
this case) and executes that, then frees it. That way you can let the
function get unset and not worry about it.
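
A minimal sketch of what either approach has to make work: a function
that removes its own definition and keeps running. The names
selfdestruct, body_ran and still_defined are hypothetical, and the check
assumes there is no external command called selfdestruct in PATH.

```sh
selfdestruct() {
  unset -f selfdestruct   # drop the definition while the body is executing
  body_ran=yes            # still reached: the running body stays valid
}
selfdestruct

# After the call returns, the name no longer resolves to a function.
if command -v selfdestruct >/dev/null 2>&1; then
  still_defined=yes
else
  still_defined=no
fi
echo "body_ran=$body_ran still_defined=$still_defined"
```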

> But still, the pipeline list itself isn't what's nesting there. I think. And
> given that arguments can be "abc$(potato)xyz" with the nested thingy in the
> middle of arbitrary other nonsense, deferring dealing with that until variable
> resolution time and then just feeding the string between the two () to
> do_source() made sense at the time...

You have to parse it to find the end of the command substitution, bottom
line. You can't get it right otherwise.


>>>>>> The current edition is from 2018.
>>>>>
>>>>> Except they said 2008 was the last feature release and everything since is
>>>>> bugfix-only, and nothing is supposed to introduce, deprecate, or significantly
>>>>> change anything's semantics.
>>
>> When, by the way?
> 
> When did they say this? Sometime after the 2013 update went up, before the 2018
> update went up. It was on the mailing list, but...

I don't remember seeing that is all.


>>> The project isn't dead, but those are defined as bugfix releases. Adding new
>>> libc functions or command line options, or "can you please put cpio and tar back
>>> in the command list", are out of scope for them.

cpio and tar were two of those incompatible-never-the-twain-shall-meet
things, so we have pax (and peace too, I guess).


> It was nice when posix noticed that glibc'd had dprintf() for years, it was nice
> when they noticed linux had openat() and friends, but it was never a leading
> indicator. 

They don't go out and look for this stuff. Someone has to write a proposal
in the proper format and shepherd it through. Look at how long it took
for $'string'.

> When they removed "tar" and "cpio", Linux didn't. (Initramfs is cpio.
> RPM is cpio.) Nobody installs "pax" by default.

$ type -a pax
pax is /bin/pax

If you want to pass a certification test, you do.

> Document, not legislate...

Except back in 1990 where the tar folks and the cpio folks both politely
told each other to pound sand, and that they'd never approve the rival
format and utility, and POSIX had to do something.

> 
>>> Ken or Dennis having a reason means a
>>> lot to me because those guys were really smart. The Programmers Workbench guys,
>>> not so much. "Bill Joy decided" is a coin flip at best...
>>
>> They all had different, even competing, requirements and goals. Mashey and
>> the PWB guys were developing for a completely different user base than the
>> original room 127 group, and Joy and the BSD guys had different hardware
>> *and* users, and then the ARPA community for 4.2 BSD.
>>
>> Maybe things would be slightly different if Reiser's VM system (the one Rob
>> Pike raves about) had been in 32/V and then eventually made it back to
>> Research in time for 8th edition, but that's not the way it worked out.
> 
> The Apple vs Franklin decision extended copyright to cover binaries in 1983,
> clearing the way for AT&T to try to commercialize the hell out of System III/V

I think the 1982 decision that allowed at&t to get into the computer and
software business after giving up its telephony monopoly had more to do
with it, but that certainly helped at&t.

After that, at&t and its "consider it standard!" campaign eventually did
the job.

> But I still think the main stake to the heart was the Bell Labs guys getting put
> back in their bottle by AT&T legal, meaning nobody ever saw the Labs' Unix
> Release 8-10, or got to look at Plan 9 before Y2K.

They weren't really interested in writing software for commercial use, and
at&t was very interested in commercializing Unix.

> The result of $(blah) and $BLAH are handled the same there? Quotes _inside_ the
> subshell are in their own context.

Yes, that's the point I was trying to make.

> Hmmm... Smells a bit like indexed arrays are just associative arrays with an
> integer key type, but I guess common usage leans towards a span-based
> representation?

It depends on whether or not you want to support very large arrays. The
bash implementation has no trouble with

a=( [0x1000000]=$'\371\200\200\200\200' [0x1000001]=$'\371\200\200\200\201' 
[0x1000002]=$'\371\200\200\200\202' [0x1000003]=$'\371\200\200\200\203' 
[0x1000004]=$'\371\200\200\200\204' )

Which will eat huge amounts of memory if you use a C-type array. Bash uses
a doubly-linked list with some saved indices to make sequential access
very fast.
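
A smaller sketch of the same idea, run through bash explicitly because
indexed arrays are not in POSIX sh. The variable sparse_report and the
element values are made up for illustration.

```sh
# Two elements at large indices: the array holds two entries, not
# sixteen million slots.
sparse_report=$(bash -c '
  a=( [0x1000000]=first [0x1000004]=last )
  echo "count=${#a[@]}"
  echo "indices=${!a[@]}"
')
printf '%s\n' "$sparse_report"
```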

>> You just have to be
>> really disciplined about how you treat this `exists but unset' state.
> 
>    $ export WALRUS=42; x() { local WALRUS=potato; unset WALRUS; WALRUS=abc;
>    > echo $WALRUS; env | grep WALRUS;}; x
>    abc
>    WALRUS=42
> 
> Ok, now I'm even more confused. It's exporting inaccessible values? (I know that
> you can export a local, which goes away when the function returns...)

Creating a local variable, which does not inherit the attributes from any
global variable, does not cause the environment to be recreated.

>>> Anyway, that leaves VAR_ARRAY, and VAR_DICT (for associative arrays). I take it
>>> a sparse array is NOT a dict? (Are all VAR_ARRAY sparse...?)
>>
>> The implementation doesn't matter. You have indexed arrays, where the
>> subscript is an arithmetic expression, and associative arrays, where the
>> subscript is an arbitrary string. You can make them all hash tables, if
>> you want, or linked lists, or whatever. You can even make them C arrays,
>> but that will really kill your associative array lookup time.
> 
> Eh, sorted with binary search, but that has its own costs...

Resorting the array (or rebalancing a tree, or whatever) every time you add
a value? That's more work than is worth it.

> Again, sounds like an indexed array is just an associative array with an integer
> lookup key...

Sure, if you want to look at it that way.

> 
>>> Glancing at my notes for any obvious array todo bits, it's just things like "WHY
>>> does unsetting elements of BASH_ALIASES not remove the corresponding alias, does
>>> this require two representations of the same information?
>>
>> There's no good reason, I just haven't ever made that work.

There's no unset hook for dynamic variables.

>>>>> An "initial operand", not an argument.
>>>>
>>>> That's the same thing. There are no options to POSIX echo. Everything is
>>>> an operand. If a command has options, POSIX specifies them as options, and
>>>> it doesn't do that for echo.
>>>
>>> Hence the side-eye. In general use, echo has arguments. But posix insists it
>>> does not have arguments. 

I was never sure what this is supposed to mean. What POSIX calls operands
are arguments, are they not?

>> What did you think would happen to the unquoted backslash?
> 
> I meant asking newbies to learn to use printf from the command line before echo
> means they have to quote the argument and add \n on the end as part of "simple"
> usage, which seems a fairly heavy lift.

The sole advantage echo has for a newbie is that it adds the newline.

>>>>> Maybe posix should eventually break down and admit this is a thing? "ls . -l"
>>>>> has to work,

Why does `ls . -l' have to work?

ls . -l
ls: -l: No such file or directory
.:
[directory contents]

If the Linux folks want to reorder arguments so that things that look like
options come first, then they can do it as an extension.


> You asked why do I think posix doesn't acknowledge $THING today. My experience
> with raising issues where posix and common usage seemed to have significant
> daylight between them involved abrasive gatekeeping, resulting in me wandering
> away again and leaving the historical memorial to HP-UX and A/UX and so on to
> its own devices.
> 
> It's possible my experience was unusual?

Not necessarily; Jorg treated a lot of people that way. But the mistake is
treating him as a representative of anything but himself or a member of the
working group.

>>> (Yes, I'm aware of recent changes. That's why I re-engaged with Posix, felt I
>>> owed it to them since the condition under which I said I'd come back
>>> unexpectedly happened. But having already written them off, my heart really
>>> wasn't in it. I _should_, but I'm juggling too many other balls...)
>>>
>>>> Options only exist as
>>>> such if they come before the first non-option argument.
>>>
>>>     $ cat <(echo hello) -E
>>>     hello$
>>
>> Yeah, looks like a bug in cat to me:
>>
>> $ cat <(echo hello) -E
>> hello
>> cat: -E: No such file or directory
>>
>> The GNU utilities do all sorts of argument reordering, but that doesn't
>> mean you're going to get that in POSIX.
> 
> See "daylight between what posix says and what reality's been doing for
> decades", above.

POSIX isn't a "let's rubberstamp what Linux is doing despite what other
implementations do" kind of group.

> When I see reality does not match posix, I do not automatically conclude that
> reality is wrong.

Your day-to-day computing reality, sure. My day-to-day computing
environment is different, for example, and in this case, it seems to
match POSIX.

> 
>>>> Options have to
>>>> begin with `-'.
>>>
>>>     tar tvzf blah.tgz
>>>     ps ax
>>>     ar t /usr/lib/libsupp.a

Yep, not posix.

>> That's not inconsistent with the requirement that ssh options appear before
>> other arguments.
> 
> My point was those are basically the only cases where that requirement exists.
> The rest of them can "rm dir -r" and what posix says about it doesn't matter.

Sure, on Linux.

> (And yes I have a TODO item to have wildcards expand to "./-file" as necessary...)

Contortions like that are why argument reordering is a bad idea.
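
For comparison, a sketch of what a script author can already do on the
calling side, without any help from the shell or the utility. This is
not the toybox change being discussed; the file names and variables
(workdir, bare, prefixed) are hypothetical.

```sh
# A file named "-l" looks like an option once a bare * expands to it.
# Globbing ./* instead keeps every expansion from starting with "-".
workdir=$(mktemp -d)
touch "$workdir/-l" "$workdir/plain"

bare=$(cd "$workdir" && printf '%s ' *)
prefixed=$(cd "$workdir" && printf '%s ' ./*)
rm -rf "$workdir"

echo "bare:     $bare"
echo "prefixed: $prefixed"
```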


> There are instances where they've been good, yes. Removing tar was "legislate,
> not document" and they explicitly refused to acknowledge that it was a mistake
> over a decade later.

Refer to my previous comment about pounding sand. The standard would not
have been approved in 1992 with tar and cpio. There were a lot more
companies with a stake in it back then.

> 
> The FSF required signed paper copyright assignments to be filed with the boston
> office for decades.

I know.

> The "cathedral" in "The Cathedral And the Bazaar" was the
> GNU project, as mentioned in the paper's abstract on the 1998 Usenix website
> https://www.usenix.org/conference/1998-usenix-annual-technical-conference/software-development-models-cathedral-and-bazaar

Kind of. It was mentioned, and used as an example, but Kirk giving the talk
with esr kind of biased the Cathedral model towards BSD.

> It's kinda bureaucracy-ish.

As the stakes rise, and the scope grows, processes grow to meet them. The
culture changes.


> I have a whole bunch of blue sky todo items, but my _focus_ is getting A)
> Android self-hosting,

Yeah, there's a ways to go.

https://lists.gnu.org/archive/html/help-bash/2023-06/msg00117.html

They mess up the simple stuff.

> And that was still better than the horrors of gentoo! (I met with Daniel Robbins
> in person a couple times and we tried to get stuff to work, but that was after
> he left gentoo and started funtoo).

Don't start with me about gentoo.

> Eventually the Alpine Linux guys came along and built a distro around the work
> I'd done (after I'd already left it behind, but hey).

Isn't that the default Linux image for Docker?

> Plus make and bash, which can't be external gpl packages _and_ ship in the
> android base image.

Thorsten would be happy for android to keep using mksh, I'm sure.

>>>> What are you using now?
>>>
>>> $ bash --version
>>> GNU bash, version 5.0.3(1)-release (x86_64-pc-linux-gnu)
>>
>> Jesus, your distro can't even be bothered to apply all the patches for a
>> single version?
> 
> Devuan is a thin skin over Debian, when I ask about this sort of thing on the
> #devuan libera.chat channel they point me at
> https://packages.debian.org/search?keywords=bash and similar.

Debian still has bug reports on their bash page from 2005; how am I
supposed to take that seriously?

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=335642

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet at case.edu    http://tiswww.cwru.edu/~chet/


