[Toybox] [PATCH] sh: pass "\" to the later app

enh enh at google.com
Fri Jul 7 12:29:25 PDT 2023


On Thu, Jul 6, 2023 at 6:09 PM Chet Ramey <chet.ramey at case.edu> wrote:

> On 7/5/23 3:29 AM, Rob Landley wrote:
>
>
> >>>> It's really a useless concept, by the way.
> >>>
> >>> It's not that simple: kill has to be built-in or it can't interface
> >>> with job control...
> >>
> >> That's not what a special builtin is. `kill' is a `regular builtin'
> >> anyway.
> >
> > I started down the "rereading that mess" path and it's turning into
> > "reading all the posix shell stuff" which is not getting bugs fixed. And
> > once again, this is a BAD STANDARD. Or at least badly organized. There's
> > three groups here:
>
> OK. This is a decision that was made, what, 45 years ago? These are the
> Bourne shell special builtins -- at least as of SVR4. Korn added a couple,
> but since the Bourne shell didn't have them, they were not added to the
> list.
>
> Special builtins will exit a non-interactive shell on an error, assignments
> preceding them persist, and they're found before shell functions in the
> command search order. That's pretty much it. It's not that the builtins
> have to be implemented internally, but that these have other properties.
>
> They're a POSIX concept, so bash conforms when in posix mode. In default
> mode, every builtin is treated the same.
>
> > 1) flow control commands: break, continue, dot, eval, exec, exit, trap,
> > return.
> >
> > 2) variable manipulation commands: export, readonly, set, shift, unset.
> >
> > 3) random crap: colon, times.
> >
> > Why group 1 doesn't include "wait" I dunno.
>
> It's not a Bourne shell special builtin: errors in it don't exit the shell.
>
>
> > Why group 2 has set but not read or alias/unalias in it I couldn't tell you,
>
> read isn't a Bourne shell special builtin; errors in it don't exit the
> shell. The SVR4 shell doesn't have aliases (and aliases were originally
> optional in POSIX, part of the UPE).
>
> > and for that matter cd is defined to set $PWD.
>
> cd is a weird one. The v7 Bourne shell exited the shell if the directory
> argument didn't exist, and that didn't change until SVR4.2,


and people complain unix isn't user-friendly... :-)


> but POSIX
> declined to make it a special builtin.
>
> > Distinguishing : from true seems deeply silly
>
> true wasn't a special builtin in the Bourne shell.
>
> > (especially when [ and test aren't)
>
> Not part of the Bourne shell, only came in in System III, never a special
> builtin.
>
> > and "times" is job control
>
> It's not. It's a straightforward interface to the `times' library function
> (originally system call in 7th edition).
>
> > (it smells like a jobs flag, but they're not including bg/fg here either
> > which are basically flow control group 1 above).
>
> Job control wasn't included until the SVR4.2 sh, and it was optional in
> POSIX for a long time.
>
> >
> > And having "command" _not_ be special is just silly:
> >
> >    $ command() { echo hello; }
> >    $ command ls -l
> >    hello
>
> It really can't be; one of the uses for command is to suppress the effects
> of special builtins, so they won't exit the shell on error.
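
A minimal sketch of that use of `command', assuming bash is on PATH and using its --posix switch. The path /nonexistent/file is only a placeholder for a file that should not exist, and the variable names are made up for illustration:

```shell
# `.' is a special builtin: in posix mode a failed `.' ends a
# non-interactive shell, so the echo after it never runs.
posix_plain=$(bash --posix -c '. /nonexistent/file; echo survived' 2>/dev/null)

# Running it through `command' suppresses that, so the shell carries on.
posix_command=$(bash --posix -c 'command . /nonexistent/file; echo survived' 2>/dev/null)

# In bash's default mode the failure is just a non-zero status.
default_plain=$(bash -c '. /nonexistent/file; echo survived' 2>/dev/null)

printf 'posix: [%s] posix+command: [%s] default: [%s]\n' \
    "$posix_plain" "$posix_command" "$default_plain"
```
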
>
> >
> > There's only a few more commands like hash that CAN'T be implemented as
> > child processes, but they don't bother to distinguish them.
>
> It's not the difference between special builtins and external commands,
> it's the difference between regular builtins and special builtins.
>
> > I know there's the "this may syntax error and exit the shell" distinction
> > but don't ask me how set or true are supposed to do that.
>
> set exits the shell on an invalid option and was special in the Bourne
> shell; true isn't a special builtin.
>
> > (I _think_ they added set here because set -u can cause a shell error
> > later? Maybe? But then why unset?
>
> Well, unset didn't exist in the 7th edition shell, but it's special
> in the SVR4 shell. It can only fail if asked to unset a readonly
> variable or one of the shell's non-unsettable variables. It takes no
> options, does no argument checking for invalid identifiers, and unsetting
> a variable that's not set isn't an error, but when asked to unset a
> variable the shell says you can't, the shell exits.
>
> I think POSIX made unset a special builtin because the SVR4 sh did and
> so it would be found in the command search before a function. That gets
> important when you're trying to write a secure script,


did the v7 bourne shell just not know whether it was interactive or not?
(because, yeah, this kind of thing makes a lot more sense as an early `set
-e`. but i can't imagine using this interactively!)


> especially when you
> can inherit functions from the environment (bash) or run a startup file for
> non-interactive shells.
>
> > It doesn't seem to affect flow control:
> >
> >    $ readonly potato=x; for i in one two three; do echo $i; unset potato; done
> >    one
> >    bash: unset: potato: cannot unset: readonly variable
> >    two
> >    bash: unset: potato: cannot unset: readonly variable
> >    three
> >    bash: unset: potato: cannot unset: readonly variable
>
> If you were in posix mode it would exit the shell.
>
> > I guess it's just the sh -c 'a=b set d=e; echo $a' nonsense which only
> > dash seems to bother with, which is a good reason _not_ to do it if you
> > ask me...
>
> Everyone does it. Bash does it in posix mode.
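
A quick way to see the difference, as a sketch that assumes bash is on PATH (the variable names are invented for the example). It runs Rob's test case from above in both bash modes:

```shell
# `set' is a special builtin, so in posix mode the a=b assignment in front
# of it persists after the command finishes.
posix_mode=$(bash --posix -c 'a=b set d=e; echo "$a"')

# In default mode bash treats every builtin the same, so the assignment is
# temporary and $a is empty afterwards.
default_mode=$(bash -c 'a=b set d=e; echo "$a"')

printf 'posix: [%s] default: [%s]\n' "$posix_mode" "$default_mode"
```
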
>
> >
> > In general, this whole "can exit on error" thing doesn't seem hugely
> > honored even when posix says (implies) you can:
> >
> >    $ declare -i potato=1/0
> >    bash: declare: 1/0: division by 0 (error token is "0")
> >    $ declare -i potato
> >    $ set potato=1/0
> >    $ echo $potato
> >
>
> I guess I don't understand these examples. declare isn't a special builtin,
> and there's nothing wrong with setting $1 == "potato=1/0" even though set
> is a special builtin.
>
> >    $
> >    $ (set -x; echo hello ) 2>/dev/full
> >    hello
>
> echo isn't a special builtin, either. But what if it were? It would exit
> the subshell either way.
>
> > Oh, by the way, I remember setting LINENO read only made the shell quite
> > chatty, but when I tested it just now it was ignored instead?
> >
> >    $ readonly LINENO
> >    $ echo $LINENO
> >    2
> >    $ echo $LINENO
> >    3
> >    $ declare -p LINENO
> >    declare -ir LINENO="4"
> >    $ echo $LINENO
>
> It would error if you actually tried to assign it a value, and, in posix
> mode, exit the shell.
>
> >>> I remember I did make "continue&" work, but don't remember why...)
> >>
> >> Why would that not work? It's just a no-op; no semantic meaning.
> >
> > Not _quite_ a NOP:
>
> I mean, it creates a child process which immediately exits, but it has
> no effect on the shell other than to note that we created an asynchronous
> child process (which sets $!) that exited. It certainly doesn't affect
> flow control.
>
> >
> >    $ for i in one two three; do echo a=$i; continue& b=$i; done
> >    a=one
> >    [1] 30698
> >    a=two
> >    [2] 30699
> >    a=three
> >    [3] 30700
> >    [1]   Done                    continue
> >    [2]-  Done                    continue
> >
> > Notice the child processes and lack of b= lines.
>
> Why would you expect a b= line? Even if the `continue&' were not there,
> the `;' after the first echo command makes the b= line a separate simple
> command. Who's going to echo `b=$i' and why would they? Maybe if you had
> an `echo' in there instead.
>
> > No, if you want a NOP, put a flow control statement in a pipe:
> >
> >    $ for i in one two three; do echo a=$i; continue | echo b=$i; echo c=$i; done
>
> These are not equivalent commands.
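
For comparison, a small sketch with an `echo' actually in the b= position (variable names are made up). In both forms the `continue' runs in a child process, so the loop in the parent shell is not affected:

```shell
# continue as a background job: the child exits, the loop carries on
bg=$(for i in one two; do echo "a=$i"; continue & echo "b=$i"; done)

# continue on the left side of a pipeline: also a child, also harmless
piped=$(for i in one two; do echo "a=$i"; continue | echo "b=$i"; echo "c=$i"; done)

printf '%s\n' "$bg" "--" "$piped"
```

Every a=, b= and c= line gets printed for every iteration in both cases.
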
>
>
> > Backslash in double quote context leaves most characters alone but eats
> > \ $ and newline, and unquoted HERE documents are in double quote context.
>
> Yes, but:
>
> >
> >    $ cat<<EOF
> >    > \a \b \c \$ \\ \d \
> >    > EOF
> >    > EOF
> >    \a \b \c $ \ \d EOF
> >
> > As far as I can tell, it's NOT more than \$ \\ and \<newline> that get
> > special treatment in this context?
>
> Plus double quote (in double quotes, but not here-documents) and
> backquote.
>
> >
> >    $ cat<<EOF
> >    > <(echo hello)
> >    > EOF
> >    <(echo hello)
> >    $ cat<<EOF
> >    > <(echo $(echo potato))
> >    > EOF
> >    <(echo potato)
> >
> > Yup, just the $ ones and those three \ ones?
>
> Backslash escapes five characters in double quotes, four in here-documents
> with unquoted delimiters.
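
A side-by-side sketch of the two contexts (the function and variable names are invented for the example). The same text goes through an unquoted here-document and through double quotes, and the only difference should be the \" pair:

```shell
# here-document with an unquoted delimiter: backslash escapes $ ` \ and
# newline, but \" is left alone
show_heredoc() {
cat <<EOF
\a \$ \\ \" \`
EOF
}
heredoc=$(show_heredoc)

# double quotes: the same four, plus the double quote itself
dquote="\a \$ \\ \" \`"

printf 'heredoc: %s\ndquote:  %s\n' "$heredoc" "$dquote"
```
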
>
>
> > And it's the short-circuit logic again:
> >
> >    $ echo $((1?2:(1/0)))
> >    2
> >    $ echo $((1&&(1/0)))
> >    bash: 1&&(1/0): division by 0 (error token is "0)")
> >    $ echo $((1||(1/0)))
> >    1
>
> That's not the same thing; arithmetic expression evaluation follows the
> C rules for suppressing evaluation.
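
A sketch of those C rules in shell arithmetic (variable names are made up). The division by zero is only an error when the operand containing it actually has to be evaluated, which is why `1&&(1/0)' in the transcript above fails while the other two do not:

```shell
# the unselected branch of ?: is not evaluated
ternary=$((1 ? 2 : (1/0)))

# || stops once the left side is non-zero
or_result=$((1 || (1/0)))

# && stops once the left side is zero
and_result=$((0 && (1/0)))

printf '%s %s %s\n' "$ternary" "$or_result" "$and_result"
```
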
>
> >
> > I hadn't put an "echo" in there, but I'd noticed that \" is already not
> > removed in HERE context. I'd _forgotten_ that it is in "abc" context.
>
> Right.
>
> > I have a vague todo item for that, but the problem is my data structures
> > don't recurse like that so I don't have a good place to stick the parsed
> > blockstack for $() and <() and so on, but it just seems wasteful to
> > re-parse it multiple times and discard it?
>
> It kind of is, but you need to keep the text around for something like
>
> cat <<$(a)
> x
> $(a)
>
> which POSIX says has to work. ash-derived shells like dash fall over dead
> when presented with that, but it's rare enough that I guess it's a win.
> Even bash isn't perfect about reconstituting the text.
>
>
> > Yeah yeah, premature optimization. I'm fiddling with this stuff a bit
> > anyway for function definitions, but when you define a function inside a
> > function my code has a pass that goes back and chops the inner functions
> > out and puts them in a reference counted list and replaces them with a
> > reference:
> >
> >    $ x() { y() { echo why; }; echo ecks; unset -f x; y; }; x; y; x
> >    ecks
> >    why
> >    why
> >    bash: x: command not found
> >
> > I don't THINK I can do a local function, it's a global function
> > namespace, they outlive the text block that defined them, and you can
> > still be running a function that is no longer defined, so... reference
> > counting. :P
>
> Reference counting is ok. Bash just copies the parsed function body (x in
> this case) and executes that, then frees it. That way you can let the
> function get unset and not worry about it.
>
> > But still, the pipeline list itself isn't what's nesting there. I think.
> > And given that arguments can be "abc$(potato)xyz" with the nested thingy
> > in the middle of arbitrary other nonsense, deferring dealing with that
> > until variable resolution time and then just feeding the string between
> > the two () to do_source() made sense at the time...
>
> You have to parse it to find the end of the command substitution, bottom
> line. You can't get it right otherwise.
>
>
> >>>>>> The current edition is from 2018.
> >>>>>
> >>>>> Except they said 2008 was the last feature release and everything
> >>>>> since is bugfix-only, and nothing is supposed to introduce,
> >>>>> deprecate, or significantly change anything's semantics.
> >>
> >> When, by the way?
> >
> > When did they say this? Sometime after the 2013 update went up, before
> > the 2018 update went up. It was on the mailing list, but...
>
> I don't remember seeing that is all.
>
>
> >>> The project isn't dead, but those are defined as bugfix releases.
> >>> Adding new libc functions or command line options, or "can you please
> >>> put cpio and tar back in the command list", are out of scope for them.
>
> cpio and tar were two of those incompatible-never-the-twain-shall-meet
> things, so we have pax (and peace too, I guess).
>
>
> > It was nice when posix noticed that glibc'd had dprintf() for years, it
> > was nice when they noticed linux had openat() and friends, but it was
> > never a leading indicator.
>
> They don't go out and look for this stuff. Someone has to write a proposal
> in the proper format and shepherd it through. Look at how long it took
> for $'string'.
>
> > When they removed "tar" and "cpio", Linux didn't. (Initramfs is cpio.
> > RPM is cpio.) Nobody installs "pax" by default.
>
> $ type -a pax
> pax is /bin/pax
>
> If you want to pass a certification test, you do.
>

(which might explain why i see it on my mac but not on any of my debian or
ubuntu boxes, or raspberry pis.)


> > Document, not legislate...
>
> Except back in 1990 where the tar folks and the cpio folks both politely
> told each other to pound sand, and that they'd never approve the rival
> format and utility, and POSIX had to do something.
>
> >
> >>> Ken or Dennis having a reason means a lot to me because those guys
> >>> were really smart. The Programmers Workbench guys, not so much. "Bill
> >>> Joy decided" is a coin flip at best...
> >>
> >> They all had different, even competing, requirements and goals. Mashey
> >> and the PWB guys were developing for a completely different user base
> >> than the original room 127 group, and Joy and the BSD guys had different
> >> hardware *and* users, and then the ARPA community for 4.2 BSD.
> >>
> >> Maybe things would be slightly different if Reiser's VM system (the one
> >> Rob Pike raves about) had been in 32/V and then eventually made it back
> >> to Research in time for 8th edition, but that's not the way it worked
> >> out.
> >
> > The Apple vs Franklin decision extended copyright to cover binaries in
> > 1983, clearing the way for AT&T to try to commercialize the hell out of
> > System III/V
>
> I think the 1982 decision that allowed at&t to get into the computer and
> software business after giving up its telephony monopoly had more to do
> with it, but that certainly helped at&t.
>
> After that, at&t and its "consider it standard!" campaign eventually did
> the job.
>
> > But I still think the main stake to the heart was the Bell Labs guys
> > getting put back in their bottle by AT&T legal, meaning nobody ever saw
> > the Labs' Unix Release 8-10, or got to look at Plan 9 before Y2K.
>
> They weren't really interested in writing software for commercial use, and
> at&t was very interested in commercializing Unix.
>
> > The result of $(blah) and $BLAH are handled the same there? Quotes
> > _inside_ the subshell are in their own context.
>
> Yes, that's the point I was trying to make.
>
> > Hmmm... Smells a bit like indexed arrays are just associative arrays
> > with an integer key type, but I guess common usage leans towards a
> > span-based representation?
>
> It depends on whether or not you want to support very large arrays. The
> bash implementation has no trouble with
>
> a=( [0x1000000]=$'\371\200\200\200\200'
> [0x1000001]=$'\371\200\200\200\201'
> [0x1000002]=$'\371\200\200\200\202' [0x1000003]=$'\371\200\200\200\203'
> [0x1000004]=$'\371\200\200\200\204' )
>
> Which will eat huge amounts of memory if you use a C-type array. Bash uses
> a doubly-linked list with some saved indices to make sequential access
> very fast.
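
A cut-down sketch of that sparse behaviour, assuming bash is on PATH. The script is fed to bash on stdin because arrays are not POSIX sh, and the element values are placeholders:

```shell
sparse=$(bash <<'EOF'
# two elements a long way from index 0, with nothing in between
a=( [0x1000000]=first [0x1000004]=last )
# element count, then the list of indices that are actually set
echo "${#a[@]} elements, indices: ${!a[*]}"
EOF
)
printf '%s\n' "$sparse"
```

Only the two assigned indices exist, so the element count is 2 rather than anything on the order of 0x1000004.
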
>
> >> You just have to be
> >> really disciplined about how you treat this `exists but unset' state.
> >
> >    $ export WALRUS=42; x() { local WALRUS=potato; unset WALRUS; WALRUS=abc;
> >    > echo $WALRUS; env | grep WALRUS;}; x
> >    abc
> >    WALRUS=42
> >
> > Ok, now I'm even more confused. It's exporting inaccessible values? (I
> > know that you can export a local, which goes away when the function
> > returns...)
>
> Creating a local variable, which does not inherit the attributes from any
> global variable, does not cause the environment to be recreated.
>
> >>> Anyway, that leaves VAR_ARRAY, and VAR_DICT (for associative arrays).
> >>> I take it a sparse array is NOT a dict? (Are all VAR_ARRAY sparse...?)
> >>
> >> The implementation doesn't matter. You have indexed arrays, where the
> >> subscript is an arithmetic expression, and associative arrays, where the
> >> subscript is an arbitrary string. You can make them all hash tables, if
> >> you want, or linked lists, or whatever. You can even make them C arrays,
> >> but that will really kill your associative array lookup time.
> >
> > Eh, sorted with binary search, but that has its own costs...
>
> Resorting the array (or rebalancing a tree, or whatever) every time you add
> a value? That's more work than is worth it.
>
> > Again, sounds like an indexed array is just an associative array with an
> > integer lookup key...
>
> Sure, if you want to look at it that way.
>
> >
> >>> Glancing at my notes for any obvious array todo bits, it's just things
> >>> like "WHY does unsetting elements of BASH_ALIASES not remove the
> >>> corresponding alias, does this require two representations of the same
> >>> information?
> >>
> >> There's no good reason, I just haven't ever made that work.
>
> There's no unset hook for dynamic variables.
>
> >>>>> An "initial operand", not an argument.
> >>>>
> >>>> That's the same thing. There are no options to POSIX echo. Everything
> >>>> is an operand. If a command has options, POSIX specifies them as
> >>>> options, and it doesn't do that for echo.
> >>>
> >>> Hence the side-eye. In general use, echo has arguments. But posix
> >>> insists it does not have arguments.
>
> I was never sure what this is supposed to mean. What POSIX calls operands
> are arguments, are they not?
>
> >> What did you think would happen to the unquoted backslash?
> >
> > I meant asking newbies to learn to use printf from the command line
> > before echo means they have to quote the argument and add \n on the end
> > as part of "simple" usage, which seems a fairly heavy lift.
>
> The sole advantage echo has for a newbie is that it adds the newline.
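
For reference, the printf form being discussed, as a minimal sketch (the message text is made up). The format string supplies the newline and the argument is printed as-is, whatever it contains:

```shell
# a string that echo implementations disagree about: it starts with
# something that looks like an option and contains a backslash sequence
msg='-n a\tb'

# %s prints the argument literally; \n in the format adds the newline
printed=$(printf '%s\n' "$msg")

printf '%s\n' "$printed"
```
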
>
> >>>>> Maybe posix should eventually break down and admit this is a thing?
> >>>>> "ls . -l" has to work,
>
> Why does `ls . -l' have to work?
>
> ls . -l
> ls: -l: No such file or directory
> .:
> [directory contents]
>
> If the Linux folks want to reorder arguments so that things that look like
> options come first, then they can do it as an extension.
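
The shell's own getopts follows the same options-before-operands rule, which gives a small way to see it. A sketch with a hypothetical helper, split_args, and an arbitrary option string "lE":

```shell
# hypothetical helper: report what a POSIX-style parser treats as options
# and what it leaves as operands
split_args() {
    OPTIND=1
    nopts=0
    # getopts stops at the first argument that is not an option
    while getopts lE opt "$@"; do
        nopts=$((nopts + 1))
    done
    echo "options=$nopts operands=$(($# - OPTIND + 1))"
}

before=$(split_args -l .)
after=$(split_args . -l)
printf '%s\n' "$before" "$after"
```

With the operand first, -l is never seen as an option and is counted as a second operand, which matches the "ls: -l: No such file or directory" output above.
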
>
>
> > You asked why do I think posix doesn't acknowledge $THING today. My
> > experience with raising issues where posix and common usage seemed to
> > have significant daylight between them involved abrasive gatekeeping,
> > resulting in me wandering away again and leaving the historical memorial
> > to HP-UX and A/UX and so on to its own devices.
> >
> > It's possible my experience was unusual?
>
> Not necessarily; Jorg treated a lot of people that way. But the mistake is
> treating him as a representative of anything but himself or a member of the
> working group.
>
> >>> (Yes, I'm aware of recent changes. That's why I re-engaged with Posix,
> >>> felt I owed it to them since the condition under which I said I'd come
> >>> back unexpectedly happened. But having already written them off, my
> >>> heart really wasn't in it. I _should_, but I'm juggling too many other
> >>> balls...)
> >>>
> >>>> Options only exist as
> >>>> such if they come before the first non-option argument.
> >>>
> >>>     $ cat <(echo hello) -E
> >>>     hello$
> >>
> >> Yeah, looks like a bug in cat to me:
> >>
> >> $ cat <(echo hello) -E
> >> hello
> >> cat: -E: No such file or directory
> >>
> >> The GNU utilities do all sorts of argument reordering, but that doesn't
> >> mean you're going to get that in POSIX.
> >
> > See "daylight between what posix says and what reality's been doing for
> > decades", above.
>
> POSIX isn't a "let's rubberstamp what Linux is doing despite what other
> implementations do" kind of group.
>
> > When I see reality does not match posix, I do not automatically conclude
> > that reality is wrong.
>
> Your day-to-day computing reality, sure. My day-to-day computing
> environment is different, for example, and in this case, it seems to
> match POSIX.
>
> >
> >>>> Options have to
> >>>> begin with `-'.
> >>>
> >>>     tar tvzf blah.tgz
> >>>     ps ax
> >>>     ar t /usr/lib/libsupp.a
>
> Yep, not posix.
>
> >> That's not inconsistent with the requirement that ssh options appear
> >> before other arguments.
> >
> > My point was those are basically the only cases where that requirement
> > exists. The rest of them can "rm dir -r" and what posix says about it
> > doesn't matter.
>
> Sure, on Linux.
>
> > (And yes I have a TODO item to have wildcards expand to "./-file" as
> > necessary...)
>
> Contortions like that are why argument reordering is a bad idea.
>
>
> > There are instances where they've been good, yes. Removing tar was
> > "legislate, not document" and they explicitly refused to acknowledge that
> > it was a mistake over a decade later.
>
> Refer to my previous comment about pounding sand. The standard would not
> have been approved in 1992 with tar and cpio. There were a lot more
> companies with a stake in it back then.
>
> >
> > The FSF required signed paper copyright assignments to be filed with the
> > boston office for decades.
>
> I know.
>
> > The "cathedral" in "The Cathedral And the Bazaar" was the GNU project,
> > as mentioned in the paper's abstract on the 1998 Usenix website
> > https://www.usenix.org/conference/1998-usenix-annual-technical-conference/software-development-models-cathedral-and-bazaar
>
> Kind of. It was mentioned, and used as an example, but Kirk giving the talk
> with esr kind of biased the Cathedral model towards BSD.
>
> > It's kinda bureaucracy-ish.
>
> As the stakes rise, and the scope grows, processes grow to meet them. The
> culture changes.
>
>
> > I have a whole bunch of blue sky todo items, but my _focus_ is getting A)
> > Android self-hosting,
>
> Yeah, there's a ways to go.
>
> https://lists.gnu.org/archive/html/help-bash/2023-06/msg00117.html
>
> They mess up the simple stuff.
>

define "mess up"... Android deliberately has strict seccomp filters for
apps, and the syscalls mentioned in that post are on the "no" list. Android
gives each _app_ a different uid, so there's typically nothing useful you
can do here anyway. (things are a bit different if you're actually part of
the OS, but bash being GPL makes that unlikely :-( )

(yes, i agree that it's mildly unfortunate that there's no special case for
"i don't actually want to change anything", which i think is the case
they're talking about in that post, and i've wondered about adding that to
libc once or twice, but my feeling is that it wouldn't be particularly
useful in practice because _that_ kind of code probably needs a rethink
anyway when porting to Android.)

but, yeah, "security" and "self-hosting" aren't exactly friends ... the 1%
bad guys being the reason we can't have nice things, as usual. (executing
code off a writable filesystem being frowned upon.)


> > And that was still better than the horrors of gentoo! (I met with Daniel
> > Robbins in person a couple times and we tried to get stuff to work, but
> > that was after he left gentoo and started funtoo).
>
> Don't start with me about gentoo.
>
> > Eventually the Alpine Linux guys came along and built a distro around
> > the work I'd done (after I'd already left it behind, but hey).
>
> Isn't that the default Linux image for Docker?
>
> > Plus make and bash, which can't be external gpl packages _and_ ship in
> > the android base image.
>
> Thorsten would be happy for android to keep using mksh, I'm sure.
>

(and i'm going to have a lot of fun dealing with compatibility issues
if/when we move /bin/sh over...)


> >>>> What are you using now?
> >>>
> >>> $ bash --version
> >>> GNU bash, version 5.0.3(1)-release (x86_64-pc-linux-gnu)
> >>
> >> Jesus, your distro can't even be bothered to apply all the patches for a
> >> single version?
> >
> > Devuan is a thin skin over Debian, when I ask about this sort of thing
> > on the #devuan libera.chat channel they point me at
> > https://packages.debian.org/search?keywords=bash and similar.
>
> Debian still has bug reports on their bash page from 2005; how am I
> supposed to take that seriously?
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=335642
>
> Chet
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>                  ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU    chet at case.edu    http://tiswww.cwru.edu/~chet/
>
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
>