[Toybox] [PATCH] sh: pass "\" to the later app

Rob Landley rob at landley.net
Mon Jun 12 14:23:15 PDT 2023


On 6/9/23 15:23, Chet Ramey wrote:
> On 6/8/23 10:31 PM, Rob Landley wrote:
>> On 6/5/23 18:08, Chet Ramey wrote:
> You got me. You're right; I had it backwards.

I'm not trying to gotcha anybody, I'm just trying to understand what the right
thing to implement is. I find this entire area surprisingly confusing...

> "The <backslash> shall retain its special meaning as an escape character

The word pair "shall retain" is not in the bash man page so I'm guessing...
Posix? (Sigh. Part of my complaint about using posix as a shell source is it's
scattered all over the place in utilities/sh.html and utilities/V3_Chap02.html,
and they have a list of "special built-in utilities" that does NOT include cd
(that's listed in normal utilities: how would one go about implementing that
outside of the shell, do you think?))

Anyway, I found the third "shall retain" in V3_chap02, and... it's wrong?

> (see Escape Character (Backslash)) only when followed by one of the 
> following characters when considered special:
> 
>      $   `   "   \   <newline>"
> 
> So the backslash-newline gets removed, but, say, a \" only has the
> backslash removed.

Because when you put a backslash in front of another char:

  $ echo \x
  x
  $ basename \x
  x

The backslash gets removed anyway, so I don't know what it means by "special"
here. (There's probably a corner case I'm not seeing because it's been too long
since I last read the entire thing from cover to cover and tried to piece
together the choose your own adventure plotlines....)
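For comparison, the "special" qualifier seems to bite inside double quotes: there "\x" keeps its backslash while "\$" loses it, and backslash-newline vanishes entirely. A minimal C sketch of the rule as quoted above, assuming the input is already known to be the inside of a double-quoted string (dquote_unescape() is a hypothetical name, not toysh code):

```c
#include <string.h>

/* Hypothetical helper: backslash handling for the text between double
 * quotes. A backslash is special only before $ ` " \ or newline;
 * backslash-newline is removed entirely, and any other backslash (or a
 * trailing one) is kept literally. out needs strlen(in)+1 bytes. */
void dquote_unescape(const char *in, char *out)
{
  while (*in) {
    if (*in == '\\' && in[1] == '\n') in += 2;         // line continuation
    else if (*in == '\\' && in[1] && strchr("$`\"\\", in[1])) {
      *out++ = in[1];                                   // escaped special char
      in += 2;
    } else *out++ = *in++;                              // ordinary byte, incl. "\x"
  }
  *out = 0;
}
```

Outside quotes the backslash escapes everything, which is the echo \x case above.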

>> In here documents, double quote does NOT remove it:
> 
> Quoting the here-document delimiter has the expected effect. The body is
> considered to be in double quotes if the delimiter is *not* quoted, and
> basically in single quotes if it is ("the here-document lines are not
> expanded").

Never mind, it was there in the bash man page when I went back and had another
look. (There's a reason I'm using _that_ as my spec when I can...) I was
confused by expecting "" and '' and \ to work consistently, but I remember now
that they _explicitly_ don't:

  $ cat<<EOF
  > $PATH
  > EOF
  /home/landley/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
  $ cat<<\EOF
  > $PATH
  > EOF
  $PATH
  $ cat<<EOF''
  > $PATH
  > EOF
  $PATH

If the EOF has any removable characters anywhere in it, you don't expand
variables. (I thought I had that working at one point... ah, my strcspn doesn't
have \\ in the string, just \" and '.)

Which does at least make:

  $ cat<<\\
  > $PATH
  > \
  $PATH

Terminable. :)
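The delimiter rule above (any quoting character anywhere in the word turns off expansion of the body, and quote removal produces the string to match) can be sketched in C roughly like this. It's a hypothetical helper rather than what toysh does, and it assumes the lexer already removed line continuations from the word:

```c
#include <string.h>

/* Hypothetical helper, not toysh's actual code. Returns 1 if the here
 * document delimiter word counts as quoted (contains \, ' or "), meaning
 * the body is not expanded, else 0. Writes the delimiter text left after
 * quote removal to out, which needs strlen(word)+1 bytes. */
int heredoc_delim(const char *word, char *out)
{
  int quoted = word[strcspn(word, "\\\"'")] != 0;
  char q = 0;   // active quote character: 0, '\'' or '"'

  while (*word) {
    char c = *word++;

    if (q == '\'') {
      // Single quotes: everything literal until the closing quote.
      if (c != '\'') *out++ = c;
      else q = 0;
    } else if (c == '\\' && *word && (!q || strchr("$`\"\\", *word))) {
      // Unquoted backslash escapes anything; inside "" only $ ` " and \.
      *out++ = *word++;
    } else if (c == '"' && q == '"') {
      q = 0;
    } else if (!q && (c == '\'' || c == '"')) {
      q = c;
    } else {
      *out++ = c;
    }
  }
  *out = 0;

  return quoted;
}
```

Fed the two-character word \\ from the example above, it reports quoted and leaves a one-byte \ delimiter, which is why that here document terminates on a line containing a lone backslash.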

And my approach of handling HERE document lines one at a time probably came from
"...the lines in the here-document are not expanded. If word is unquoted, all
lines of the here-document are subjected to parameter expansion, command
substitution, and arithmetic expansion, the character sequence \<newline> is
ignored, and \ must be used to quote the characters \, $, and `."

Except... \<newline> is ignored when the EOF _is_ quoted? It glues lines
together when it's not quoted? (It's late and I'm not sure I'm reading this
clearly. Need test cases...)

> The next POSIX version goes into a lot more detail on how here-documents
> are read and processed.

Here's hoping spending more words to explain it will wind up being an
improvement...

>>    $ cat<<"EOF"
>>    > ab\
>>    > c
>>    > EOF
>>    ab\
>>    c
> 
>> I also tried to ask questions about how long a HERE document lasts, and:
> 
> What does `lasts' mean? How the body is delimited, or something else?

Things like continuing past the end of a "source" file and so on. (Data can come
from -c, from stdin, from source, from eval, through $() or <()...)

The colon was an attempt to indicate that examples of what I tried were forthcoming.

>> 
>>    $ bash -c $'cat<<0;echo hello\nabc\n0'
>>    abc
>>    hello
>
> POSIX specifies that "the end of a command_string operand (see sh) shall be
> treated as a <newline> character."

Which says the trailing \ should vanish for -c, but the bug report this all
started with was that it hadn't, and that broke somebody's thing.

>>    $ bash -c $'cat<<"";echo X\n\necho Z'
>>    X
>>    Z
> 
> This is dodgy behavior to rely on: a null delimiter is matched by the next
> blank line, since that's technically "a line containing only the delimiter
> and a <newline>, with no <blank> characters in between."

I'm trying to match what bash does, which means figuring _out_ what bash does. I
respect posix, but I expect to diverge from it a lot because so much of what I'm
trying to be compatible with already does. :(

>>    $ echo -n 'cat<<EOF' > one
>>    $ echo -n $'potato\nEOF' > two
>>    $ bash -c '. one;. two'
>>    one: line 1: warning: here-document at line 1 delimited by end-of-file (wanted
>> `EOF')
>>    two: line 1: potato: command not found
>>    two: line 2: EOF: command not found
> 
> I don't think it's reasonable to expect a word, which is what the here-
> document body is, to persist across `.' boundaries, since the contents of a
> `.' script are (depending on how you parse them) either a `program' or a
> `compound_list'.

I'm basically abusing function contexts, because that's what I attach local
variables to, and $LINENO resets but persists in the same way as local vars:

  $ bash -c $'echo $LINENO;. <(echo echo \\$LINENO);echo $LINENO'
  0
  1
  0

Basically any time I call back into do_source() it stacks a new function
context, but the ones with a NULL pointer for the function name behave slightly
differently so things like:

  $ while true; do x() { . <(echo 'return 3;echo nope;');};x;echo $?; done
  3
  3
  3

can figure out which ones to traverse past.

>> I'm also vaguely curious how one WOULD terminate this one:
>> 
>>    $ bash -c $'cat<<\'\n\''
>>    bash: line 1: warning: here-document at line 1 delimited by end-of-file (wanted `
>>    ')
> 
> You can't. A newline here-document delimiter can never be matched, and
> only EOF will terminate the here-document. Some shells (e.g., yash) treat
> this as a fatal syntax error, but most treat it like bash does. I
> considered printing a warning for a delimiter containing a newline, but
> decided not to.

Oh, it's clearly pilot error, I was just curious what would happen.

>> Also, -s doesn't work as advertised in the man page?
>> 
>>         -s        If  the -s option is present, or if no arguments remain after
>>                   option processing, then commands are read from  the  standard
>>                   input.   This  option  allows the positional parameters to be
>>                   set when invoking an interactive shell or when reading  input
>>                   through a pipe.
>> 
>>    $ echo echo also | bash -s -c 'echo hello'
>>    hello
>>    $ echo echo also | bash -c -s 'echo hello'
>>    hello
>>    $ echo echo also | bash -c -s -s -s -s 'echo hello'
>>    hello
> 
> -c has higher priority than -s, and you can only use one. It's unspecified
> behavior; POSIX doesn't allow those options to be used together. Some ash-
> based shells (e.g., dash) execute the command string and then start an
> interactive shell, but I don't think that's a great idea.

It would have made my testing easier, but then again _what_ I'm trying to test
is mostly abusive corner cases. In this case I was trying to find contexts where
I could get multiple lines without \n at the end to interact with each other, so
I could confirm that:

  echo -n $'data\nEOF' | bash -sc $'cat<<EOF'

Could handle all 4 cases of each EOF ending with and without a newline. (But
then I remembered <<EOF is an argument (ala cat <<EOF filename) so it's already
plucked out and NULL terminated, so I just had to worry about the matching line
having a \n or not, which I can already test...)

"How this question got raised" vs "but now that the question's BEEN raised"...

>> Yes, but does a backslash newline count as quoted whitespace?
> 
> No. In places where the backslash acts as escape character, the backslash-
> newline pair is removed from the input stream.
> 
>> Backslash
>> ordinarily quotes, and there's "" which is a quoted nothing but creates an
>> argument. So this is a new category: a quoted nothing that does NOT create an
>> argument. 
> 
> It's removed from the input stream before tokenization. It doesn't even
> delimit a token.

I'm not tokenizing HERE documents (well, I'm about to do a discarded
tokenization pass over them to detect line continuations for variable resolution
but that's just a for loop doing a weird not-strlen() traversal to see if the
last entry returns error or finds the end of line), and quoting context matters
so I'm not sure _how_ you'd remove it from input before tokenization, since
resolving quotes is part of tokenization...?

Again, we may be doing things in a different order...

>>> POSIX says you do them in separate steps.
>> 
>> Good to know.
> 
> It's always said this. (And bash has always performed steps 3 and 4 in
> reverse order, but ...)

Your code's GPLv3 and I'm writing non-GPL code so I've just been looking at your
man page and running tests against the binary. I read posix cover to cover
before starting all this but that's kind of receded into the mists of history at
this point. Mostly I'm reading the bash man page, pondering many years of
writing and editing bash scripts, and doing LOTS of tests...

(I also used to debug things in busybox ash/hush/msh/lash back when I maintained
that and tried to use it in https://landley.net/aboriginal/about.html, and over
the years I've dug into all SORTS of weirdness along the lines of "I built a
minimal native development environment out of busybox and string, and now I'm
trying to bootstrap gentoo under it, but I'm providing bash 2.05b since it was
the 'current' version for multiple years and nobody seemed to mind, but gentoo's
'emerge' doesn't work because they're using the new =~ operator that it didn't
have, and the double quote semantics seem to have changed in a couple places."
At the time, I moved from 2.05b to the last GPLv2 release, if I recall...)

I may not know how it works, but I have YEARS of experience breaking it in
strange ways and having to duct-tape around stuff...

>> Alas, posix says a lot of things, it would be nice if more of them were current
>> and relevant. I printed it all out and read the whole thing on a series of bus
>> rides into work when I first sat down to write a new shell for busybox back in
>> 2006. I've had a vague todo to read the new one whenever Issue 8 finally comes
>> out, but it's been "real soon now" for... how long? (Posix-2008 came out 15
>> years ago.)
> 
> The current edition is from 2018.

Except they said 2008 was the last feature release and everything since is
bugfix-only, and nothing is supposed to introduce, deprecate, or significantly
change anything's semantics. That's why it's still "Issue 7". The new stuff is
all queued up for Issue 8, which has been coming soon now since the early Obama
administration.

They released SUSv2 in 1997 (https://pubs.opengroup.org/onlinepubs/7990989775/), SUSv3 in
2001 (https://pubs.opengroup.org/onlinepubs/009695399/), SUSv4 in 2008
(https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/) and SUSv5 isn't
out yet. A 4 year gap, a 7 year gap, and now 15 years and counting...

> The next one is in its third draft, then
> it has to go through the whole IEEE process, but it may get through
> balloting by the end of the year. The standard is always evolving (things
> are still being added and deprecated/removed) and being clarified.
> The expanded text describing here-documents is a good example.

Hmmm, I asked why the 2013 update had replaced the 2008 update at the same URL
(all the others had new URLs, as above) and they gave me the typos rationale,
but it's possible they've changed policy since then? (I stopped poking them when
they put up a historical URL where you can get the 2008 version, see above,
which is still technically what I'm trying to target with toybox. I need to do a
whole new https://landley.net/toybox/roadmap.html#susv4 analysis for Issue 8. As
I said, that's when I was planning to do the posix shell language re-read, I
just didn't expect 15 years and counting...)

>>>> In general, line continuation priority isn't always obvious to me until I've
>>>> determined it experimentally:
>>>
>>> You go off and collect here-document bodies as soon as you get a newline
>>> token after seeing the operator-delimiter pair.
>> 
>> It does seem to take priority over everything, yes.
> 
> It was the `where is the next NEWLINE token' and `does a here-document body
> have to appear in the same command substitution as the delimiter' that
> sparked disagreement.

I'm following what the bash in my devuan install does. I should build a newer
one from source, but that's on the Linux From Scratch bootstrapping todo list
which I haven't found focus for since
http://lists.landley.net/pipermail/toybox-landley.net/2023-March/029504.html

>>> We had a pretty good
>>> argument about this on the austin-group list.
>> 
>> I've been subscribed forever, and even dialed in to a few of their conference
>> calls, but I've generally found arguments there mostly just peter out without
>> resolution:
> 
> The only way to guarantee a resolution is to file an interpretation
> request, which has to be acted on. Otherwise, unless we get to some kind
> of consensus on the list, shells keep doing their thing.

As I said, I respect the work the posix guys are doing, but it's not the
standard I'm implementing the shell against. (If Ubuntu hadn't redirected
/bin/sh to dash and broken the entire world in 2006, and talked debian into
copying them, and then refused to put it BACK when their explicitly stated
reason for it (https://wiki.ubuntu.com/DashAsBinSh) failed (then they created
Upstart instead to parallelize the init scripts, and then THAT failed and
saddled the world with systemd), I might have done a posix shell _first_. But
dash left a bad enough taste in my mouth that "a posix-only shell" seems
counterproductive.)

>>    https://mail-archive.com/austin-group-l@opengroup.org/msg09569.html
> 
> This was resolved, and the accepted text is in the link:
> 
> https://austingroupbugs.net/view.php?id=267#c5990

Let's see... a lot more micro-managing of when things are unspecified, carving
out space for the DOS C: drive for some reason...

I still have making "time" be a builtin instead of a separate command on my todo
list. (I could explain why but it's all toybox infrastructure stuff. It's
similar to what I have to do to add job control to kill: I've worked out how,
it's just a lot of work and testing.)

>> I've never been good at the politics side of things...
> 
> Me either.
> 
> [back to here-documents]
> 
>> I've been treating each line as a single word. 
> 
> That's fine as far as it goes, but you eventually have to expand the lines
> and handle line continuations (body not quoted). And you have to handle the
> continuations before you check for the delimiter, so constructs like
> 
> cat <<EOF
> abcde
> next\
> EOF
> 
> don't delimit the here-document, and constructs like
>
> cat <<EOF
> abcde
> EO\
> F
> 
> are commonly accepted, if officially unspecified, even though they make
> ash-based shells fall over dead.

Test cases! Thank you.

That, I can work with.
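A rough C sketch of Chet's point that continuations get spliced before the delimiter comparison, for the unquoted-delimiter case only. heredoc_collect() is hypothetical; it works on the whole remaining input at once rather than line by line, and it leaves escaped backslashes in place for the later expansion pass:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: collect an unquoted here document body from src,
 * removing backslash-newline pairs *before* comparing each logical line
 * against delim. Returns a malloc()ed body (caller frees), or NULL if the
 * delimiter never shows up (a real shell would warn and stop at EOF). */
char *heredoc_collect(const char *src, const char *delim)
{
  size_t len = strlen(src), dlen = strlen(delim), o = 0, start = 0, i;
  char *out = malloc(len + 1);

  if (!out) return NULL;
  for (i = 0; ; i++) {
    char c = src[i];

    if (c == '\\' && src[i+1] == '\n') {
      // Line continuation: drop the backslash and the newline.
      i++;
      continue;
    }
    if (c == '\\' && src[i+1] == '\\') {
      // Escaped backslash: keep both bytes for expansion, and don't let
      // the second one start a continuation.
      out[o++] = c;
      out[o++] = src[++i];
      continue;
    }
    if (c && c != '\n') {
      out[o++] = c;
      continue;
    }
    // End of a logical line (or of input): is it the delimiter?
    if (o - start == dlen && !memcmp(out + start, delim, dlen)) {
      out[start] = 0;
      return out;
    }
    if (!c) break;
    out[o++] = '\n';
    start = o;
  }
  free(out);

  return NULL;
}
```

With Chet's two inputs, "next\" followed by "EOF" becomes the single logical line nextEOF, so the body runs to end of input (NULL here), while "EO\" followed by "F" matches. An empty delimiter matches the first empty line, which is the null-delimiter behavior Chet called dodgy.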

>> I'm trying to make stuff work on
>> nommu systems, which suffer from memory fragmentation very easily, so handling
>> stuff in smaller chunks where possible is an advantage. But yeah, I've gotta
>> handle line continuations in HERE document context. I've got them working ok in
>> other input contexts (it's basically a variant of quoting):
> 
> You might consider just discarding them in the lexer, since they have to be
> removed from the input before you determine tokenization.

That's what I was doing initially, before this whole conversation. But I was
also removing the trailing \n from each line, and ignoring blank lines in places
I shouldn't, and I redid it to where it is now, and it's working.

As long as I've already _got_ escape handling logic parsing all this nonsense,
then handling \<newline> there is low hanging fruit...

>> Although apparently I need a test case for another one of those "$@" silently
>> becomes "$*" things:
>> 
>>    $ bash -c $'cat<<EOF\n"$@"\nEOF' one two three
>>    "two three"
>> 
>> (I think I have a control flag for it already...)
> 
> Since the here-document bodies do not undergo word splitting or quote
> removal, you have to leave the double-quotes there and the positional
> parameters are not split.

And right before my HERE document variable expansion loop, "// TODO: audit this
ala man page".

That's another reason I'm reluctant to start threads with you: it's very easy to
talk shop with somebody who knows MORE about this than I do, but I have a bit of
homework still to do before approaching the teacher about a lot of this stuff. :)

(Although that note was _mostly_ about how I'm setting the flags to ~NO_IFS vs
the list of "don't expand X" in the man page, it was also a reminder that I
don't have test cases for this part yet, and I was doing the <<- leading tab
removal AFTER variable expansion, when it only applies to tabs present before
expansion.) Also, the code was resolving the variable and then writing out the
unresolved line string, which means it hadn't properly been tested even manually
yet, just "finished, it compiled, checked in". (Because "write all these tests"
was left implicit in the TODO note still being there...)

Sorry, this part's not ready for prime time yet. The reason I don't remember
some of these corner cases is I haven't properly TESTED them yet...
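For the <<- part, the fix amounts to stripping tabs from the raw line before any expansion happens. A minimal sketch, assuming the line is a writable NUL-terminated buffer (strip_heredoc_tabs() is a hypothetical name):

```c
#include <string.h>

/* Hypothetical helper for <<- bodies: remove leading tab characters (only
 * tabs, not spaces) from one raw, unexpanded body line in place. Returns
 * line for convenience. */
char *strip_heredoc_tabs(char *line)
{
  size_t n = strspn(line, "\t");

  // Shift the rest of the line, including its NUL terminator, to the front.
  memmove(line, line + n, strlen(line + n) + 1);

  return line;
}
```

Because this runs first, a tab that only appears after expanding a variable survives.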

> Bash-5.1 switched to using pipes for the here-document if the document size
> is smaller than the pipe buffer size (and hence won't block), keeping the
> temporary file for documents larger than that.

I hate having multiple codepaths to do the same thing without a good reason.

Sometimes (like tail having a seek-count-backwards path and a read-from-start
path) it's unavoidable, because doing a read-from-start through a multi-gigabyte
log is too much. But doing pipes here seems like a microoptimization?

> That caused a rather large blowup, especially with people who assumed
> that here-document bodies would always be seekable, even though POSIX
> explicitly warns against making that assumption:
> 
> https://lists.gnu.org/archive/html/bug-bash/2022-04/msg00051.html
> 
> This was after people got up in arms about bash using temp files for here-
> documents and here-strings in the first place:
> 
> https://lists.gnu.org/archive/html/bug-bash/2019-03/msg00073.html

Of course. :)

Back in 2017 I wired up CONFIG_DEVTMPFS_MOUNT to initramfs in the linux kernel,
so when you requested that /dev have devtmpfs automatically mounted on it by the
kernel, it was in initramfs as well. Without this, the kernel's init/main.c
would try to open("/dev/console") for stdin/stdout/stderr of PID 1, and if it
didn't exist those would be closed, so PID 1 had to mount devtmpfs and open them
itself blindly. (The expectation was you would include a /dev and /dev/console
in the cpio.gz used to populate initramfs, but if you created it as a normal
user with cpio you couldn't include device nodes...)

Anyway, my patch broke debian. Because debian's initramfs boot script was doing
"if ! mount -t devtmpfs /dev /dev; then mount -t tmpfs /dev /dev; fi" meaning if
they inherited a working /dev, their error path would break it. This was 100%
clearly debian's bug, but my kernel patch was still rejected because it broke
working code.

  https://lkml.iu.edu/hypermail/linux/kernel/1705.2/06838.html

  https://landley.net/notes-2017.html#14-09-2017

*shrug* Politics. (I added code to my patch so if you tried to mount devtmpfs on
top of itself it returned "success" instead of failure, and the kernel guys went
"ew, no, debian should fix its bug". Damned if you do...)

>> Alas, you have to generate the contents at command execution time 
> 
> Quite true.
> 
>> because
>> variables resolved in it can change in a loop and/or function call, which is why
>> I need to retain the list of input lines. (Which for me is an array of arrays of
>> arguments because you can have an unlimited number of HERE documents attached to
>> each command, and another batch at the end of each flow control block...)
> 
> It's technically a list of redirections, and you can indeed have multiple
> redirections associated with a command. Bash stores the document as a
> single word (string), since it's going to be treated as a word.

I kept the granularity smaller in hopes of being nice to nommu systems.

(One of the reasons it took me so long to really get going on toysh is I wanted
to figure out how to mmap() input files and use _that_ memory instead of making
copies of everything into the heap. There just wasn't a way it was ever even
CLOSE to a net win.)

>>> I'm saying that the behavior should be consistent whether the shell is
>>> processing -c command or not. I think we agree on that.
>> 
>> Agreed.
>> 
>>> That behavior should be: if there is an unquoted backslash-newline pair,
>>> it should be removed.
>> 
>> Single or double quote?
> 
> Single quotes: preserved. Double quotes: removed when special. For
> instance, the double quotes around a command substitution don't make the
> characters in the command substitution quoted.

Quotes around $() retain whitespace that would otherwise get IFS'd. And command
substitution quoting contexts NEST:

  echo -n "$(echo "hello $(eval $'echo -\\\ne \'world\\n \'')")"
  hello world

When I can't puzzle through it I just run lots of tests against all the corner
cases I can think of and try to retcon a general rule from the results...
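Summarizing the rule as stated in this thread (unquoted and double-quoted backslash-newline pairs are removed, single-quoted ones are kept, and per Chet's reading a lone backslash at the very end of input has nothing to escape and stays), a lexer-level pass might look roughly like this. It's a sketch under simplifying assumptions: it ignores comments, here documents, and the quoting reset inside $() nesting, and splice_continuations() is a made-up name:

```c
#include <string.h>

/* Hypothetical helper: remove backslash-newline pairs from shell input,
 * tracking only '...' and "..." quoting. out needs strlen(in)+1 bytes.
 * Every other byte is copied unchanged. */
void splice_continuations(const char *in, char *out)
{
  char q = 0;   // active quote character: 0, '\'' or '"'

  while (*in) {
    char c = *in++;

    if (q == '\'') {                     // single quotes: everything literal
      if (c == '\'') q = 0;
    } else if (c == '\\') {
      if (*in == '\n') {                 // continuation: drop both bytes
        in++;
        continue;
      }
      if (*in) {                         // keep escape and escaped byte together
        *out++ = c;
        c = *in++;
      }                                  // else: trailing backslash at EOF stays
    } else if (c == '"' || c == '\'') {
      if (!q) q = c;
      else if (q == c) q = 0;
    }
    *out++ = c;
  }
  *out = 0;
}
```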

> That's the `special' part.
> There's also the case of double quotes around the `new' word expansions
> 
> ${parameter[#]#word}
> ${parameter[%]%word}

This part I don't know about, it looks like that's the prefix/suffix removal syntax?

I implemented a LOT of weirdness and have a lot of tests for it:

  $ for i in $EMPTY; do echo ="$i"=; done
  $ for i in ""$EMPTY; do echo ="$i"=; done
  ==
  $ for i in $EMPTY""; do echo ="$i"=; done
  ==

And the results are the same for EMPTY=" " because IFS. (You know all this, but
it took me over a month to get it <strike>right</strike> consistent with what
bash was doing.)

I note that I have yet to open the can of worms that is bash array variables,
although I've reserved plumbing for them in like five different places. (This is
mostly because I have not historically used them much, and thus don't have a
good handle on how to test it. But multiple people have said that's the biggest
feature they're looking forward to...)

(And "$@" is kind of array variable-ish already...)

I remember being deeply confused by ${X@Q} when I was first trying to implement
it, but it seems to have switched to a much cleaner $'' syntax since? The new ls
--quoting-style=shell-escape is reminiscent of whichever one I was struggling
with (it was a while ago, I'd have to dig through my blog to find it) with
multiple gratuitous context shifts for no obvious reason...

>>> If there isn't, a trailing backslash before EOF
>>> should be preserved. Different shells have different behaviors, and
>>> different versions of echo have different bugs with backslash processing,
>>> but I think this is correct.
>> 
>> Echo isn't processing any of these backslashes. Both bash and toybox echo need
>> -e to care about backslashes in their arguments. (Again, posix-2008 says
>> "implementations shall not support any options", which seems widely ignored.)
> 
> They're not options, per se, according to POSIX. It handles -n as an
> initial operand that results in implementation-defined behavior. The next
> edition extends that treatment to -e/-E.

An "initial operand", not an argument.

Right. So they're going from "wrong" to "wrong" then:

  $ echo -n -e 'hey\nthere'
  hey
  there$

Sigh. This is a common enough pattern that in toybox's lib/args.c plumbing the ^
initial control character is documented as "stop at first non-option argument",
and it's used in:

  $ sed -n 's/.*TOY(\(.*\), *"[^a-zA-Z]*^.*/\1/p' toys/*/*.c | xargs
  runcon su ifconfig netcat chroot chrt ionice unshare oneit openvt setsid
  sysctl taskset timeout watch getopt exec sh strace tcpsvd tr basename echo env
  find nice nohup printf time xargs

Maybe posix should eventually break down and admit this is a thing? "ls . -l"
has to work, but "ssh user@server -t ls -l" really really REALLY needs that
second -l going to ls not ssh. And yes, my echo parses initial -- the same way
every other command that parses any arguments does, and yes this might break
some people's scripts but the "^?Een[-eE]" optstr also has ? which means "pass
through unknown arguments" so:

  $ toybox echo -nx
  -nx
  $ toybox echo --hello--
  --hello--
  $ toybox echo -- hello
  hello

Which is the best I could figure out how to do. (And yes, -n is retroactively
unrecognized! And I have a test for it.) Is it more important for the toybox
commands to be consistent with each OTHER, or for them to be consistent with
other implementations? Navigating conflicting wossnames. It's a thing.
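The toybox echo behavior shown above boils down to: consume leading arguments only while every letter in them is a known option, treat "--" as the end of options, and stop at the first argument that doesn't qualify. A standalone C approximation of that (not the actual lib/args.c code; echo_parse() and the ECHO_N/ECHO_E bits are invented for this sketch):

```c
#include <string.h>

#define ECHO_N 1  // hypothetical flag bit for -n
#define ECHO_E 2  // hypothetical flag bit for -e (cleared again by -E)

/* Hypothetical "stop at first non-option argument" parser for echo.
 * Returns the index of the first argument to print; *flags gets the
 * -n/-e state from the recognized leading options. */
int echo_parse(int argc, char *argv[], int *flags)
{
  int i;

  *flags = 0;
  for (i = 1; i < argc; i++) {
    char *s = argv[i];
    int f = *flags;

    if (!strcmp(s, "--")) return i + 1;  // explicit end of options
    if (s[0] != '-' || !s[1]) break;     // first non-option stops parsing
    // Every letter must be known, else the whole argument is printed and
    // its flags are discarded ("-nx" prints as "-nx").
    for (s++; *s; s++) {
      if (*s == 'n') f |= ECHO_N;
      else if (*s == 'e') f |= ECHO_E;
      else if (*s == 'E') f &= ~ECHO_E;
      else break;
    }
    if (*s) break;
    *flags = f;
  }

  return i;
}
```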

I've dug into a bunch of posix corner cases over the years and ranted about them
on my blog (ala https://landley.net/notes-2021.html#08-02-2021) but am largely
in "that is certainly a point of view" territory when it comes to what posix
says about the shell. It's good to know! But I'm trying to implement whatever it
is bash is doing.

I have a pre-1.0 release cleanup goal to document my known deviations from
posix. Not CHANGE anything, just... document. I have a few already like
https://github.com/landley/toybox/blob/master/toys/posix/sed.c#L16 which I need
to go back over and confirm are complete...

For the shell though, I plan to document my deviations from BASH. Because that's
my standard.

> Other shells have versions of echo that perform backslash expansion
> unconditionally, as POSIX (XSI) requires. They have various bugs or quirks.

Indeed.

> When bash is in posix mode and has the xpg_echo option enabled, it behaves
> as POSIX specifies for XSI implementations, so it's more than theoretical.
> I have to confess, though, that the only time I've ever run bash that way
> was to run the Open Group test suite.

I have not committed to implementing 100% of what bash does. It's beyond 80/20,
but whether it's 2 iterations of 80/20 (96%) or 3 (99.2%) I dunno yet. Somewhere
between, probably... (https://landley.net/notes-2021.html#30-09-2021 was
promising 3 but it's not an exact science.)

Implementing -p mode might be in the "two iterations" part. Actually passing the
posix test suite (where does one even GET that? Do you have to pay for it?) is
probably at _least_ three...

>>>> Except when I have a file that doesn't end with a newline, a trailing \ on the
>>>> last line is removed. That was one of the later tests.
>>>
>>> Yeah, I think that's wrong. If bash does it, bash is wrong, too.
>> 
>> I pine for a complete, reliable, and current standards document.
> 
> This is completely unspecified behavior.

The standard is not complete, yes.

> POSIX shell scripts are text
> files, which consist of lines, and lines end with newlines. It's up to each
> shell implementor to decide how to handle it. You can push for an extension
> to that, but I would not hold my breath.

I don't care what posix says, I care what bash does.

Well, really I care what the vast mass of bash scripts in the wild depend on,
but I don't have access to the entire global corpus or the set of environments
to run it in or the expertise to evaluate the results, and I know from
experience some of it's got version-dependent behavior anyway, so I'm just doing
the best I can and waiting for people to complain.

(People are already complaining, and the command is still in the "pending"
directory. That's how this thread started...)

> The old "end the command substitution with `echo .' and remove one
> character from the end of the result" trick works,

Ooh, good trick. I hadn't thought of that.

> but any command
> substitution is always going to remove trailing newlines, quoted or not.
> We had a pretty good argument about that trick on the POSIX list, too, but
> most of the objections are theoretical.

I'm not trying to make it work on solaris or AIX. And only a subset works on BSD
or MacOS...
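The "echo ." trick, spelled out as the string operations involved: command substitution strips every trailing newline, so ending the captured output with a sentinel byte protects the real newlines, and the script then removes just that one byte (x=${x%.} in shell). A C sketch with hypothetical function names:

```c
#include <string.h>

/* What command substitution does to captured output: drop every
 * trailing newline. */
void cmdsub_trim(char *out)
{
  size_t len = strlen(out);

  while (len && out[len-1] == '\n') out[--len] = 0;
}

/* The "echo ." trick: output ending in ".\n" only loses that last newline
 * to the trim, then the sentinel dot is removed, leaving the original
 * trailing newlines intact. */
void cmdsub_dot_trick(char *out)
{
  size_t len;

  cmdsub_trim(out);
  len = strlen(out);
  if (len && out[len-1] == '.') out[len-1] = 0;
}
```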

>>> There's genuine disagreement between shells here. The ash-based shells
>>> (dash, the BSD sh, gwsh) preserve the backslash. Bash through bash-5.2,
>>> yash, mksh, ksh93 all remove it.
>> 
>> See "pining for standards", above.
> 
> File an interpretation request. I'm going to do what I think makes sense.
> Be prepared to have it rejected, though.

I'm treating bash as my standard here, not posix.

I would like there to BE a standard, but do not believe Posix can ever become it
due to historical choices from back when FIPS-151-2 required Posix compliance as
a condition of obtaining things like navy contracts, so Microsoft and IBM packed
the committee and got OS/360 and Windows NT declared posix compliant.

I've engaged with the posix guys when there was obvious missing info, ala:

https://landley.net/notes-2014.html#02-12-2014

But one of my big pet peeves about them is they haven't got an actual stable web
archive so discussions that happen on the mailing list are NOT A RELIABLE RECORD
of anything. I keep wanting to refer back to them and the list archive has
changed its formatting (or timed out old data!) and the old URL I saved is 404.

>> (The downside to using bash as a standard is when I ask you about corner cases,
>> half the time you fix things. Not a downside for YOU, but I'm left with a moving
>> target. https://threeplusone.com/quotes/pratchett/ .)
> 
> We talked about this before. Pick a fixed target (e.g., bash-5.1) and write
> to that, then move forward if you like.

I did, it was bash-2.05b and I had to move forward to run "emerge".

This isn't nearly as bad as gnu ls deciding to change its default output format
TWICE since I implemented toybox ls. (First it was -q then it was -b, and now
it's --quoting-style=shell-escape which doesn't even HAVE a short option.)

Posix is a moving target. What debian does is a moving target. I get pings from
users all the time. The compilers change, the kernel changes... I expect a
certain amount of this. It's a red queen's race and I'm just perpetually behind.
Oh well. Used to it.

>>>> Which is where I got confused, yes. If -c doesn't end with a newline, then the \
>>>> persists, but when stdin or file input don't end with a newline, the trailing
>>>> backslash is still removed even when it's the last byte of the input and thus
>>>> has nothing to escape.
>>>
>>> Yes, you've convinced me this is a bug.
>>>
>>> Maybe it's worth an austin-group interpretation request,
>> 
>> Paging Edvard Munch, please report to the ADR booth.
> 
> You have only yourself to blame. ;-)

I am aware of this.

>> I can make it correct, or I can make it work. I'm not always good enough to do both.
> 
> "Make it work, make it work right, make it work fast. In that order."

Except I'm trying to make it _simple_.

  https://landley.net/toybox/design.html#goals

(I didn't say I was succeeding, but I'm juggling FOUR balls...)

> Chet

Rob

P.S. I'd really hoped I could get a reasonable shell in 3500 lines. The version
I checked in today is 4762 lines. Not exactly Dunning-Kruger, but there's always
a certain amount of learning on the job. If I knew what I was doing I'd be done.

