[Toybox] [PATCH] sh: pass "\" to the later app

Rob Landley rob at landley.net
Sun Jun 4 22:04:53 PDT 2023


On 6/1/23 10:20, Chet Ramey wrote:
> On 5/29/23 12:39 PM, Rob Landley wrote:
> 
>> But I'm still left with this divergence:
>> 
>>    $ ./sh -c 'echo abc\'
>>    abc
>>    $ bash -c 'echo abc\'
>>    abc\
> 
> The backslash doesn't escape anything, EOF delimits the token and command,
> and the backslash remains in place for echo to process (or not).

To me this is all part of the line continuation logic. My tokenizer returns
"needs another line to continue" as part of quote processing, and backslash is
basically a single-character quote, which yours is doing too:

  $ echo \  | wc -c
  2
  $ echo | wc -c
  1

But escaping a _newline_ is funny in that it glues lines together instead of
putting the escaped character into a command line argument, which means it has
to be special cased. Obviously I'm special casing it wrong, but the special
case has multiple nonobvious features.
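
For example, an escaped space quotes the space into the argument, but an
escaped newline doesn't put a newline in the argument, it glues the two lines
into one word:

  $ printf '[%s]\n' ab\ cd
  [ab cd]
  $ printf '[%s]\n' ab\
  > cd
  [abcd]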

I think part of it is that my tokenizer removes whitespace between tokens, and
you're not doing that until later? (You're doing more passes over the data than
I am; my code tries to do all the work it can in each pass so it's not
repeating itself. I had a problem where variable expansion and redirect
processing are the same pass in my code but different passes in yours, which
leaves me unable to produce quite the same error messages you do in a couple
of places...)

In general, line continuation priority isn't always obvious to me until I've
determined it experimentally:

  $ cat << EOF; if true
  > hello
  > EOF
  > then echo also; fi
  hello
  also
  $ if cat << EOF
  > hello
  > EOF
  > then echo true; fi
  hello
  true
  $ if true; then cat << EOF
  > hello
  > EOF
  > echo next
  > fi
  hello
  next

I'm trying to have tests for everything, but there are a number of corner cases...
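
For the trailing backslash behavior, I'm thinking something like this in
tests/sh.test (untested sketch, assuming I'm remembering the testing()
argument order right: name, command, expected output, file, stdin):

  testing 'sh -c keeps trailing \' '$SH -c "echo abc\\"' 'abc\\\n' '' ''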

>> Which is annoyingly magic because:
>> 
>>    $ bash << 'EOF'
>>    > echo abc\
>>    > EOF
>>    abc
> 
> So think about this in two pieces: what the here-document does to generate
> the input to the shell, and what the shell does with it.

The way I'd done it, the HERE document doesn't generate input; the funky
redirect _requests_ additional input. That's all basically the line
continuation logic, which can't proceed to the "can we actually run this now"
stage because it hasn't yet got a complete thought. I keep calling
parse_line() with the next line of input until it returns zero, at which point
it can call run_line() on the accumulated data structure the input got parsed
into.

> Since the here-document delimiter is quoted, the `outer' shell doesn't do
> anything special with the backslash-newline. If it were not quoted, the
> backslash-newline would be removed, and the EOF would not delimit the
> here-document.

Indeed. I need to make sure I have a test for that in tests/sh.test...
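
If I'm reading that right, the unquoted version needs a second EOF because
the backslash-newline swallows the first one:

  $ bash << EOF
  > echo abc\
  > EOF
  > EOF
  abcEOF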

> So the shell is supplied input on file descriptor 0 that consists of a
> single line (which ends with a newline):
> 
> echo abc\

That was the intent, yes.

> which the shell reads. Since nothing is quoted, the backslash-newline gets
> removed, the shell reads EOF and delimits the token and command, and echo
> gets "abc" as its argument.

I thought that "there's a newline at the end of the line, which the \ is
escaping" was relevant, but apparently that distinction only matters for -c.
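
Same bytes, different result depending on how they reach the shell (the
$( ) strips the trailing newline before -c ever sees it):

  $ printf 'echo abc\\\n' | bash
  abc
  $ bash -c "$(printf 'echo abc\\\n')"
  abc\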

>> And also:
>> 
>>    $ echo 'echo abc\' > blah
>>    $ cat blah
>>    echo abc\
>>    $ bash ./blah
>>    abc
> 
> Same thing, the file ends with a backslash-newline that gets removed, EOF
> delimits the token and command, echo gets "abc" and does the expected
> thing.

File input and stdin were behaving the same, but -c wasn't. Hence me going "is
it the newline?" later on...

>> So... do I special case -c here or what?
> 
> What's the special case? EOF (or EOS, really) always delimits tokens when
> you're using -c command. Just the same as if you had a file that didn't
> end with a newline.

Except when I have a file that doesn't end with a newline, a trailing \ on the
last line is removed. That was one of the later tests.

>> 
>> Aha!
>> 
>>    $ bash -c $'echo abc\\'
>>    abc\
> 
> There's no difference between this and 'echo abc\'.

Indeed, but it's phrased that way for comparison with the next call. This one
has no newline at the end of the -c input, but is otherwise identical. (Given
how the shell gratuitously strips trailing newlines from "$(blah)" and such,
$'' is almost unique in NOT having them stripped...)
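
For example:

  $ x="$(printf 'abc\n\n')"; echo "${#x}"
  3
  $ x=$'abc\n\n'; echo "${#x}"
  5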

Anyway, I'd previously thought -c input wasn't special, in that you can feed
multiple lines into -c and they get parsed as multiple lines:

  $ bash -c $'echo one\necho two'
  one
  two
  $ bash -c $'cat << EOF\nhello\nEOF'
  hello

Which is why my implementation just feeds them all into
int do_source(char *name, FILE *ff), calling fdopen() or fmemopen() when I
want to feed it various types of input.

>>    $ bash -c $'echo abc\\\n'
>>    abc
> 
> The backslash-newline gets removed. That always happens, regardless of
> where the input is coming from.

Yup, which is what led up to the next tests:

>> 
>> So...
>> 
>>    $ echo -n 'echo abc\' | bash
>>    abc
>>    $ echo -n 'echo abc\' > blah
>>    $ bash ./blah
>>    abc
> 
> This looks inconsistent at first glance, I'll take a look.

Which is where I got confused, yes. If -c input doesn't end with a newline,
the \ persists, but when stdin or file input doesn't end with a newline, the
trailing backslash is still removed, even when it's the last byte of the input
and thus has nothing to escape.

>> Nope, that's not it either, -c is still magic even when the file input hasn't
>> got a newline.
> 
> What is `magic' about it?

Input via -c is the only context in which a final \ is retained. Even when
it's the last byte of input, file and stdin input still strip the trailing
backslash.
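
Piping through od -c makes the last byte visible:

  $ bash -c 'echo abc\' | od -c
  0000000   a   b   c   \  \n
  0000005
  $ echo -n 'echo abc\' | bash | od -c
  0000000   a   b   c  \n
  0000004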

Rob

(Once again, this is _probably_ me trying to match bash's behavior too closely,
but in the absence of a "bash specification"...)

