[Toybox] And again.

Chet Ramey chet.ramey at case.edu
Wed Sep 2 07:42:47 PDT 2020


On 9/2/20 2:16 AM, Rob Landley wrote:

>>> And I have questions:
>>>
>>> 1) Bash DOES remove quotes from the pattern, it has to because splitting is
>>> disabled so spaces and $IFS can get inserted:
>>
>> It doesn't perform quote removal, and Posix says it should not.
> 
> Define "quote removal"?

Quote removal is the last word expansion step: the shell goes back and
removes the quote characters that were present in the original word and
were acting as special characters (that is, they were not themselves
quoted). According to the abstract Posix model, they exist in the word
until that step, but are left out of the expansion in some
implementation-defined fashion. They're magic, and the quote removal step
acknowledges they're magic by specifying when they're logically removed.

The problem is that "quoted to the shell for tokenization and expansion" is
not the same thing as "quoted for shell pattern expansion", which allows
only backslash.

> 
>   $ A="a b"; case $A in "a b") echo hello "$A"; esac
>   hello a b
>   $ A="a b"; case $A in a b) echo hello "$A"; esac
>   bash: syntax error near unexpected token `b'
>   $ A="a b"; case "$A" in "a b") echo hello; esac
>   hello
> 
> The quotes between the in and the ) change the behavior in a way that seems an
> awful lot LIKE quotes are being parsed and thus removed?

It's not quite quote removal. If you used characters that were special to
shell pattern matching, you'd see it. Should, say,

A="a+b"
case $A in "a*b") echo hello ;; esac

print anything? If you were performing quote removal on the pattern, it
would. But it's more interesting to use backslashes, since those have
special meaning to both expansion and shell pattern matching.
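A quick way to see this in bash: test the same a*b pattern, quoted and
unquoted, against a+b. If quote removal were performed on the pattern
before matching, both `case' statements would match; in bash only the
unquoted one does. The variable names are just for illustration.

```shell
A="a+b"

# Quoted pattern: the * is quoted, so it only matches a literal asterisk.
quoted=no;   case $A in "a*b") quoted=yes ;; esac

# Unquoted pattern: the * is active and matches the "+".
unquoted=no; case $A in a*b)   unquoted=yes ;; esac

echo "quoted pattern matched: $quoted"       # no
echo "unquoted pattern matched: $unquoted"   # yes
```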

> I was reading quote removal as "you can use quotes here, and they will be
> understood rather than treated as literals". That seems to be the case...?

Kind of. "Understood" is doing a lot of work there. Quotes are specified to
affect tokenization -- what characters are part of a word, and whether or
not a word is recognized as a reserved word or an operator -- and
later expansion. There has to be a way to pass that information between the
phases. In the Posix model, that method is leaving the quotes present in
the word, and removing them later, after all the expansions are performed,
though there are several different existing implementations of that idea.
But that doesn't quite work for pathname expansion or other pattern
matching; there has to be a way to `convert' shell quoting into something
that quotes characters in patterns.
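The same conversion applies to the results of expansions. In bash, a
pattern that comes from an unquoted parameter expansion keeps its special
characters active, while the quoted expansion matches them literally, so
the "was this quoted?" bit has to survive from expansion into the matcher.
A small sketch (variable names are only illustrative):

```shell
pat='a*b'
str='axxb'

m1=no; case $str  in $pat)   m1=yes ;; esac   # unquoted: * is active
m2=no; case $str  in "$pat") m2=yes ;; esac   # quoted: * is literal
m3=no; case 'a*b' in "$pat") m3=yes ;; esac   # literal match on the asterisk

echo "$m1 $m2 $m3"   # yes no yes
```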

> 
>> What it
>> does do is make sure that the quote characters arrange to quote parts of
>> the pattern appropriately so that special matching characters match
>> themselves. The shell has to remember which parts of the pattern were
>> quoted,
> 
> It has to remember which parts were active (I.E. unquoted/unescaped) and which
> parts weren't active when it hit each wildcard, yes.

Sure, that's a restatement of the premise.

> Which is largely the same
> test condition as IFS splitting being able to occur there, so IFS active and
> wildcard active can share most of their logic.

In terms of internally marking characters as `quoted' or not, yes. But you
have to have that logic for expansion anyway.

> 
> I wrote a collect_wildcards() function that assembles a deck of active wildcard
> locations for the wildcard expansion pass to replace later:

You can use the same technique for word splitting, as I think you implied
you do above.
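Word splitting keys off the same quoted/unquoted mark: only unquoted
expansion results are split, and only on characters from $IFS. A short
bash demonstration:

```shell
v='one two'
set -- $v;   unquoted_count=$#   # unquoted: split on IFS whitespace
set -- "$v"; quoted_count=$#     # quoted: one word, no splitting

IFS=:
w='a:b:c'
set -- $w;   colon_count=$#      # IFS characters in the expansion result split it
unset IFS                        # restore default splitting behavior

echo "$unquoted_count $quoted_count $colon_count"   # 2 1 3
```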


>> and make sure that those quoted characters get passed to the
>> matcher (which may or may not be fnmatch()) in whatever way the matcher
>> requires. That usually means prefixing them with a backslash, but then
>> you get into what happens with quoted characters inside bracket
>> expressions. The word expansions still happen how they're supposed to.
> 
> Since I can't expect libc to understand +() I'm writing my own glob() function
> which consumes the string and the deck as input. (Which also needs partial match
> support to make a/b*/c*/d work as it traverses down into only specific
> subdirectories...)

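glob() and fnmatch() are not quite the same: glob() generates pathnames by
reading directories, while fnmatch() only tests one string against one
pattern.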
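The "prefix quoted characters with a backslash before handing them to the
matcher" approach can be sketched at the shell level. `quote_pattern' below
is a hypothetical helper (not part of bash or toybox) that backslash-escapes
the characters special to basic pattern matching, so that an unquoted
expansion of its result matches the original string literally. It
deliberately ignores extglob operators like +() and the bracket-expression
subtleties mentioned above, and it uses global variables to stay POSIX sh.

```shell
# Hypothetical helper: backslash-escape \ * ? [ ] so the result,
# used unquoted as a pattern, matches the input string literally.
quote_pattern() {
  s=$1 out=
  while [ -n "$s" ]; do
    c=${s%"${s#?}"}          # first character of $s
    s=${s#?}                 # drop it from $s
    case $c in
      '\' | '*' | '?' | '[' | ']') out="$out\\$c" ;;
      *) out="$out$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

pat=$(quote_pattern 'a*b')
printf '%s\n' "$pat"                                  # a\*b
case 'a*b' in $pat) echo literal-match ;; esac        # the \* matches only "*"
case 'axb' in $pat) echo wrong ;; *) echo no-wildcard ;; esac
```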

> 
>> There was a ferocious argument about this a couple of years ago, and there
>> are still arguments about how to specify quoting in shell pattern matching.
> 
> This is domain expertise I'm missing. I never even bothered to use case/select
> in my shell scripts before this because an if/else staircase works about as
> easily...

If you don't need pattern matching, sure.
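Where `case' earns its keep over an if/else staircase is pattern matching
with alternation. A small sketch; the function name and categories are
made up for illustration:

```shell
# Hypothetical example: classify a filename by its suffix using patterns.
classify() {
  case $1 in
    *.tar.gz | *.tgz) echo gzip-tarball ;;
    *.tar)            echo tarball ;;
    .*)               echo dotfile ;;
    *)                echo other ;;
  esac
}

classify backup.tgz   # gzip-tarball
classify src.tar      # tarball
classify .profile     # dotfile
```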

> 
>> If you were to perform quote removal on the patterns, you'd need something
>> like
>>
>>   case "$x" in \\*) echo 'literal asterisk' ;; esac
>>
>> to match an asterisk.
> 
> Except that logically says that \\ is a literal backslash, and then * isn't
> escaped and is thus active? (That seems to be performing quote removal _twice_?)

You'd perform quote removal on the pattern, which would leave it as `\*'.
Then you'd pass that pattern to the matcher, and the backslash in the
pattern would quote the asterisk. Quote removal and treating the backslash
as removing the special meaning of the following character are not the same
thing.
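This is easy to check in bash, which does not perform quote removal on
the pattern: \* is a quoted asterisk and matches only a literal `*', while
\\* is a quoted backslash followed by an active `*', so it matches any
string that begins with a backslash.

```shell
m1=no; case '*'    in \*)  m1=yes ;; esac   # literal asterisk: matches
m2=no; case 'abc'  in \*)  m2=yes ;; esac   # * is quoted, so no wildcard
m3=no; case '\foo' in \\*) m3=yes ;; esac   # backslash, then anything
m4=no; case '*'    in \\*) m4=yes ;; esac   # needs a leading backslash

echo "$m1 $m2 $m3 $m4"   # yes no yes no
```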

> 
>>> 2) process substitution? Really? Under what circumstances does:
>>>
>>>   case <(potato) in $PATTERN) echo hello;; esac
>>>
>>> trigger usefully? 
>>
>> If people want to do dumb shit, people are going to do dumb shit. One
>> could use this to determine whether bash uses /dev/fd or named pipes for
>> process substitution, but you shouldn't really have to care.
> 
> "dumb shit", "you shouldn't really have to care".

You shouldn't really have to care how bash implements process substitution.
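For the curious, the experiment mentioned above might look like this in
bash. Which branch runs depends on the platform and on how bash was built,
so no particular output is implied; the variable name is illustrative.

```shell
#!/usr/bin/env bash
# Process substitution expands to a filename; the pattern shows which kind.
case <(:) in
  /dev/fd/*) psub=dev-fd ;;
  *)         psub=named-pipe ;;
esac
echo "bash implements <() here with: $psub"
```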

> 
> I'll wait for somebody to complain.
> 
>>> My code is treating <() as a form of redirection, so it's handled by
>>> expand_redir() rather than expand_arg_nobrace(), and moving it is problematic
>>> because only one context has the filehandle tracking (I.E. recording what to
>>> close again afterwards).
>>
>> That's probably going to come back and bite you, since process substitution
>> is a word expansion.
> 
> It might, but if you glue anything to the beginning or the end it's not a valid
> filename anymore? (Modulo reaching into somebody else's chroot except we
> dynamically allocated this file out of _our_ host /dev?)

It's not useful, true, but you can do it.
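Because process substitution is a word expansion, its result can be glued
to other text in the same word, which redirection-style handling would not
cover. The result is a well-formed word, just not a usable filename. The
exact path (/dev/fd/N or a FIFO name) varies, so only the shape is checked.

```shell
#!/usr/bin/env bash
# The expansion is a path such as /dev/fd/63 or a FIFO name,
# with the surrounding text attached to it in a single word.
w=$(printf '%s\n' prefix-<(:)-suffix)
echo "$w"
```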

> Other than this, the only other example I can think of is telling the kernel:
> 
>   KCONFIG_ALLCONFIG=<(cat file) make allnoconfig

The reason to use process substitution is to communicate with asynchronous
child processes that create output in a more dynamic fashion. In this case,
there's no difference between that and `KCONFIG_ALLCONFIG=file'. It would
be more appropriate if the process substitution ran something that
generated the config file on the fly from some other specification.
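A sketch of that more appropriate use: the config fragment is generated on
the fly from some other specification. Here the "specification" is a
hypothetical temporary file listing option names one per line, `sed' turns
each into a CONFIG_...=y line, and `cat' stands in for a consumer that
only ever sees a filename. The kernel invocation is left as a comment
because it needs a kernel tree.

```shell
#!/usr/bin/env bash
spec=$(mktemp)                     # hypothetical spec: one option name per line
printf '%s\n' FOO BAR > "$spec"

# The consumer just opens a filename; sed produces the contents as it reads.
config=$(cat <(sed 's/.*/CONFIG_&=y/' "$spec"))
printf '%s\n' "$config"

# In a kernel tree this might look like:
#   KCONFIG_ALLCONFIG=<(sed 's/.*/CONFIG_&=y/' "$spec") make allnoconfig
rm -f "$spec"
```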

> The $() contents gets
> passed to the "run subshell" logic and the output read into a string, and that
> takes a string to run about the same way "exec" does.)

You probably mean `eval'.
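The distinction matters because `exec' replaces the shell with a new
program, while `eval' takes a string and runs it as shell commands in the
current shell. A short illustration of eval in a subshell versus the
current shell (variable names are illustrative):

```shell
unset x
cmd='x=42; echo "inside: $x"'

out=$(eval "$cmd")       # runs in a subshell, like the contents of $()
after_subshell=${x-unset}

eval "$cmd"              # runs in the current shell; prints "inside: 42"
after_eval=$x

echo "$out / $after_subshell / $after_eval"   # inside: 42 / unset / 42
```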

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet at case.edu    http://tiswww.cwru.edu/~chet/

