[Toybox] Shell corner cases.
Chet Ramey
chet.ramey at case.edu
Thu May 1 18:26:42 PDT 2025
On 5/1/25 6:23 PM, Rob Landley wrote:
>> Not at all. The line is parsed into three commands and executed:
>>
>> 1. The shell function is created by the function definition command.
>>
>> 2. The variable D is given a value by the simple command.
>>
>> 3. The variable D is expanded as part of word expansion and the result
>> is executed as a simple command.
>>
>> I don't see how you get much of a similarity here, since there's nothing
>> changed by the parser like in alias expansion.
>
> The implementation is different but what they do is the same. Modulo the
> "wrapping source" thing I mentioned. But I've never seen anybody do that,
> I'm just going "maybe that's why it exists".
What they do is definitely not the same. You can make them appear to have
the same effect, but it's not the same.
> (I've never seen anybody use "alias" as anything but function definition
> with a more convenient syntax, and I've been reading shell scripts since
> 1992.)
People who do things other than that are trying to be too clever by half.
> You can alias "if". I just don't know why you'd WANT to.
Sure. POSIX doesn't let you, btw.
> Ah, yes that's what I was expecting. And it did it. Which means it's not
> _just_ special casing prefix assignments, it's also special casing
> redirects.
Because a simple command consists of words, assignment words, and
redirections. A redirection is not a word, and is not eligible for
alias expansion. If you want to consider that "special casing," go for
it, but don't claim that it's not well-defined.
> However:
>
> $ alias blah='echo hello'
> $ X=
> $ $X blah
> bash: blah: command not found
command 1
> $ func() { echo hello; }
> $ $X func
> hello
command 2
>
> Yeah yeah, blame posix...
No! Don't blame POSIX! This is how Bourne shells work. The first word in
the command I marked as `command 1' up there is $X. That means the second
word (`blah') is not eligible for alias expansion. Even though $X doesn't
expand to anything it's still the first word of a simple command.
In the command I marked as `command 2', we have a simple command that
undergoes word expansion. After word expansion, the only word left is
`func', which is executed. It happens to be a shell function.
It's pretty fundamental that your shell understands the difference.
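
To make the difference concrete, here is a minimal sketch of the two cases,
assuming bash run as a script (where aliases have to be switched on by hand)
and assuming no real command named blah is installed. The variable names
alias_first, alias_status and func_second are just for illustration.

```shell
#!/bin/bash
# Non-interactive bash ignores aliases unless expand_aliases is set.
# (Shells without shopt, such as dash, expand aliases in scripts anyway.)
shopt -s expand_aliases 2>/dev/null || true

alias blah='echo hello'
func() { echo hello; }
X=

# Case A: blah is the first word, so the lexer replaces it with the alias text.
alias_first=$(blah)

# Case B: $X is the first word, so blah is never checked against the alias
# table. $X then expands to nothing and the shell looks for a command that is
# literally named blah (assumed not to exist on this system).
alias_status=0
$X blah 2>/dev/null || alias_status=$?

# Case C: functions are looked up after word expansion, so func still runs.
func_second=$($X func)

echo "alias as first word:  $alias_first"
echo "alias after \$X:       exit status $alias_status"
echo "function after \$X:    $func_second"
```

The alias is resolved while the line is being tokenized, before $X has been
expanded; the function is resolved when the command is about to be executed,
after $X has already disappeared. Case B should therefore end with the usual
"command not found" status of 127 while case C prints hello.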
>> You have agency here, Rob: you don't have to do anything you don't want to.
>> I'm telling you what other shells -- including bash -- do and what POSIX
>> says (most of it's unspecified).
>
> I can see _what_ it's doing, I'm trying to figure out _why_. And am not
> sure I'm any closer than when I started, but again I think this is posix
> and history at fault here...
You have to decide what you want. Do you want a reimplementation of bash,
or do you want something so that you understand every "why"? You get to
make that choice. And the latter is still possible, but you have to put
in the work.
> Sigh, this is preprocessor macros, isn't it? Except it wants to skip prefix
> assignments and redirections and who knows what else that isn't detected
> until the line gets parsed quite a while later.
Oh, ffs. You really should read POSIX, at least
https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html#tag_19_10_01
> See, the problem is:
>
> $ a=b if true; then echo a=$a; fi
> bash: syntax error near unexpected token `then'
How is that a problem? A reserved word can't be recognized as such unless
it follows an operator or other acceptable token (there is a finite number
of tokens that can precede a reserved word). It cannot ever appear after
an assignment word; assignment statements cannot precede compound commands.
So the `if' can't be returned to the parser as the IF token; it's just the
first word of a simple command.
Since `then' appears after a token that can precede a reserved word, you
return THEN as a token (or however you represent it). That's a syntax error.
"1. [Command Name]
When the TOKEN is exactly a reserved word, the token identifier for that
reserved word shall result. Otherwise, the token WORD shall be returned.
Also, if the parser is in any state where only a reserved word could be the
next correct token, proceed as above."
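
A small sketch of that rule, assuming a child shell invoked with -n (parse
only, execute nothing) is an acceptable way to ask "does this line parse?".
The check helper and the r_* variable names are hypothetical, not anything
from bash or toybox.

```shell
#!/bin/bash
# Use the running bash when available; fall back to sh otherwise (assumption).
shell=${BASH:-sh}

# Hypothetical helper: parse a command string without executing it (-n).
check() {
    if "$shell" -n -c "$1" 2>/dev/null; then
        echo "parses"
    else
        echo "syntax error"
    fi
}

# The line from the thread: `if' follows an assignment word, so it is a plain
# WORD; `then' follows `;', so it is the THEN token with no IF to match.
r_prefix=$(check 'a=b if true; then echo a=$a; fi')

# With the assignment as its own command, `if' follows `;' and is recognized.
r_separate=$(check 'a=b; if true; then echo a=$a; fi')

# Without the `then' part, what is left is a simple command named `if'.
r_simple=$(check 'a=b if true')

echo "a=b if true; then ...; fi   -> $r_prefix"
echo "a=b; if true; then ...; fi  -> $r_separate"
echo "a=b if true                 -> $r_simple"
```

The third case is the telling one: `a=b if true' is grammatically fine, it
just names a command called `if' that would fail at execution time rather
than at parse time.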
> I have to parse keywords to do line continuations and prompt for more
> input, but I can't have prefix assignments before a keyword. But alias can,
> and alias can RESOLVE to a keyword.
You have to restart the lexical token scan after you expand an alias at the
start of a command (you have to rescan starting with the expanded text
anyway; that's how aliases work). That doesn't mean you return the result
to the parser as a word, though that is often the outcome.
> It's INVENTING A LAYER, which happens
> EARLIER than that yet does MORE than that,
This sounds like an implementation artifact.
Since the alias expansion happens at the lexical level, the expanded alias
determines what the lexical level returns to the parser:
"When a TOKEN is subject to alias substitution, the value of the alias
shall be processed as if it had been read from the input instead of the
TOKEN, with token recognition (see 2.3 Token Recognition) resuming at the
start of the alias value."
There is a necessary element of rescanning here.
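
As an illustration of the rescan, here is a minimal sketch assuming bash run
as a script with aliases enabled. The alias name myif is made up for the
example, and it is assumed that no real command of that name exists.

```shell
#!/bin/bash
# Non-interactive bash ignores aliases unless expand_aliases is set.
shopt -s expand_aliases 2>/dev/null || true

# Hypothetical alias whose value is a reserved word.
alias myif='if'

# `myif' is in command position, so the lexer substitutes `if' and rescans.
# The rescanned `if' sits where a reserved word is acceptable, so the parser
# receives the IF token and the rest of the line completes the compound command.
myif true; then rescanned=yes; else rescanned=no; fi

# Quoting any part of the word suppresses alias substitution, so this is a
# simple command named `myif' (assumed not to exist), not a compound command.
quoted_status=0
\myif true 2>/dev/null || quoted_status=$?

echo "alias to reserved word: $rescanned"
echo "quoted alias name:      exit status $quoted_status"
```

The first command only parses because the token handed to the parser is
decided after the alias text has been rescanned, not before.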
Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU chet at case.edu http://tiswww.cwru.edu/~chet/