[Toybox] Would someone please explain what bash is doing here?

Tue May 19 13:03:27 PDT 2020

On 5/18/20 10:41 AM, Chet Ramey wrote:
> On 5/17/20 7:11 AM, Rob Landley wrote:
>> I had a reply window open to this when my laptop battery died, and thunderbird
>> doesn't store unfinished messages like kmail and vi and chrome...
>>
>> Anyway, I was reminded of this thread by:
>>
>>   $ IFS=x; ABC=cxd; for i in +($ABC); do echo =$i=; done
>>   =+(c=
>>   =d)=
>>   $ bash -c 'IFS=x; ABC=cxd; for i in +($ABC); do echo =$i=; done'
>>   bash: -c: line 0: syntax error near unexpected token `('
>>   bash: -c: line 0: `IFS=x; ABC=cxd; for i in +($ABC); do echo =$i=; done'
>>   $ readlink -f /proc/$$/exe
>>   /bin/bash
> 
> Yes, you need extglob to get a word like +(xyz) to parse unquoted. Since
> the word list following `in' is subject to pathname expansion, it's valid
> in that context.

But the same command line parses differently in the current shell and in
bash -c, despite presumably reading the same .bashrc and friends?

>> (I tried inserting shopt -o extglob; and shopt +o extglob; at the start of the
>> -c and it didn't change anything?)
> 
> Because extglob isn't one of the `set -o' options. I wanted `shopt' to be
> suitable to set all the shell options, so it understands -o and +o and can
> modify the `set -o' option set. You want `shopt -s extglob'.

$ bash -c 'shopt -s extglob; IFS=x; ABC=cxd; for i in +($ABC); do echo =$i=; done'
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: `shopt -s extglob; IFS=x; ABC=cxd; for i in +($ABC); do echo =$i=; done'

Nope, that wasn't it either.

>> Generated code exists and has its costs whether or not you had to manually write
>> it. 
> 
> Sure. I'm comfortable with the tradeoff to this point. There are other
> things I'd rather work on than writing a new parser.

I have, on more than one occasion, been accused of trying to "boil the ocean".

I prefer to think of it as more Sam Vimes' wanting to arrest the gods for doing
it wrong.

  https://www.schlockmercenary.com/2006-07-13

>>>> which said to _me_ that the parsing order of operations is
>>>>
>>>> A) keep parsing lines of data until you do NOT need another line.
>>>>
>>>> B) then do what the lines say to do.
>>>
>>> Roughly, if you mean "complete commands have been resolved with a proper
>>> terminator" and "execute the commands you just parsed."
>>
>> The execution can be deferred arbitrarily long (you may be defining a function)
>> or never (the else case of an if statement that was true), but yeah.
> 
> In a way. A function definition is a compound command with a separate exit
> status, and it's the `if' command we're talking about here, not its
> constituent parts, some of which may indeed never be executed.

I'm just treating everything as a function, and intending to save some of 'em
and run others immediately. But I haven't implemented the function namespace
directory stuff yet. (Or aliases...)

>> Anyway, that structure needs an "int lineno" added that gets snapshot from the
>> global TT.lineno, and what I've learned from all this is it gets snapshot at the
>> end when we close out the sh_pipeline and start the next one, not at the
>> beginning when it's allocated. (That's the observation that makes the behavior
>> make sense now.)
> 
> No. Kind of in the middle. Consider the following:
> 
> echo \
> one \
> two \
> $LINENO \
> done
> 
> Bash will always expand $LINENO to 2 in this construct, since it was on
> line 2 when it figured out it was parsing a simple command.

*blink* *blink*

Ok, that just makes NO sense to me.

>> Last time I looked up youtube clips for the princess bride "once his HEAD is in
>> range HIT IT WITH THE ROCK", and winnie the pooh "bear of very little brain",
>> but I haven't got the spoons to do that again.
> 
> You fell victim to one of the classic blunders.

I have neither gotten involved in a land war in asia nor gone up against a
sicilian when death is on the line (although the second one appears negotiable).

>>> You can probably get away with it as long as that option parsing code stops
>>> at the first word that doesn't begin with `-'.
>>
>> That's literally one character ("^" at the start of the option string in the
>> middle argument of the NEWTOY() macro.)
> 
> You've lost me there. Are you saying that you use ^ to mean something when
> parsing options?

Yeah, I wrote my own parsing plumbing:

  https://github.com/landley/toybox/blob/master/lib/args.c#L74

It gets called automatically before each COMMAND_main() function, so the options
are already parsed and the results waiting before the command-specific C logic
starts running.

It uses an option string to figure out what to do, and uses it to set bits in a
global (toys.optflags) for each option it's seen, plus filling out an array of
arguments for things like -c "char *data". (It relies on the LP64 standard for
long and pointer to be the same size, which is true on linux, macos, and freebsd.)

Each command has metadata at the top describing how it fits into toybox, in this
case:

  https://github.com/landley/toybox/blob/b7196626494b/toys/pending/sh.c

On line 157 the command #defines FOR_sh before it #includes "toys.h" so it gets
the FLAG_i macros and such for the bit values of the options, so it can test
them. There are also FLAG(x) macros that become (toys.optflags & FLAG_x) as a
slightly smaller way of saying that. :)
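
As a rough illustration of that define-before-include trick (a sketch, not the real generated header): the bit values below assume flags are numbered right to left through "sc:i", and toy_context, pretend_parsed() and sh_is_interactive() are made-up names for this example only.

```c
#include <assert.h>

// Stand-in for the global context; only optflags matters for this sketch.
struct toy_context { unsigned long long optflags; } toys;

// A command would define this before including the shared header.
#define FOR_sh

// What a generated flags header might contain for the option string "sc:i"
// (assumed numbering: rightmost option gets bit 0).
#ifdef FOR_sh
#define FLAG_i (1ULL << 0)
#define FLAG_c (1ULL << 1)
#define FLAG_s (1ULL << 2)
#endif

// FLAG(i) pastes to FLAG_i, so this reads (toys.optflags & FLAG_i).
#define FLAG(x) (toys.optflags & FLAG_##x)

// Simulate what the option parser would have left behind.
void pretend_parsed(unsigned long long flags)
{
  assert((flags & ~(FLAG_i | FLAG_c | FLAG_s)) == 0);  // only known bits
  toys.optflags = flags;
}

// Command logic then just tests bits.
int sh_is_interactive(void)
{
  return FLAG(i) != 0;
}
```

Calling pretend_parsed(FLAG_i | FLAG_c) and then sh_is_interactive() returns nonzero; with only FLAG_s set it returns 0.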

On line 58 is the NEWTOY() macro declaring the "sh" command:

  USE_SH(NEWTOY(sh, "(noediting)(noprofile)(norc)sc:i", TOYFLAG_BIN))

The USE_SH() evaluates to its contents when CONFIG_SH is enabled and nothing
when the config symbol is disabled, so that lets the multiplexer create an array
of enabled commands it can dispatch to by name, just with a grep | sort stacking
them all together to make generated/newtoys.h. (The sort is so it can binary
search it; hotpath and all.)
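
A minimal sketch of how that can fit together, under stated assumptions: the USE_*() definitions stand in for a generated config header, toy_entry, toy_table and toy_find() are hypothetical names, and the cat and tar option strings are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_entry { const char *name, *options; int flags; };

// Stand-ins for a generated config header: an enabled symbol passes its
// contents through, a disabled one swallows them.
#define USE_CAT(...) __VA_ARGS__
#define USE_SH(...) __VA_ARGS__
#define USE_TAR(...)

#define TOYFLAG_BIN 1
#define NEWTOY(name, opts, flags) { #name, opts, flags },

// Stand-in for the sorted, generated list. Entries must stay sorted by
// name for bsearch() to work.
static const struct toy_entry toy_table[] = {
  USE_CAT(NEWTOY(cat, "u", TOYFLAG_BIN))
  USE_SH(NEWTOY(sh, "(noediting)(noprofile)(norc)sc:i", TOYFLAG_BIN))
  USE_TAR(NEWTOY(tar, "cxtf:", TOYFLAG_BIN))
};

static int cmp_name(const void *key, const void *elem)
{
  return strcmp(key, ((const struct toy_entry *)elem)->name);
}

// Look a command up by name; returns NULL if it was configured out.
const struct toy_entry *toy_find(const char *name)
{
  assert(name);
  return bsearch(name, toy_table, sizeof(toy_table) / sizeof(*toy_table),
                 sizeof(*toy_table), cmp_name);
}
```

Here toy_find("sh") returns the sh entry, while toy_find("tar") returns NULL because USE_TAR() expanded to nothing.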

NEWTOY() says we're defining a new command, it has 3 arguments: command name,
option string, and flags.

The option string is the thing that needs ^ added to the start to say "stop at
the first non-option argument". (This one has a single : in it, which means the
GLOBALS struct for sh should start with a "char *c;" to receive that argument,
which will be NULL if there wasn't one this time.)
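
To make the "^" and ":" behavior concrete, here is a toy parser, not lib/args.c. It only understands single letters, "x:" for a string argument, and a leading "^". It ignores the (longopt) syntax, hardwires the one argument slot to c, and assumes bits are numbered right to left. All names in it are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_ARGS 16

struct parsed {
  unsigned optflags;     // one bit per option letter seen
  char *c;               // argument of -c, NULL if -c was not given
  char *args[MAX_ARGS];  // leftover non-option arguments, NULL terminated
  int argc;
};

// Bit for option letter c: count the letters to its right in optstr.
// (Right-to-left numbering is an assumption of this sketch.)
unsigned flag_bit(const char *optstr, char c)
{
  const char *p = strchr(optstr, c);
  unsigned bit = 0;

  if (!p || c == ':' || c == '^') return 0;
  for (p++; *p; p++) if (*p != ':') bit++;

  return 1u << bit;
}

static void keep(struct parsed *p, char *arg)
{
  assert(p->argc < MAX_ARGS - 1);
  p->args[p->argc++] = arg;
}

// Returns 0 on success, -1 on an unknown option or a missing argument.
int parse_opts(struct parsed *p, const char *optstr, char **argv)
{
  int stop = (*optstr == '^'), i;

  memset(p, 0, sizeof(*p));
  if (stop) optstr++;

  for (i = 0; argv[i]; i++) {
    char *arg = argv[i];

    if (*arg != '-' || !arg[1]) {
      if (stop) break;  // '^': everything from here on is an argument
      keep(p, arg);
      continue;
    }
    for (arg++; *arg; arg++) {
      unsigned bit = flag_bit(optstr, *arg);

      if (!bit) return -1;
      p->optflags |= bit;
      if (strchr(optstr, *arg)[1] == ':') {
        // Argument is the rest of this word, else the next word.
        if (arg[1]) p->c = arg + 1;
        else if (argv[i + 1]) p->c = argv[++i];
        else return -1;
        break;
      }
    }
  }
  while (argv[i]) keep(p, argv[i++]);

  return 0;
}
```

Given the arguments "-s script.sh -i", the option string "sc:i" sets both the s and i bits and leaves one argument, while "^sc:i" sets only s and leaves "script.sh" and "-i" for the script.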

Anyway, the point is this behavior change is:

  USE_SH(NEWTOY(sh, "^(noediting)(noprofile)(norc)sc:i", TOYFLAG_BIN))

Although what I SHOULD do is:

  USE_SH(NEWTOY(sh, "^(noediting)(noprofile)(norc)sc:^i", TOYFLAG_BIN))

I've only added the first few sh arguments to my implementation so far. Some of
these are MUCH longer. The longest so far is (wordwrap's going to butcher this
but it's all one line):

USE_TAR(NEWTOY(tar,
"&(restrict)(full-time)(no-recursion)(numeric-owner)(no-same-permissions)(overwrite)(exclude)*(mode):(mtime):(group):(owner):(to-command):o(no-same-owner)p(same-permissions)k(keep-old)c(create)|h(dereference)x(extract)|t(list)|v(verbose)J(xz)j(bzip2)z(gzip)S(sparse)O(to-stdout)P(absolute-names)m(touch)X(exclude-from)*T(files-from)*C(directory):f(file):a[!txc][!jzJa]",
TOYFLAG_USR|TOYFLAG_BIN))

Which has a corresponding:

GLOBALS(
  char *f, *C;
  struct arg_list *T, *X;
  char *to_command, *owner, *group, *mtime, *mode;
  struct arg_list *exclude;

  struct double_list *incl, *excl, *seen;
  struct string_list *dirs;
...
)

The bits before the blank line are automatically filled out by option parsing,
the ones after are just this command's global variables (usable as TT.dirs and
such). It's grouped like that to re-use the same memory for multiple commands,
generated/globals.h actually has something like:

extern union global_union {
  struct tar_data {
    STUFF_FOR_TAR
  } tar;
  struct sh_data {
    STUFF_FOR_SH
  } sh;
  ...
} this;

And then there's a "#define TT this.sh" in the #if FOR_sh block of
generated/flags.h.
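
A cut-down sketch of that layout, with assumptions flagged: the struct members are trimmed or invented (seen_count and lineno are placeholders), store_arg() is a hypothetical stand-in for what the option parser does when it fills a slot, and the slot arithmetic leans on the same long-equals-pointer assumption mentioned earlier.

```c
#include <assert.h>
#include <string.h>

struct arg_list;  // opaque for this sketch

// One union shared by all commands: only one command runs at a time,
// so its globals can overlay everyone else's.
union global_union {
  struct tar_data {
    char *f, *C;             // slots filled in by option parsing
    struct arg_list *T, *X;

    int seen_count;          // placeholder for command-private state
  } tar;
  struct sh_data {
    char *c;                 // receives the argument of -c

    long lineno;             // placeholder for command-private state
  } sh;
} this;

// What the FOR_sh block would provide, so sh code can say TT.c.
#define TT this.sh

_Static_assert(sizeof(long) == sizeof(char *), "sketch assumes LP64");

// Hypothetical helper: treat the front of the union as an array of
// long-sized slots and drop an argument pointer into slot number `slot`.
void store_arg(int slot, char *value)
{
  assert(slot >= 0 && (slot + 1) * sizeof(long) <= sizeof(this));
  memcpy((char *)&this + slot * sizeof(long), &value, sizeof(value));
}
```

After store_arg(0, "echo hi"), TT.c points at "echo hi". this.tar.f aliases the same storage, which is why only the currently running command may rely on its globals.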

It's a bit like the difference between an automatic transmission and a stick
shift. USING it is easy. Explaining how it works behind the scenes is...
lengthy. (Not HARD, just... there's a bit of it.)

But it means that simple commands can be just... hmmm, what's the shortest one
that actually includes a globals block... Looks like it's:

  https://github.com/landley/toybox/blob/master/toys/other/fallocate.c

That's the whole command. Add that file to toybox and the command exists, delete
it and the command doesn't exist. Nothing else in toybox has to be told about
that command, the build picks it up with grep and sed on toys/*/*.c and
generates the appropriate headers and build dependencies and targets and so on.

(Well, if it has tests there's a tests/fallocate.test file, but they're
intentionally separate.)

>>>> Documenting this as a deviance from <strike>posix</strike> the bash man page
>>>> seems the better call in this instance. 
>>>
>>> Documenting what as a deviation? POSIX doesn't do long options; you can do
>>> whatever you like with them.
>>
>> My shell standard isn't posix, the standard I'm trying to implement is the bash
>> man page.
>
> Then why recommend that I document it as a deviation from something POSIX
> doesn't standardize?

No, sorry, I'm saying it's something _I_ should document in _my_ source file.
Right after I link to the relevant standards document (if any), I try to note
intentional deviations from that standard, ala:

  https://github.com/landley/toybox/blob/master/toys/posix/ls.c#L8

>>>> These days I handle that sort of thing by waiting for somebody to
>>>> complain. That way I only add missing features somebody somewhere actually _uses_.)
>>>
>>> It has to be a lot more than one person.
>>
>> Yeah, but if I'm on the fence about it to begin with it only takes one person to
>> confirm "yeah, that's actually used".
> 
> Sometimes. I'm not getting paid for any of this;

Neither am I. I'm _supposed_ to be redoing the README of the
https://github.com/j-core/jcore-soc repository we finally just uploaded to github.

My boss Jeff humors me spending some work hours on this stuff because we do use
it, but I just replaced toysh with the old sh[1-6].c uclinux shell (I think it
was a neolithic snapshot of the minix shell?) in the filesystem we deployed on
sunday because I couldn't get mine usable enough in time. The majority of what I
work on for $DAYJOB doesn't involve toybox at all, there's VHDL and kernel stuff
and hardware bringup and documentation and a lot of stuff perilously close to
management because it's the kind of startup where everybody wears many hats for
at least a couple more funding rounds...

In 2013 one of the companies using android gave me a 6 month half-time contract
to work on stuff evenings and weekends, but they made me sign an NDA so I can't
say who it was or what I did for them, and it hasn't been repeated in the 7
years since.

Google/alphabet/android itself has given me the Google Open Source award twice,
each time with a $200 payoneer gift card (I spent the first one on japanese
lessons and an online game, the second needs some website activation thing that
didn't work, I should try again someday). Oh, and one of them came with a lovely
blanket my wife's dog really likes. (Oh, and in response to the June 4, 1999
stock market investment column I wrote for The Motley Fool (it was the dot-com
boom, these things happened) which mentioned them in passing back when they were
a "linux beta" site, they responded to their "first coverage in the financial
press" by sending me a bag with stickers and a t-shirt. If I recall I had to
give the t-shirt away because I couldn't accept gifts from companies I wrote
about, but I had one of the stickers on my car for a couple years. :)

Beyond that... Elliott bought me lunch once, and Tim Bird of Sony paid my travel
and hotel to ELC one year (but I think that might have been back when I was
doing busybox, not toybox? It was back before https://lwn.net/Articles/478308/
anyway. I'm pretty sure that article is why that company wanted me to sign the
NDA. Pity, I'd happily credit them as an early supporter of my work otherwise...)

Oh, and I had the patreon until recently, but that involved people voting on 4
or 5 different things I've done in the past. (Heck, I was thinking I might have
time to work on qcc again if patreon ever provided enough money I didn't have to
spend time on unrelated $DAYJOB stuff to pay my mortgage, and could just focus
on open source. Didn't work out that way. Oh well.)

> if I implement something
> new it has to be something I think is valuable or will pay off in the long
> run, or something lots of people are requesting.
> 
>> Also, Elliott speaks for the Android userbase. They ship a billion devices
>> annually. When he tells me he needs a thing, it carries some weight. (We argue
>> about how and where, but "what" is generally a foregone conclusion.)
> 
> Sure. The Linux distros play the same role for me, though I did get some
> good input from the Solaris guys a while back.

Sun Microsystems' business model was based on large corporate and government
organizations' procurement policies regularly requiring bidders for procurement
contracts to cap their profit at a percentage of the cost of materials. Which
meant instead of speccing a dozen $29.95 boxed copies of Red Hat, they'd INSIST
that only $5000 per seat Solaris installs would do. (Because they'd rather make
$500 than $2.95 off of each one.)

When new management people were brought in to prepare Red Hat for its IPO around
2000, they explained this to existing Red Hat senior management who went "wait,
if we charge way way more our volume will go UP?" And so they came out with an
experiment called "Red Hat Enterprise", which IMMEDIATELY sucked away Solaris'
entire business (because on a technical level everybody wanted to use Linux
instead of Solaris, they just didn't want to leave money on the table), and
suddenly all the Navy contracts were Red Hat Enterprise and Red Hat's $100
million/year Enterprise business tail was wagging the $20 million/year retail
boxed sales dog. And aaaaaaall the Red Hat engineering employees got SUCKED out
of doing traditional Red Hat into doing Red Hat Enterprise, and they became
Pointy Haired Linux, and of course I already blogged about it and can stop now:

  https://landley.net/notes-2012.html#14-03-2012

But on the Oratroll side, they limped along for a while helped by a $2 billion
payout from Microsoft for supporting SCO during the 2002 Linux lawsuit,
but that only lasted them a few years (SO much politics happened behind the
scenes) and when they collapsed Oracle bought the corpse for their patent
portfolio (James Gosling talked about that at some length):

  https://landley.net/notes-2010.html#24-10-2010

Because dying business models explode into a cloud of IP litigation and Oracle's
core business has started doing an upward retreat in the face of bog standard
disruptive technology attack in _2001_:

  https://landley.net/writing/database.html

(I know I did a writeup on the Sun Civil War years ago, but haven't managed to
find it in years. I should write it all up again, but it's kinda long and needs
SO many references to properly sequence and anchor it all. The biggest single
reference is http://www.blinkenlights.com/classiccmp/javaorigin.html but that's
maybe 1/3 of the story if you read between the lines...)

>> Yes and no. There's bsd and macos support now, and post-1.0 I might put some of
>> my own effort into expanding the BSD side of it. (MacOS is a cross between
>> "BSD's downstream" and "a weird proprietary mega-corporation that can sign a
>> check if it wants me to care", but Elliott has users who build AOSP on MacOS and
>> borrows a laptop to do fixups there every six months or so, and sends me patches.)
> 
> I do all my development on Mac OS X and do testing and some debugging on
> Linux. But I don't pretend that Apple is ever going to update the bash
> version they ship, and I know they're going to try and deprecate it due to
> licensing issues, so I don't spend any time putting in anything Mac OS-
> specific. Most of the proposals -- even the bad ones -- come from the Linux
> side.

Stallman's IDIOTIC handling of GPLv3 made companies throw the GPLv2 baby out
with the GPLv3 bathwater, leading to:

  http://meta.ath0.com/2012/02/05/apples-great-gpl-purge/

I did a section on that in my 2013 talk:

And did a license-specific talk later that year (outline, links to mp3):

  https://landley.net/talks/ohio-2013.txt

But alas, I only managed to cover about 1/2 my prepared material in the time
allotted, and they were recording _video_ but only ever posted audio so the web
pages I was showing people with primary sources didn't get recorded. (I linked
some in the outline.)

On the bright side, Jeremy Allison (of Samba) has FINALLY come around to my
position on copyleft and licensing, and recently gave his own talk on it:

  https://archive.org/details/copyleftconf2020-allison

I met him in 2008 when we were on a panel together with Eben Moglen to talk
about licensing back when I was still a plaintiff in all those busybox lawsuits,
which was back when I thought The GPL could still be saved...

  https://landley.net/notes-2008.html#19-09-2008

Since then, I did 0BSD and got SPDX and OSI and Github to accept it. I've
proposed a talk on THAT at a couple conferences because there's SO MUCH MATERIAL:

  https://landley.net/notes-2017.html#26-03-2017
  https://landley.net/notes-2017.html#27-03-2017
  https://landley.net/notes-2018.html#13-03-2018

But alas, none of the conferences wanted to hear about it. Oh well.

>>>> It's a pity posix is moribund.
>>>
>>> It's not dead, just slow.
>>
>> Give him time.
> 
> I think you give Jorg a lot more credit for influence than he deserves.

He successfully drove _me_ away.

>>> https://www.austingroupbugs.net/view.php?id=789
>>>
>>> So we started talking about this in some official proposed way in 2013,
>>> continued sporadically until 2018, decided on some official text to add
>>> to the standard in September, 2018, and it will be in the next major
>>> revision of Posix, issue 8.
>>
>> There's going to be an issue 8? 
> 
> Yes, Geoff is preparing the document now.

Cool.

>> Posix has been replacing issue 7 in place doing
>> the "continuous integration" thing (grab random snapshot du jour from the
>> website and call it good, no two people ever experience quite the same version)
>> for 12 years now.
> 
> That's mostly up to the open group, not the people working on the standard
> itself.

That's the only place it's freely published. There were stupid "pay me for
printouts" versions (which is why Linux was NOT officially posix compliant and
Linus instead wrote it to the Solaris system call manuals his university had in
the library).

  https://landley.net/history/mirror/linux/linux-history.html

I actually went through and culled interesting posts from the first year and
change of the "linux-activists" mailing list archive:

  https://landley.net/history/mirror/linux/1991.html
  https://landley.net/history/mirror/linux/1992.html

That's the one they set up when they got kicked off of comp.os.minix when
Tanenbaum came back for the new semester and started objecting to the traffic:

  https://www.oreilly.com/openbook/opensources/book/appa.html

Fascinating stuff, if you have a computer history hobby like I do. :)

>> (I ranted a lot more here last time in the email that got lost. Probably a good
>> thing.)
> 
> About release schedules? How much more is there to say about them? I mean,
> release engineering is complex, but it doesn't seem like there are that
> many new topics to consider.

About how "continuous integration" is terrible and _having_ releases is a good
thing. Which you wouldn't think is a controversial position, AND YET...

>> I only started using bash in 1998. :)
> 
> And it was 10 years old at that time. Man, we've come a long way.

I started toybox in 2006. I'm old...

>>   If your current locale setting has an appropriate gettext database installed,
>>   $"strings" get looked up and replaced with translated versions, otherwise
>>   they act like normal double quoted strings. Also, "bash -D SCRIPT" will show
>>   you all the $"" strings in a SCRIPT so translators can make a gettext database
>>   for a new $LANG.
> 
> Pretty much.

My point is that multiple readings of the bash man page did not impart that
information to me. This may be a failing on my part, but the fact the feature is
"little used" implies it may not JUST be me.

Documentation written by people who already know the material often subtly
assumes the reader _also_ already knows it. (I tend to write documentation _as_
I learn stuff, on the theory I'm going to forget and get confused about it.)

> Chet

Rob