[Toybox] [PATCH] toysh: [ isn't a builtin in single command binaries while test and [[ are. Also shut up arch by replacing egrep with grep -E

Mon Feb 12 01:02:10 PST 2024

On 2/10/24 21:22, Oliver Webb via Toybox wrote:
> Hello, I was benchmarking toysh against shells like dash and bash by calling the [ command
> 1,000,000 times, assuming it was a builtin on toysh.
> 
> dash and bash can do this in less than 10 seconds, while toysh takes 16.5 _minutes_. After 
> some fiddling with strace, I noticed it was forking off the [ command...
> which is declared with TOYFLAG_MAYFORK (to prevent this exact issue)
> 
> The problem is in scripts/single.sh, which while building the toysh searches for MAYFORK in command declarations
> to declare as dependencies... but only ones declared with the NEWTOY() macro,
> which [ (aka USE_TEST_GLUE) isn't (The only OLDTOY() with TOYFLAG_MAYFORK too).

Ah, single.sh needs OLDTOY aliases. Got it. Commit 9f4df994dd93.
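
A minimal sketch of the idea, not the code from commit 9f4df994dd93 and not
the real scripts/single.sh: the dependency scan has to accept OLDTOY() as
well as NEWTOY() when it looks for TOYFLAG_MAYFORK. The helper names
(sample_decls, mayfork_symbols) are hypothetical, and the sample declarations
only mimic the general shape of toybox command declarations; the real flag
lists may differ.

```shell
# Illustrative input only: three lines shaped like toybox command
# declarations. The exact arguments and flags are assumptions.
sample_decls() {
  cat <<'EOF'
USE_TEST(NEWTOY(test, 0, TOYFLAG_USR|TOYFLAG_BIN|TOYFLAG_NOHELP|TOYFLAG_MAYFORK))
USE_TEST_GLUE(OLDTOY([, test, TOYFLAG_USR|TOYFLAG_BIN|TOYFLAG_NOHELP|TOYFLAG_MAYFORK))
USE_SED(NEWTOY(sed, "(help)(version)", TOYFLAG_BIN))
EOF
}

# Read declarations on stdin and print the USE_ symbol (minus the USE_
# prefix) of each one that carries TOYFLAG_MAYFORK, whether it was
# declared with NEWTOY() or OLDTOY().
mayfork_symbols() {
  grep -E '^USE_[A-Z0-9_]+\((NEW|OLD)TOY\(.*TOYFLAG_MAYFORK' |
    sed 's/^USE_\([A-Z0-9_]*\)(.*/\1/'
}
```

Running `sample_decls | mayfork_symbols` lists both TEST and TEST_GLUE. With
a pattern that only accepts NEWTOY(, the TEST_GLUE line (the [ alias) would
be skipped, which is the behavior described in the report.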

> This patch fixes that, so that [ is automatically available as a builtin while building a single command
> binary of the shell.
> 
> Also, at least on arch linux, egrep has been replaced by a shell script that warns the user that
> egrep is deprecated,

Why did gnu do that? What's wrong with egrep? It's been part of the linux
command line as long as I've been using linux. Oldest VM image I have at hand is
Red Hat 6.2 from Y2K, and egrep was in it. Toybox provides egrep...

Are they going to start removing zcat and friends too? That's just gunzip -c. By
this logic they'd also remove gunzip because that's just gzip -d...

> then calls grep -E. To get rid of these warnings, I replaced egrep with grep -E in this patch

FYI doing multiple conceptually unrelated things in the same patch is what took
it from quick glance-and-apply when I first read the email to "let me evaluate
this when I have fewer distractions" on the todo list, which is uncomfortably
like coming into contact with the La Brea Tar Pits. (My todo list works like a
lifo stack in the short term, and past some non-obvious buoyancy threshold works
like a compost heap, which there's a scientific term for but I can't remember it
and looking up a word like "thixotropic" or "economic hysteresis" when you can't
remember it is always tricky. It's this phenomenon though
https://en.wikipedia.org/wiki/Ascending_and_descending_(diving)#Freediving which
is why they invented https://en.wikipedia.org/wiki/Buoyancy_compensator_(diving) .)

In this case, the question making it non-obvious (to me) is how should I RESPOND
to the FSF being an obvious bully over an aesthetic issue that breaks backwards
compatibility with an uncountable number of scripts. Should I bow before the
might of all powerful gnu or is there a way to give them the double middle
finger here, or can I just defer this until it's actually _removed_ not
deprecated and thus actually breaks builds rather than produces stupid warnings?
Is egrep 2>/dev/null a reasonable option in the short term?
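
For comparison, the two options under discussion can be sketched as shell
functions. The names match_a and match_b are hypothetical and appear nowhere
in the toybox build scripts; this only illustrates the trade-off.

```shell
# Option A (what the patch does): spell it "grep -E", which POSIX
# specifies, so no deprecation wrapper is involved.
match_a() {
  grep -E "$@"
}

# Option B (the short-term idea above): keep calling egrep and discard
# stderr. This hides the deprecation warning, but it also hides genuine
# diagnostics such as a missing input file or a malformed regex, and it
# stops working on any system where egrep is eventually removed.
match_b() {
  egrep "$@" 2>/dev/null
}
```

For example, `printf 'foo\nbar\nbaz\n' | match_a -c 'foo|baz'` counts the two
matching lines. Option B gives the same matches wherever egrep still exists,
at the cost of a silent failure when something else goes wrong.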

I already have generated/build.sh and have been pondering a 'make setup' target
that would build just the commands used to build a full toybox, in a libc-only
link without doing any of the environment probes, and then sets the $PATH to
where it's installed them so it can run the full build. A bit like 'make
airlock' but more restricted, which in theory would allow building on the mac
without homebrew, and on freebsd without bash or gsed. Do I have enough time
before the FSF shoots the hostages to add that to a release, and can I make the
UI reasonably straightforward so it Just Happens rather than people needing to
know about it? Maybe the build should ALWAYS do it unless TEST_HOST is set, the
way the test suite works? Which raises the question of build speed: how long
does cc main.c lib/*.c toys/*/{sh,sed,grep...}.c take to compile, and can I
parallelize it without bash going all funky? Is that a good approach to take on
this issue, or not worth the fight?

Hence todo list.

This is also one of those "academic infighting is so vicious because the stakes
are so small" things, where a large consequence makes the results easier to see
and smaller consequences are less obvious and thus harder to weigh. People write
entire academic papers about this by the way,
https://core.ac.uk/download/pdf/132270836.pdf . See also
https://landley.net/notes-2010.html#13-08-2010 where the absence of empirical
tests breaks distributed development's response to aesthetic issues, because
there's no mechanism to establish consensus other than a BDFL decision maker
spending political capital to impose an outcome.

That's _another_ reason multiple unrelated issues in the same patch tend to
catch on stuff: it's not always obvious what is and isn't a can of worms.

Rob