[Toybox] CONFIG_TOYBOX_ZHELP

Sat Jan 20 14:59:28 PST 2024

On 1/19/24 19:49, enh wrote:
> On Fri, Jan 19, 2024 at 10:13 AM Rob Landley <rob at landley.net> wrote:
>> Creating the binaries isn't a big deal, it's just two .config files. I couldn't
>> speak to the selinux labels and whatever $PATH changes pull in the second
>> directory of symlinks on the android side.
>>
>> I'm assuming the problem here is Android's policy of snapshotting the
>> "generated" directory instead of allowing a shell script to call sed to
>> regenerate the files. What is the actual policy/objection?
> 
> it's that i don't want to duplicate your build system in our build
> system. i _can_ write a "genrule" that calls things (including c
> programs we've compiled), and do so in cases like
> https://cs.android.com/android/platform/superproject/main/+/main:external/one-true-awk/Android.bp
> say, but the toybox stuff is orders of magnitude more complicated than
> that.

Would it help if I pulled out "mkconfig.sh", "mkflags.sh", "mkglobals.sh",
"mkhelp.sh", and "mknewtoys.sh" from make.sh and had the top level script call
those?

I've been considering having them build in parallel anyway. The header
generation didn't USED to be a build bottleneck, but it's grown over the years
and SMP levels have increased, so...
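
A minimal sketch of the parallel version, assuming the five generators exist as standalone scripts and do not read each other's output (neither is true of the current tree as far as this message says). The run_parallel helper and the scripts/ paths in the comment are hypothetical:

```sh
# Hypothetical helper: start every command given as an argument in the
# background, then wait for each one and report failure if any failed.
run_parallel() (
  pids=
  for script in "$@"; do
    "$script" &
    pids="$pids $!"
  done
  rc=0
  for pid in $pids; do
    wait "$pid" || rc=1
  done
  [ "$rc" -eq 0 ]
)

# Possible use from the top level script (paths are assumptions):
# run_parallel scripts/mkconfig.sh scripts/mkflags.sh scripts/mkglobals.sh \
#   scripts/mkhelp.sh scripts/mknewtoys.sh || exit 1
```

Waiting on each pid individually, rather than a bare "wait", is what lets the caller notice that one generator failed.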

Four of the 6 headers are honestly just echo+sed invocations. And given even the
config.h generation is using $SED (I.E. gsed) instead of "sed", I should just:

sed 's/^# CONFIG_\(.*\) is not set.*/#define CFG_\1 0\n#define
USE_\1(...)/;t;s/CONFIG_\(.*\)=y.*/#define CFG_\1 1\n#define USE_\1(...)
__VA_ARGS__\n/;t;d' .config

Sigh, all right:

sed -e 's/^# CONFIG_\(.*\) is not set.*/#define CFG_\1 0\n#define USE_\1(...)/;t' \
  -e 's/CONFIG_\(.*\)=y.*/#define CFG_\1 1\n#define USE_\1(...) __VA_ARGS__\n/;t' \
  -e d .config

Wordwrap wasn't kind to that in email, but I assume it's more generally legible
to those who haven't implemented their own sed twice...
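
For reference, a sketch of what that kind of one-liner produces from a made-up .config fragment. It is a lightly adjusted variant, not the exact command above: the function name mkconfig_h is hypothetical, the second pattern is anchored with ^, the trailing blank line is dropped, and it uses "t", which ends the cycle and prints as soon as one substitution has succeeded, so only unmatched lines reach the "d". The "\n" in the replacement assumes GNU sed.

```sh
# Hypothetical helper: kconfig-style .config text on stdin, CFG_/USE_
# macro definitions on stdout. Lines matching neither pattern are deleted.
mkconfig_h() {
  sed -e 's/^# CONFIG_\(.*\) is not set.*/#define CFG_\1 0\n#define USE_\1(...)/;t' \
      -e 's/^CONFIG_\(.*\)=y.*/#define CFG_\1 1\n#define USE_\1(...) __VA_ARGS__/;t' \
      -e d
}

# Made-up input, not a real toybox .config
printf '%s\n' '# a comment' 'CONFIG_LS=y' '# CONFIG_SED is not set' |
  mkconfig_h
# prints:
#   #define CFG_LS 1
#   #define USE_LS(...) __VA_ARGS__
#   #define CFG_SED 0
#   #define USE_SED(...)
```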

Weaning help.h off of C is something I've been working towards, because the
design idea behind the current version was sub-options need to be stitched
together, and now I'm going "maybe some sort of ${SUBOPT}" escape syntax to let
one command know when/where to block copy in another command's help text?
(Because the -Z aren't going away. They're not SELECTABLE, but they're THERE.)

The design questions of what the escapes should look like and whether to do it
at build time or runtime remain unaddressed, but "don't require C at build time
for this" is one of the design goals of that redo. (The gzip stuff shouldn't
impact that because I can have a static "decompressed text" variable that gets
populated so runtime can have a help4("command") that returns a null terminated
char * without multiple decompressing; the theory is 1) you always exit right
after showing help text, 2) you show help text INSTEAD of performing the normal
operation of the command, so the memory bloat isn't a big deal even on embedded.
(It's a high water mark, but a fixed size one.))

Redoing mkflags with shell script is unlikely to happen soon, in part because
that one DOES vary by .config. Although...

#define FLAG_x ((FORCED_FLAG|CFG_COMMAND)<<1)

Doing it that way would NOT vary (that part of) the header at compile time, and
the compiler would still resolve it at compile time instead of runtime. And it's
reasonably easy to generate because the COMMAND part is already used in the
local block. (Modulo command sub-options, I still want dead code elimination to
notice if FLAG(x) is constant zero for a disabled sub-option. So the CFG symbol
should be for the sub-option, not the command, which means it needs to listen to
the USE macros in that part of the string. Which if I'm doing it in shell means
I need a sed that chops up each option string into individual lines of "string"
and USE_BLAH("string") and then iterates through them back to front (pipe it
through tac) and then... Hmmm.)
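
A rough sketch of the easy half of that, for the simplest possible case only. It assumes an option string made of plain single-letter options, assumes the rightmost option gets bit 0, and keys every flag on the command's CFG symbol, so it does not handle the sub-option/USE_ problem described above. The mkflags function name is hypothetical.

```sh
# Hypothetical generator: $1 is a command name, $2 a string of
# single-letter options. Walks the options back to front (via tac) and
# emits one invariant FLAG_ definition per option.
mkflags() (
  cfg="CFG_$(printf '%s' "$1" | tr 'a-z' 'A-Z')"
  printf '%s\n' "$2" | fold -w 1 | tac |
    awk -v cfg="$cfg" '{
      printf "#define FLAG_%s ((FORCED_FLAG|%s)<<%d)\n", $0, cfg, NR - 1
    }'
)

mkflags ls abc
# prints:
#   #define FLAG_c ((FORCED_FLAG|CFG_LS)<<0)
#   #define FLAG_b ((FORCED_FLAG|CFG_LS)<<1)
#   #define FLAG_a ((FORCED_FLAG|CFG_LS)<<2)
```

The output text is the same whatever the .config says. Only the value of CFG_LS changes, and the compiler folds that at compile time.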

Making the OPTSTR invariant is also tricksy, but I can pull out a very OLD trick
which is that when I first added USE() macros to busybox
(https://git.busybox.net/busybox/commit/?id=7bfa88f315d7) I also had SKIP()
macros at the same time (well actually
https://git.busybox.net/busybox/commit/?id=0d8766a3b13e and Denys objected to my
names ala https://git.busybox.net/busybox/commit/?id=5e34ff29bcc8 but it DOES
skip the contents of the macro when the config symbol is enabled...) So I can do
a processed version of the symbols and an unprocessed version of the symbols,
and then "xyz"USE_BLAH("abcde")SKIP_BLAH("^A^A^A^A^A")"fgh" and again it should
resolve statically at compile time. And the processing on the sed output chunks
would again be within the realm of what sed can do...
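
Generating the SKIP_ side would be the mirror image of the USE_ generation. A sketch, assuming the same kconfig-style .config lines as input (the mkskip_h function name is hypothetical):

```sh
# Hypothetical helper: emit SKIP_ macros from .config text on stdin.
# SKIP_X(...) keeps its arguments only when CONFIG_X is NOT set, and
# expands to nothing when CONFIG_X=y.
mkskip_h() {
  sed -n -e 's/^# CONFIG_\(.*\) is not set.*/#define SKIP_\1(...) __VA_ARGS__/p' \
         -e 's/^CONFIG_\(.*\)=y.*/#define SKIP_\1(...)/p'
}

# Made-up input
printf '%s\n' 'CONFIG_BLAH=y' '# CONFIG_FOO is not set' | mkskip_h
# prints:
#   #define SKIP_BLAH(...)
#   #define SKIP_FOO(...) __VA_ARGS__
```

With both macro sets available, "xyz"USE_BLAH("abcde")SKIP_BLAH("^A^A^A^A^A")"fgh" stays 11 characters long whether BLAH is enabled or not: either "abcde" or five ^A (0x01) bytes fill the middle. What the option parser should do with the placeholder bytes is not covered by this sketch.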

Ahem, AFTER a release.

>> My _theory_ is you don't want to compile external C code and run it on your
>> build server for security reasons.
> 
> no, there are rules against checking in "blobs" of any kind, but the
> assumption is that imported source has been reviewed anyway. (that's
> why two googlers have to +2 a change written and uploaded by an
> outsider now --- you'd have to corrupt me _and_ someone else to get
> your dodgy change in. it's also why only googlers can kick off a
> presubmit build.)

Ok. Makes sense.

>> Which is understandable if so, but does that
>> mean you patched all the "HOSTCC" calls out of the linux kernel build?
> 
> unlike Android proper, which is no longer investigating bazel, the
> kernel build fully switched to bazel, and doesn't use the upstream
> build at all. (but there's a whole team working on the kernel, whereas

Once upon a time, "cc *.c -o thingy" worked fine. Maybe a bit of setup, but you
could end with "and now compile it all".

And then "make" was invented as a premature optimization because "build all" was
too slow, and they hadn't given Moore's Law time to work yet. Plus the whole
https://en.wikipedia.org/wiki/Software_crisis thing hadn't yet switched over to
scripting languages that just run the source code without needing to compile
anything, again because REPEATED EXPONENTIAL DOUBLING wasn't happening fast
enough. (Sigh. Unix's small tools connected via pipes ALSO ADDRESSED THIS. It
avoided compiling large monolithic software. The shell is where we got the name
"scripting language" from.)

Alas, when SMP was invented, the compiler did NOT get extended to automatically
fork off sub-processes for each .c so "cc *.c -o thingy" would naturally take
advantage of SMP. Instead they taught MAKE to do it, which was just wrong.

Along the way C++ happened, with templates that are literally Turing complete at
compile time so compilation is technically never guaranteed to finish (yes they
added the 17 level recursion limit to try to squelch that but last I checked you
could still fill up the drive with your .o file and run a build that if that
limit wasn't hit would outlast the sun, in a couple hundred bytes of C++ source.
Ask Gerrit Kajmowicz about that, he knows where ALL the bodies are buried, yet
somehow still thought C++ and CORBA were good ideas last I talked to him.)

My BIGGEST disappointment with llvm is that they didn't teach "cc *.c" to use
multiple processors. I planned to do it in qcc but I gave up trying to do a
tinycc fork _and_ toybox _and_ mkroot all at once. (I'm no longer a teenager
staying up all night in a bedroom, and while my goals still get described as
"boiling the ocean" I can only do one ocean at a time if I want visible progress...)

Anyway, tangent.

If the Android kernel team could actually document what the kernel build NEEDS,
and keep such a document up to date, then "building it" theoretically wouldn't
be a big deal. I built toybox with a shell script in part to demonstrate that
the build isn't a big deal: the Makefile at the top is literally just a wrapper
to provide a UI around scripts/make.sh and scripts/install.sh and
scripts/test.sh (modulo the pile of kconfig I need to replace with some kind of
scripts/configure.sh).

But alas I let scripts/make.sh get cluttered and have been meaning to tidy it up
ever since. I didn't want to touch it too much because I didn't want to break
whatever subset of it you're using, but if you're NOT using it then I don't have
to worry about that. :)

Speaking of which, generated/build.sh _is_ the "cc *.c" version of the build I
was talking about above, preserved for people to reproduce on target systems
that haven't got the right $PATH tools to do the full build.

Looking at it again, I should probably peel out $FILES into a variable as well,
so the last line is just:

  $BUILD main.c lib/*.c $FILES $LINK -o toybox

In theory it should be "cc $CFLAGS main.c lib/*.c toys/*/*.c -o toybox" but
cross compiling changes the cc name, and I can't use toys/*/*.c because some
commands don't compile on some targets so we can't just "build it all, let the
linker's garbage collection sort it out". Not without adding #ifdefs to the
command implementations. And thus selecting the list of files based on the .config.

(Including the sed/grep that determines what files to include in the "simple"
build.sh defeats the purpose of the simple one not depending on the host having
usable tools yet. The use case is bootstrapping toybox natively on a system that
scraped up a toolchain but hasn't got a capable $PATH...)
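
The selection step that would run when build.sh is generated (not when it is executed) could look roughly like this. It is a sketch that assumes each command's source file contains NEWTOY(name, or OLDTOY(name, for the commands it implements. The toyfiles function name is hypothetical.

```sh
# Hypothetical helper: $1 is a .config, the remaining arguments are
# candidate .c files. Prints the files that mention an enabled command.
toyfiles() (
  config=$1
  shift
  # Enabled symbols, lowercased and joined as an alternation: ls|sed|...
  names=$(sed -n 's/^CONFIG_\([^=]*\)=y.*/\1/p' "$config" |
          tr 'A-Z' 'a-z' | paste -s -d '|' -)
  [ -n "$names" ] && grep -lE "TOY[(]($names)[ ,]" "$@"
)

# Possible use when writing out the simple build script:
# FILES=$(toyfiles .config toys/*/*.c)
```

Symbols that are not commands just produce names that match no file, so they drop out without special handling.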

> toybox is just me on friday afternoons, but only when i don't have
> anything more urgent.)

I try not to make more work for you than necessary. :)

>> I note that config.h is _always_ rebuilt from .config by scripts/make.sh
>> (presumably overwriting your snapshot version)
> 
> we don't run scripts/make.sh --- we build everything directly via soong.

Which means I can change it up without worrying about breaking you.

AFTER cutting a release...

>> What would the two pools be, anyway? It seems reminiscent of the /bin vs /sbin
>> split.
> 
> yeah, that's been the point where i've always been unconvinced. (the
> canonical example is people whining that having /bin/netcat makes life
> easier for bad guys, which i've never really believed. saves the
> _white hat_ folks a tiny amount of effort when writing their PoCs,
> maybe, but the black hats? (a) i'd like to see the specific example
> where one did that, please and (b) i'd like some evidence that
> _that's_ an expensive part of a modern exploit chain, rather than
> something the malware factories just have on the shelf anyway.)

I'd vaguely assumed each app had a whitelist of IPs it's allowed to dial out to
as part of its install metadata, but then I remembered you use IPv6 for everything.

(Wikipedia[citation needed] blocked the whole ipv6 domain from doing anonymous
edits because trying to whack-a-mole it was basically impossible due to the
terrible design of ipv6. I actually went to some ipv6 design meetings back in
the 1990s when they were taking RFCs, but my takeaway was "oh goddess this isn't
salvageable, stay with IPv4 as long as humanly possible". I still tether my
phone using "dhclient -d -4 usb0".)

Sticking with IPv4 while still covering the whole planet's population is
ENTIRELY DOABLE, by the way:

https://www.mail-archive.com/cerowrt-devel@lists.bufferbloat.net/msg05838.html

But unfortunately the ipv6 people treat any attempt to make ipv4 better as an
existential threat. Nobody voluntarily uses IPv6 while IPv4 remains an option.

(4 billion addresses means if the average household size were 2.1 people every
household could have its own static IP, and we're already masquerading behind
household routers. Sticking with IPv4 is not technically impractical, just
politically so. Back when I got my first web-enabled phone it _was_ using a
masqueraded IPv4 address behind a virtual router, and there's still not a lot of
phone servers. But alas, that ship has sailed...)
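
The arithmetic behind that, as a quick check. It treats all 2^32 addresses as assignable, which overstates things since some ranges are reserved:

```sh
# 2^32 IPv4 addresses, one per household of 2.1 people.
# Integer math only, so multiply by 21 and divide by 10.
addresses=$((1 << 32))
people=$((addresses * 21 / 10))
echo "$addresses addresses x 2.1 people each = $people people"
# prints: 4294967296 addresses x 2.1 people each = 9019431321 people
```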

> personally, i'd be happier with the "apps get _no_ /bin, shell gets
> everything" option, but that probably requires a time machine given
> the app compat issues.

The amount of C code I've seen with system("rm filename"); and friends...

That said, posix environments and shell scripting aren't a bad thing. :)

>> > i assume. i don't actually have any idea, or any good way of knowing,
>> > what apps are calling what toys.
>>
>> I've done this already for system bootstrapping, mkroot/record-commands is a bit
>> overkill for this, but the technique could presumably be scaled down to set a
>> bit in a scoreboard or something. (I needed to know the command line so I could
>> reproduce/debug behavior divergences, if you just want to know which files got
>> execed...)
> 
> oh, it's perfectly doable. but -- as you'd imagine and hope -- there's
> a _lot_ of paperwork and legal signoff for anything like that, and i
> don't think anyone's interested enough in the results to do that work.

And I still want a posix container to do builds in, so I'm personally looking in
the other direction. See also https://mstdn.jp/@landley/111763534546802525

>> Or if I get the strong/weak symbol changes in, a wrapper around toy_singleinit()
>> or similar could live in lib/portability.c and do extra setup before/after
>> calling the original. Although the more logical thing to do THERE might be to
>> have bionic's dynamic linker do it so you could log ALL executable launches.
>> (Fire off a thread to record it and it shouldn't add measurable latency on an
>> SMP system, plus exec isn't _that_ common and already fairly expensive as
>> operations go. You zygote everything already to avoid it coming up much...)
>>
>> > if i had my time again, i'd be
>> > tempted to make everything in /bin only accessible to the shell,
>> > because tbh most of what i've seen apps do is very stupid! although
>> > there's selection bias there: "why would i even be looking at what an
>> > app's doing if it isn't doing something wrong/stupid?".)
>>
>> A more posix-like programming environment doesn't strike me as a bad thing, but
>> I'm biased. :)
> 
> me too. my only interest in "apps get nothing" would be minimizing the
> app compat issues of _toybox_ changes. though "luckily" most of the
> uses i've seen are stupid enough to be unlikely to be affected by any
> plausible toybox behavioral/syntax changes.

I'm trying to avoid breaking API changes, and also treating "delta between what
debian does and what toybox does" as "thing to scrutinize and at least
document". (And posix/toybox. And to a lesser extent busybox/toybox which would
still surprise Alpine Linux users...)

>> Debian not having /sbin in non-root users' $PATH is something I find personally
>> annoying, but also a reasonably strong precedent for saying "these commands
>> normal users are not expected to touch".
>>
>> >> Which admittedly has a giant "apple(tm) version skew" warning in the middle but
>> >> I honestly have no idea how to fix that: mknodat() is a posix-2008 function:
>> >>
>> >> https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/functions/mknodat.html
>> >>
>> >> Which Apple is now claiming it only added October of 2022?
>> >>
>> >> https://en.wikipedia.org/wiki/MacOS_Ventura
>> >>
>> >> I mean... Really? They didn't catch up to posix-2008 for FIFTEEN YEARS? Steve
>> >> Jobs was still alive for almost four years after that came out...
>> >
>> > if that surprises you ... "obviously you're not a golfer".
>> >
>> > don't get me started on how long we had to wait for clock_gettime().
>> > that alone has to be responsible for half the macos #ifdefery on the
>> > entire internet!
>>
>> The seven year time horizon does not apply to mac, because I haven't got the
>> domain expertise.
> 
> well, the nice thing about mac users is that 90% of them will be
> running the shiniest thing within 6 months. (and not just OS versions:
> i've been amazed how quickly they upgraded to arm64 machines too.) the
> trouble is how long it takes before Apple adds a thing.

So their installed base is fickle and not tied to their existing infrastructure
investment, is what you're saying. (Bitch about the people still running
XP/kitkat all you want, but it means they're unlikely to switch to a competitor
any time soon.)

>> I wound up doing the gzip compression instead, because repeated text is the
>> definition of compressible,

The phrase for this is "avoidance productivity", by the way. Bog standard ADHD
behavior...

>> but I still have the issue that shared
>> implementations with the same help text (ala md5sum/sha1sum or chgrp/chown) have
>> the same usage: line despite having different command names.
> 
> (heh, yeah, you beat me to it :-) )

The hitch is generating it for --help without generating it for kconfig help.

That said... if I took the command name out of the usage: line in the help text,
I could still have the ls menuconfig help start with:

usage: [-1ACFHLNRSUXZabcdfghilmnopqrstuwx] [--color[=auto]] [FILE...]

And then:

A) have the build break generating help.h if there's no usage: line for a config
symbol that has a command_main() function.

B) strip "usage:[ ]*" off the start when saving help string in the header.

C) programmatically insert "usage: command " at the start of help text.

This would make the kconfig help text slightly awkward, but not unreadable?
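
Steps A through C can be sketched as three small filters. The function names and the sample help text are made up for illustration, and the real thing would live in whatever ends up generating help.h:

```sh
# A) fail when the help text on stdin has no usage: line at the top
check_usage() { sed -n 1p | grep -q '^usage:'; }

# B) strip "usage:[ ]*" off the start before storing the text
strip_usage() { sed '1s/^usage:[ ]*//'; }

# C) put "usage: NAME " back in front when showing help for command $1
add_usage() { printf 'usage: %s ' "$1"; cat; }

# One stored help text, shown under a command name chosen at display time
printf 'usage: [-bcr] [FILE...]\n\nCalculate hash.\n' |
  strip_usage | add_usage md5sum
# first line printed: usage: md5sum [-bcr] [FILE...]
```

Because the name is only added in step C, commands sharing an implementation (md5sum/sha1sum, chgrp/chown) can share one stored string and still show their own name.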

Rob