[Toybox] CONFIG_TOYBOX_ZHELP

Tue Jan 30 14:02:03 PST 2024

On Sat, Jan 20, 2024 at 2:52 PM Rob Landley <rob at landley.net> wrote:

> On 1/19/24 19:49, enh wrote:
> > On Fri, Jan 19, 2024 at 10:13 AM Rob Landley <rob at landley.net> wrote:
> >> Creating the binaries isn't a big deal, it's just two .config files. I couldn't
> >> speak to the selinux labels and whatever $PATH changes pull in the second
> >> directory of symlinks on the android side.
> >>
> >> I'm assuming the problem here is Android's policy of snapshotting the
> >> "generated" directory instead of allowing a shell script to call sed to
> >> regenerate the files. What is the actual policy/objection?
> >
> > it's that i don't want to duplicate your build system in our build
> > system. i _can_ write a "genrule" that calls things (including c
> > programs we've compiled), and do so in cases like
> >
> https://cs.android.com/android/platform/superproject/main/+/main:external/one-true-awk/Android.bp
> > say, but the toybox stuff is orders of magnitude more complicated than
> > that.
>
> Would it help if I pulled out "mkconfig.sh", "mkflags.sh", "mkglobals.sh",
> "mkhelp.sh", and "mknewtoys.sh" from make.sh and had the top level script
> call
> those?
>

tbh, it's the fact that stuff keeps moving around that makes it easier for me
to just check in generated files. if/when it gets to the point where you
haven't touched this stuff in a couple of years, _that's_ when it might
make sense to move over :-)

> I've been considering having them build in parallel anyway. The header
> generation didn't USED to be a build bottleneck, but it's grown over the
> years
> and SMP levels have increased, so...
>

(yeah, given that the majority of toybox builds i'm waiting for are just
regular host builds in /tmp, that's really noticeable.)
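
A rough sketch of the run-the-generators-in-parallel idea, assuming the generators don't read each other's output (if mkflags needs config.h first, it would have to wait for it). The function names are stand-ins echoing the script names floated above, not existing toybox scripts, and each one only writes a placeholder file so the control flow can be run as-is:

```shell
#!/bin/sh
# Hypothetical stand-ins for the proposed per-header scripts.
GENDIR=$(mktemp -d)
mkconfig() { echo "/* config */" > "$GENDIR/config.h"; }
mkflags()  { echo "/* flags */"  > "$GENDIR/flags.h"; }
mkhelp()   { echo "/* help */"   > "$GENDIR/help.h"; }

# Launch each generator in the background, remembering its PID.
pids=""
for job in mkconfig mkflags mkhelp; do
  "$job" &
  pids="$pids $!"
done

# Wait on each PID individually so a failing generator is noticed.
status=0
for pid in $pids; do
  wait "$pid" || status=1
done
echo "header generation exit status: $status"
```

Waiting per PID rather than with a bare "wait" is what lets the top-level script fail the build when any one generator fails.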

> Four of the 6 headers are honestly just echo+sed invocations. And given
> even the
> config.h generation is using $SED (I.E. gsed) instead of "sed", I should
> just:
>
> sed 's/^# CONFIG_\(.*\) is not set.*/#define CFG_\1 0\n#define
> USE_\1(...)/;T;s/CONFIG_\(.*\)=y.*/#define CFG_\1 1\n#define USE_\1(...)
> __VA_ARGS__\n/;T;d' .config
>
> Sigh, all right:
>
> sed -e 's/^# CONFIG_\(.*\) is not set.*/#define CFG_\1 0\n#define USE_\1(...)/;T' \
>   -e 's/CONFIG_\(.*\)=y.*/#define CFG_\1 1\n#define USE_\1(...) __VA_ARGS__\n/;T' \
>   -e d .config
>
> Wordwrap wasn't kind to that in email, but I assume it's more generally
> legible
> to those who haven't implemented their own sed twice...
>

(despite having looked it up last time i tried to understand this stuff, i
still don't remember what T means. that's probably something mostly known
to people who've implemented their own sed twice. i mean, BSD/macOS seds
don't even know what it means :-) )
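
For the record: in GNU sed, T branches when no substitution has succeeded since the last input line was read (or since the last conditional branch), while t is the POSIX command that branches when one has. Below is a minimal sketch of the same .config-to-header idea that sticks to t, -n/p, and a backslash-newline in the replacement, so it should not depend on GNU extensions. gen_config_h is a made-up helper name, and this is a simplification of the command quoted above, not what make.sh actually does:

```shell
#!/bin/sh
# Sketch: turn kconfig-style .config lines into CFG_/USE_ macro pairs.
# gen_config_h is a hypothetical helper, not a toybox script.
gen_config_h() {
  # -n suppresses autoprint, so only lines rewritten by an s///p appear.
  # "t" ends processing of a line once the first substitution has matched.
  sed -n \
    -e 's/^# CONFIG_\(.*\) is not set.*/#define CFG_\1 0\
#define USE_\1(...)/p' \
    -e t \
    -e 's/^CONFIG_\(.*\)=y.*/#define CFG_\1 1\
#define USE_\1(...) __VA_ARGS__/p' \
    "$1"
}

# Demo on a two-line sample config.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# CONFIG_LS is not set
CONFIG_CAT=y
EOF
gen_config_h "$cfg"
rm -f "$cfg"
```

Because nothing is printed unless a substitution matched, the trailing "d" from the original is not needed in this form.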

> Weaning help.h off of C is something I've been working towards, because the
> design idea behind the current version was sub-options need to be stitched
> together, and now I'm going "maybe some sort of ${SUBOPT}" escape syntax
> to let
> one command know when/where to block copy in another command's help text?
> (Because the -Z aren't going away. They're not SELECTABLE, but they're
> THERE.)
>

yeah, i'd wondered about that exact same idea. seems like it would help
with the md5sum-type duplication too, if you could just "#include" another
command's help in all the same-interface-different-name commands' help.

(fwiw, unless you're really anal about every last help byte -- which i
don't think you are, plus you have compression now -- i personally quite
like the coreutils option of just having -Z all the time, but on some
systems it just prints an error message. similar to the old
https://en.wikipedia.org/wiki/Bruce_Tognazzini advice for GUIs about not
hiding or even disabling invalid options --- have everything "doable" all
the time, and explain to the user why it's not currently valid if they use
it when [in most GUIs] it would have been greyed out.)

> The design questions of what the escapes should look like

heh, the reason i don't think i'd mentioned this idea to you was that i
thought it would be less likely to end up a bikeshed ... i'm happy to
pretend to have a strong opinion if it gets you out of the
https://en.wikipedia.org/wiki/Buridan%27s_ass problem :-)

> and whether to do it
> at build time or runtime remain unaddressed, but "don't require C at build
> time
> for this" is one of the design goals of that redo. (The gzip stuff
> shouldn't
> impact that because I can have a static "decompressed text" variable that
> gets
> populated so runtime can have a help4("command") that returns a null
> terminated
> char * without multiple decompressing; the theory is 1) you always exit
> right
> after showing help text, 2) you show help text INSTEAD of performing the
> normal
> operation of the command, so the memory bloat isn't a big deal even on
> embedded.
> (It's a high water mark, but a fixed size one.)
>
> Redoing mkflags with shell script is unlikely to happen soon, in part
> because
> that one DOES vary by .config. Although...
>
> #define FLAG_x ((FORCED_FLAG|CFG_COMMAND)<<1)
>
> Doing it that way would NOT vary (that part of) the header at compile
> time, and
> the compiler would still resolve it at compile time instead of runtime.
> And it's
> reasonably easy to generate because the COMMAND part is already used in the
> local block. (Modulo command sub-options, I still want dead code
> elimination to
> notice if FLAG(x) is constant zero for a disabled sub-option. So the CFG
> symbol
> should be for the sub-option, not the command, which means it needs to
> listen to
> the USE macros in that part of the string. Which if I'm doing it in shell
> means
> I need a sed that chops up each option string into individual lines of
> "string"
> and USE_BLAH("string") and then iterates through them back to front (pipe
> it
> through tac) and then... Hmmm.)
>
> Making the OPTSTR invariant is also tricksy, but I can pull out a very OLD
> trick
> which is that when I first added USE() macros to busybox
> (https://git.busybox.net/busybox/commit/?id=7bfa88f315d7) I also had
> SKIP()
> macros at the same time (well actually
> https://git.busybox.net/busybox/commit/?id=0d8766a3b13e and Denys
> objected to my
> names ala https://git.busybox.net/busybox/commit/?id=5e34ff29bcc8 but it
> DOES
> skip the contents of the macro when the config symbol is enabled...) So I
> can do
> a processed version of the symbols and an unprocessed version of the
> symbols,
> and then "xyz"USE_BLAH("abcde")SKIP_BLAH("^A^A^A^A^A")"fgh" and again it
> should
> resolve statically at compile time. And the processing on the sed output
> chunks
> would again be within the realm of what sed can do...
>
> Ahem, AFTER a release.
>
> >> My _theory_ is you don't want to compile external C code and run it on your
> >> build server for security reasons.
> >
> > no, there are rules against checking in "blobs" of any kind, but the
> > assumption is that imported source has been reviewed anyway. (that's
> > why two googlers have to +2 a change written and uploaded by an
> > outsider now --- you'd have to corrupt me _and_ someone else to get
> > your dodgy change in. it's also why only googlers can kick off a
> > presubmit build.)
>
> Ok. Makes sense.
>
> >> Which is understandable if so, but does that
> >> mean you patched all the "HOSTCC" calls out of the linux kernel build?
> >
> > unlike Android proper, which is no longer investigating bazel, the
> > kernel build fully switched to bazel, and doesn't use the upstream
> > build at all. (but there's a whole team working on the kernel, whereas
>
> Once upon a time, "cc *.c -o thingy" worked fine. Maybe a bit of setup,
> but you
> could end with "and now compile it all".
>
> And then "make" was invented as a premature optimization because "build
> all" was
> too slow, and they hadn't given Moore's Law time to work yet. Plus the
> whole
> https://en.wikipedia.org/wiki/Software_crisis thing hadn't yet switched
> over to
> scripting languages that just run the source code without needing to
> compile
> anything, again because REPEATED EXPONENTIAL DOUBLING wasn't happening fast
> enough. (Sigh. Unix's small tools connected via pipes ALSO ADDRESSED THIS.
> It
> avoided compiling large monolithic software. The shell is where we got the
> name
> "scripting language" from.)
>
> Alas, when SMP was invented, the compiler did NOT get extended to
> automatically
> fork off sub-processes for each .c so "cc *.c -o thingy" would naturally
> take
> advantage of SMP. Instead they taught MAKE to do it, which was just wrong.
>

(i'm not sure what part of "do the easy thing" and "unix" you don't think
go together. and to be fair, i've seen a lot of compilers for several
different languages try to move parallelism into the compiler with
relatively little success. it's harder than it sounds, especially if you're
expecting speedups anywhere close to what you get from the external
parallelism.)

> Along the way C++ happened, with templates that are literally Turing
> complete at
> compile time so compilation is technically never guaranteed to finish (yes
> they
> added the 17 level recursion limit to try to squelch that but last I
> checked you
> could still fill up the drive with your .o file and run a build that if
> that
> limit wasn't hit would outlast the sun, in a couple hundred bytes of C++
> source.
> Ask Gerrit Kajmowicz about that, he knows where ALL the bodies are buried,
> yet
> somehow still thought C++ and CORBA were good ideas last I talked to him.)
>
> My BIGGEST disappointment with llvm is that they didn't teach "cc *.c" to
> use
> multiple processors. I planned to do it in qcc but I gave up trying to do a
> tinycc fork _and_ toybox _and_ mkroot all at once. (I'm no longer a
> teenager
> staying up all night in a bedroom, and while my goals still get described
> as
> "boiling the ocean" I can only do one ocean at a time if I want visible
> progress...)
>
> Anyway, tangent.
>
> If the Android kernel team could actually document what the kernel build
> NEEDS,
> and keep such a document up to date, then "building it" theoretically
> wouldn't
> be a big deal. I built toybox with a shell script in part to demonstrate
> that
> the build isn't a big deal: the Makefile at the top is literally just a
> wrapper
> to provide a UI around scripts/make.sh and scripts/install.sh and
> scripts/test.sh (modulo the pile of kconfig I need to replace with some
> kind of
> scripts/configure.sh).
>
> But alas I let scripts/make.sh get cluttered and have been meaning to tidy
> it up
> ever since. I didn't want to touch it too much because I didn't want to
> break
> whatever subset of it you're using, but if you're NOT using it then I
> don't have
> to worry about that. :)
>
> Speaking of which, generated/build.sh _is_ the "cc *.c" version of the
> build I
> was talking about above, preserved for people to reproduce on target
> systems
> that haven't got the right $PATH tools to do the full build.
>
> Looking at it again, I should probably peel out $FILES into a variable as
> well,
> so the last line is just:
>
>   $BUILD main.c lib/*.c $FILES $LINK -o toybox
>
> In theory it should be "cc $CFLAGS main.c lib/*.c toys/*/*.c -o toybox" but
> cross compiling changes the cc name, and I can't use toys/*/*.c because
> some
> commands don't compile on some targets so we can't just "build it all, let
> the
> linker's garbage collection sort it out". Not without adding #ifdefs to the
> command implementations. And thus selecting the list of files based on the
> .config.
>
> (Including the sed/grep that determines what files to include in the
> "simple"
> build.sh defeats the purpose of the simple one not depending on the host
> having
> usable tools yet. The use case is bootstrapping toybox natively on a
> system that
> scraped up a toolchain but hasn't got a capable $PATH...)
>
> > toybox is just me on friday afternoons, but only when i don't have
> > anything more urgent.)
>
> I try not to make more work for you than necessary. :)
>
> >> I note that config.h is _always_ rebuilt from .config by scripts/make.sh
> >> (presumably overwriting your snapshot version)
> >
> > we don't run scripts/make.sh --- we build everything directly via soong.
>
> Which means I can change it up without worrying about breaking you.
>
> AFTER cutting a release...
>
> >> What would the two pools be, anyway? It seems reminiscent of the /bin vs /sbin
> >> split.
> >
> > yeah, that's been the point where i've always been unconvinced. (the
> > canonical example is people whining that having /bin/netcat makes life
> > easier for bad guys, which i've never really believed. saves the
> > _white hat_ folks a tiny amount of effort when writing their PoCs,
> > maybe, but the black hats? (a) i'd like to see the specific example
> > where one did that, please and (b) i'd like some evidence that
> > _that's_ an expensive part of a modern exploit chain, rather than
> > something the malware factories just have on the shelf anyway.)
>
> I'd vaguely assumed each app had a whitelist of IPs it's allowed to dial
> out to
> as part of its install metadata, but then I remembered you use IPv6 for
> everything.
>
> (Wikipedia[citation needed] blocked the whole ipv6 domain from doing
> anonymous
> edits because trying to whack-a-mole it was basically impossible due to the
> terrible design of ipv6. I actually went to some ipv6 design meetings back
> in
> the 1990s when they were taking RFCs, but my takeaway was "oh goddess this
> isn't
> salvageable, stay with IPv4 as long as humanly possible". I still tether my
> phone using "dhclient -d -4 usb0".)
>
> Sticking with IPv4 while still covering the whole planet's population is
> ENTIRELY DOABLE, by the way:
>
>
> https://www.mail-archive.com/cerowrt-devel@lists.bufferbloat.net/msg05838.html
>
> But unfortunately the ipv6 people treat any attempt to make ipv4 better as
> an
> existential threat. Nobody voluntarily uses IPv6 while IPv4 remains an
> option.
>
> (4 billion addresses means if the average household size were 2.1 people
> every
> household could have its own static IP. and we're already masquerading
> behind
> household routers. Sticking with IPv4 is not technically impractical, just
> politically so. Back when I got my first web-enabled phone it _was_ using a
> masqueraded IPv4 address behind a virtual router, and there's still not a
> lot of
> phone servers. But alas, that ship has sailed...)
>
> > personally, i'd be happier with the "apps get _no_ /bin, shell gets
> > everything" option, but that probably requires a time machine given
> > the app compat issues.
>
> The amount of C code I've seen with system("rm filename"); and friends...
>
> That said, posix environments and shell scripting aren't a bad thing. :)
>
> >> > i assume. i don't actually have any idea, or any good way of knowing,
> >> > what apps are calling what toys.
> >>
> >> I've done this already for system bootstrapping, mkroot/record-commands is a bit
> >> overkill for this, but the technique could presumably be scaled down to set a
> >> bit in a scoreboard or something. (I needed to know the command line so I could
> >> reproduce/debug behavior divergences, if you just want to know which files got
> >> execed...)
> >
> > oh, it's perfectly doable. but -- as you'd imagine and hope -- there's
> > a _lot_ of paperwork and legal signoff for anything like that, and i
> > don't think anyone's interested enough in the results to do that work.
>
> And I still want a posix container to do builds in, so I'm personally
> looking in
> the other direction. See also https://mstdn.jp/@landley/111763534546802525
>
> >> Or if I get the strong/weak symbol changes in, a wrapper around toy_singleinit()
> >> or similar could live in lib/portability.c and do extra setup before/after
> >> calling the original. Although the more logical thing to do THERE might be to
> >> have bionic's dynamic linker do it so you could log ALL executable launches.
> >> (Fire off a thread to record it and it shouldn't add measurable latency on an
> >> SMP system, plus exec isn't _that_ common and already fairly expensive as
> >> operations go. You zygote everything already to avoid it coming up much...)
> >>
> >> > if i had my time again, i'd be
> >> > tempted to make everything in /bin only accessible to the shell,
> >> > because tbh most of what i've seen apps do is very stupid! although
> >> > there's selection bias there: "why would i even be looking at what an
> >> > app's doing if it isn't doing something wrong/stupid?".)
> >>
> >> A more posix-like programming environment doesn't strike me as a bad thing, but
> >> I'm biased. :)
> >
> > me too. my only interest in "apps get nothing" would be minimizing the
> > app compat issues of _toybox_ changes. though "luckily" most of the
> > uses i've seen are stupid enough to be unlikely to be affected by any
> > plausible toybox behavioral/syntax changes.
>
> I'm trying to avoid breaking API changes, and also treating "delta between
> what
> debian does and what toybox does" as "thing to scrutinize and at least
> document". (And posix/toybox. And to a lesser extent busybox/toybox which
> would
> still surprise Alpine Linux users...)
>
> >> Debian not having /sbin in non-root users' $PATH is something I find personally
> >> annoying, but also a reasonably strong precedent for saying "these commands
> >> normal users are not expected to touch".
> >>
> >> >> Which admittedly has a giant "apple(tm) version skew" warning in the middle but
> >> >> I honestly have no idea how to fix that: mknodat() is a posix-2008 function:
> >> >>
> >> >> https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/functions/mknodat.html
> >> >>
> >> >> Which Apple is now claiming it only added October of 2022?
> >> >>
> >> >> https://en.wikipedia.org/wiki/MacOS_Ventura
> >> >>
> >> >> I mean... Really? They didn't catch up to posix-2008 for FIFTEEN YEARS? Steve
> >> >> Jobs was still alive for almost four years after that came out...
> >> >
> >> > if that surprises you ... "obviously you're not a golfer".
> >> >
> >> > don't get me started on how long we had to wait for clock_gettime().
> >> > that alone has to be responsible for half the macos #ifdefery on the
> >> > entire internet!
> >>
> >> The seven year time horizon does not apply to mac, because I haven't got the
> >> domain expertise.
> >
> > well, the nice thing about mac users is that 90% of them will be
> > running the shiniest thing within 6 months. (and not just OS versions:
> > i've been amazed how quickly they upgraded to arm64 machines too.) the
> > trouble is how long it takes before Apple adds a thing.
>
> So their installed base is fickle and not tied to their existing
> infrastructure
> investment, is what you're saying. (Bitch about the people still running
> XP/kitkat all you want, but it means they're unlikely to switch to a
> competitor
> any time soon.)
>
> >> I wound up doing the gzip compression instead, because repeated text is the
> >> definition of compressible,
>
> The phrase for this is "avoidance productivity", by the way. Bog standard
> ADHD
> behavior...
>
> >> but I still have the issue that shared
> >> implementations with the same help text (ala md5sum/sha1sum or chgrp/chown) have
> >> the same usage: line despite having different command names.
> >
> > (heh, yeah, you beat me to it :-) )
>
> The hitch is generating it for --help without generating it for kconfig
> help.
>
> That said... if I took the command name out of the usage: line in the help
> text,
> I could still have the ls menuconfig help start with:
>
> usage: [-1ACFHLNRSUXZabcdfghilmnopqrstuwx] [--color[=auto]] [FILE...]
>
> And then:
>
> A) have the build break generating help.h if there's no usage: line for a
> config
> symbol that has a command_main() function.
>
> B) strip "usage:[ ]*" off the start when saving help string in the header.
>
> C) programmatically insert "usage: command " at the start of help text.
>
> This would make the kconfig help text slightly awkward, but not unreadable?
>
> Rob
>
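
The usage: line steps A) through C) above could be sketched roughly like this. It is only an illustration of the string handling: strip_usage and show_usage are made-up names, the option string is a placeholder, and the real work would presumably happen in whatever generates help.h and in the C help code rather than in a standalone script:

```shell
#!/bin/sh
# A+B: fail if the help text doesn't start with "usage:", otherwise
# strip "usage:[ ]*" from its first line before storing it.
strip_usage() {
  case $1 in
    usage:*) printf '%s\n' "$1" | sed '1s/^usage:[ ]*//' ;;
    *) echo "error: no usage: line" >&2; return 1 ;;
  esac
}

# C: put "usage: NAME " back in front of the stored text at display time.
show_usage() {
  printf 'usage: %s %s\n' "$1" "$2"
}

# One stored string serves two commands that share an implementation.
stored=$(strip_usage 'usage: [-bcs] [FILE...]') || exit 1
show_usage md5sum "$stored"
show_usage sha1sum "$stored"
```

Since the command name is only added when the text is displayed, shared implementations get the right name in --help while the stored (and kconfig) text stays name-free.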