[Toybox] CONFIG_TOYBOX_ZHELP

enh enh at google.com
Fri Jan 19 17:49:04 PST 2024


On Fri, Jan 19, 2024 at 10:13 AM Rob Landley <rob at landley.net> wrote:
>
> On 1/16/24 19:22, enh wrote:
> > On Sat, Jan 13, 2024 at 12:38 PM Rob Landley <rob at landley.net> wrote:
> >> On 1/12/24 14:25, enh via Toybox wrote:
> >> > thanks for keeping the uncompressed path!
> >>
> >> Always the plan. Besides, if the binary lives on something like squashfs,
> >> decompressing it twice just wastes CPU. And the standalone command builds
> >> wouldn't really benefit either without a shared lib/lib.so.
> >>
> >> (Sigh. You're going to want a shared toylib.so, aren't you...)
> >
> > why? we use the multicall binary with symlinks.
> >
> > (the most interesting question i have in that area is "should we have
> > _two_ binaries with different selinux labels, so we can differentiate
> > 'available to apps' and 'available to adb shell'?", but that's a bunch
> > of work i'm not sure anyone will ever have time to do,
>
> Creating the binaries isn't a big deal, it's just two .config files. I couldn't
> speak to the selinux labels and whatever $PATH changes pull in the second
> directory of symlinks on the android side.
>
> I'm assuming the problem here is Android's policy of snapshotting the
> "generated" directory instead of allowing a shell script to call sed to
> regenerate the files. What is the actual policy/objection?

it's that i don't want to duplicate your build system in our build
system. i _can_ write a "genrule" that calls things (including c
programs we've compiled), and do so in cases like
https://cs.android.com/android/platform/superproject/main/+/main:external/one-true-awk/Android.bp
say, but the toybox stuff is orders of magnitude more complicated than
that.

> My _theory_ is you don't want to compile external C code and run it on your
> build server for security reasons.

no, there are rules against checking in "blobs" of any kind, but the
assumption is that imported source has been reviewed anyway. (that's
why two googlers have to +2 a change written and uploaded by an
outsider now --- you'd have to corrupt me _and_ someone else to get
your dodgy change in. it's also why only googlers can kick off a
presubmit build.)

> Which is understandable if so, but does that
> mean you patched all the "HOSTCC" calls out of the linux kernel build?

unlike Android proper, which is no longer investigating bazel, the
kernel build fully switched to bazel, and doesn't use the upstream
build at all. (but there's a whole team working on the kernel, whereas
toybox is just me on friday afternoons, but only when i don't have
anything more urgent.)

> My android toybox checkout is a bit stale (November 10) and this airport hasn't
> got internet, but the android/linux/generated I've got lying around has:
>
>   config.h  flags.h  globals.h  help.h  newtoys.h  tags.h
>
> (Query: why does your .gitignore not have "generated" instead of 10 different
> things under generated? Also, why "change/" but "/kconfig"?)

no idea. https://android-review.googlesource.com/c/platform/external/toybox/+/2919134
brings it mostly back in line. changing how we do the .config-$OS
stuff is a bit ambitious for 17:23 on a friday evening, but that
simpler change should remind me to look at that too next week...

> Two of those headers, flags.h and help.h, are washed through C code. The rest
> (config.h, globals.h, newtoys.h, tags.h) are all created by echo and sed.
>
> I note that config.h is _always_ rebuilt from .config by scripts/make.sh
> (presumably overwriting your snapshot version)

we don't run scripts/make.sh --- we build everything directly via soong.

> because the dependency is
> commented out:
>
>   #TODO: "make $SED && make" doesn't regenerate config.h because diff .config
>   if true #isnewer config.h "$KCONFIG_CONFIG"
>
> I.E. config.h doesn't record _which_ .config file it was produced from, so
> switching between single and full builds confused it because "this file is newer
> than that file" doesn't help when "this file" is a moving target, so I just
> commented out the isnewer and put "true" in there, with a TODO to do more design
> work here.
>
> While I could add a comment to config.h to say which file it was from and teach
> the script to parse that comment... that's brittle and ugly. An elegant fix
> _removes_ complexity, and my todo item here is actually "try to parallelize
> header file creation and always do it". Which I haven't because you snapshot
> headers, and I need to find out why.
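Below is a minimal sketch of the kind of timestamp check being discussed. It is written as a hypothetical C helper, needs_rebuild(), and is not the shell isnewer function in scripts/make.sh. Because it only compares modification times, it cannot notice when the dependency is a _different_ file from last time:

```c
#include <sys/stat.h>

/* Hypothetical make-style freshness test (not toybox's shell isnewer):
   returns 1 if `target` must be regenerated because it is missing or
   `dep` was modified after it, else 0. Only mtimes are compared; nothing
   records which dependency the target was actually built from. */
int needs_rebuild(const char *target, const char *dep)
{
  struct stat t, d;

  if (stat(target, &t) != 0) return 1;  /* no target yet: build it */
  if (stat(dep, &d) != 0) return 1;     /* can't see dep: rebuild to be safe */
  return d.st_mtime > t.st_mtime;       /* dep edited after target was made */
}
```

Suppose config.h was generated from one .config file and KCONFIG_CONFIG is then pointed at a different, older one. In that case needs_rebuild() returns 0 and the stale header survives. That is the "moving target" problem the commented-out isnewer line sidesteps by always regenerating.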
>
> > and the app
> > compat issues of trying to make that split would be a lot of trouble.
>
> This I couldn't speak to. (Presumably the issue is weaning apps that
> inappropriately use commands off of them, since they're no longer in the $PATH?)

exactly.

> What would the two pools be, anyway? It seems reminiscent of the /bin vs /sbin
> split.

yeah, that's been the point where i've always been unconvinced. (the
canonical example is people whining that having /bin/netcat makes life
easier for bad guys, which i've never really believed. saves the
_white hat_ folks a tiny amount of effort when writing their PoCs,
maybe, but the black hats? (a) i'd like to see the specific example
where one did that, please and (b) i'd like some evidence that
_that's_ an expensive part of a modern exploit chain, rather than
something the malware factories just have on the shelf anyway.)

personally, i'd be happier with the "apps get _no_ /bin, shell gets
everything" option, but that probably requires a time machine given
the app compat issues.

> > i assume. i don't actually have any idea, or any good way of knowing,
> > what apps are calling what toys.
>
> I've done this already for system bootstrapping, mkroot/record-commands is a bit
> overkill for this, but the technique could presumably be scaled down to set a
> bit in a scoreboard or something. (I needed to know the command line so I could
> reproduce/debug behavior divergences, if you just want to know which files got
> execed...)
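Scaled down to "which commands ran at all", the scoreboard idea might look something like the following sketch. It keeps one bit per command, indexed by a hypothetical position in the multiplexer's command table. Persisting the bits across processes, in a shared file for example, is left out here, and that would be most of the real work:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical scoreboard: one bit per command, sized for 512 commands. */
#define SCOREBOARD_BITS 512
static unsigned char scoreboard[SCOREBOARD_BITS / CHAR_BIT];

/* Mark command number `idx` as having been run; out-of-range is ignored. */
void scoreboard_mark(size_t idx)
{
  if (idx < SCOREBOARD_BITS)
    scoreboard[idx / CHAR_BIT] |= 1u << (idx % CHAR_BIT);
}

/* Return 1 if command number `idx` has been marked, else 0. */
int scoreboard_seen(size_t idx)
{
  return idx < SCOREBOARD_BITS
         && ((scoreboard[idx / CHAR_BIT] >> (idx % CHAR_BIT)) & 1);
}
```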

oh, it's perfectly doable. but -- as you'd imagine and hope -- there's
a _lot_ of paperwork and legal signoff for anything like that, and i
don't think anyone's interested enough in the results to do that work.

> Or if I get the strong/weak symbol changes in, a wrapper around toy_singleinit()
> or similar could live in lib/portability.c and do extra setup before/after
> calling the original. Although the more logical thing to do THERE might be to
> have bionic's dynamic linker do it so you could log ALL executable launches.
> (Fire off a thread to record it and it shouldn't add measurable latency on an
> SMP system, plus exec isn't _that_ common and already fairly expensive as
> operations go. You zygote everything already to avoid it coming up much...)
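One possible reading of the strong/weak idea, sketched under assumptions: the generic code calls a hook that has a weak no-op default, and a platform file could replace it at link time with a strong definition of the same name. This is a hook rather than a true wrapper around the original function. Both platform_command_hook() and hypothetical_singleinit() are invented names, and the weak attribute is a GCC/Clang extension:

```c
#include <stdio.h>

/* Hypothetical hook: a strong definition elsewhere (say, in a platform
   file) overrides this at link time; otherwise this weak no-op is used. */
__attribute__((weak)) void platform_command_hook(const char *name)
{
  (void)name;  /* default: no extra setup */
}

/* Stand-in for a per-command init routine such as toy_singleinit(). */
int hypothetical_singleinit(const char *name, char *buf, size_t len)
{
  platform_command_hook(name);  /* e.g. log which command was launched */
  return snprintf(buf, len, "init %s", name);
}
```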
>
> > if i had my time again, i'd be
> > tempted to make everything in /bin only accessible to the shell,
> > because tbh most of what i've seen apps do is very stupid! although
> > there's selection bias there: "why would i even be looking at what an
> > app's doing if it isn't doing something wrong/stupid?".)
>
> A more posix-like programming environment doesn't strike me as a bad thing, but
> I'm biased. :)

me too. my only interest in "apps get nothing" would be minimizing the
app compat issues of _toybox_ changes. though "luckily" most of the
uses i've seen are stupid enough to be unlikely to be affected by any
plausible toybox behavioral/syntax changes.

> Debian not having /sbin in non-root users' $PATH is something I find personally
> annoying, but also a reasonably strong precedent for saying "these commands
> normal users are not expected to touch".
>
> >> Which admittedly has a giant "apple(tm) version skew" warning in the middle but
> >> I honestly have no idea how to fix that: mknodat() is a posix-2008 function:
> >>
> >> https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/functions/mknodat.html
> >>
> >> Which Apple is now claiming it only added October of 2022?
> >>
> >> https://en.wikipedia.org/wiki/MacOS_Ventura
> >>
> >> I mean... Really? They didn't catch up to posix-2008 for FIFTEEN YEARS? Steve
> >> Jobs was still alive for almost four years after that came out...
> >
> > if that surprises you ... "obviously you're not a golfer".
> >
> > don't get me started on how long we had to wait for clock_gettime().
> > that alone has to be responsible for half the macos #ifdefery on the
> > entire internet!
>
> The seven year time horizon does not apply to mac, because I haven't got the
> domain expertise.

well, the nice thing about mac users is that 90% of them will be
running the shiniest thing within 6 months. (and not just OS versions:
i've been amazed how quickly they upgraded to arm64 machines too.) the
trouble is how long it takes before Apple adds a thing.

> >> > ```
> >> > In file included from toys/posix/who.c:26:
> >> > In file included from ./toys.h:8:
> >> > ./generated/config.h:695:9: warning: 'CFG_TOYBOX_ZHELP' macro
> >> > redefined [-Wmacro-redefined]
> >> > #define CFG_TOYBOX_ZHELP 0
> >> >         ^
> >> > ./generated/config.h:691:9: note: previous definition is here
> >> > #define CFG_TOYBOX_ZHELP 1
> >> >         ^
> >>
> >> Emitted into config.h twice. Odd.
> >>
> >> The mac build I just did has just:
> >>
> >> #define CFG_TOYBOX_ZHELP 0
> >> #define USE_TOYBOX_ZHELP(...)
> >>
> >> The first of which is line 693.
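For readers unfamiliar with the two macro families, here is a small sketch of how they are typically consumed. The defines are copied from the mac build output above, and help_mode() is an invented example. CFG_ is a plain 0/1 constant that can appear in ordinary C expressions, so the compiler drops the dead branch. USE_ pastes its arguments only when the option is enabled, so it can wrap declarations or table entries:

```c
/* As emitted into generated/config.h for a disabled symbol. */
#define CFG_TOYBOX_ZHELP 0
#define USE_TOYBOX_ZHELP(...)

/* Hypothetical consumer showing both styles. */
const char *help_mode(void)
{
  USE_TOYBOX_ZHELP(return "compressed";)          /* expands to nothing */
  if (CFG_TOYBOX_ZHELP) return "compressed (CFG)"; /* if (0): dead code */
  return "plain";
}
```

A later "#define CFG_TOYBOX_ZHELP 1" with a different replacement list is what produces the clang -Wmacro-redefined warning quoted above. Repeating an identical definition is accepted silently.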
> >>
> >> This is generated from the .config file via the giant "legacy compatible" sed
> >> invocation on line 161 of scripts/make.sh, looking for "CONFIG_BLAH=y" and
> >> "# CONFIG_BLAH is not set" lines, to produce the 0 and blank defines, or the 1
> >> and __VA_ARGS__ defines.
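The translation that sed performs can be approximated in C as follows. This is a rough sketch only: the real script is sed, its exact whitespace may differ, and config_to_defines() is a hypothetical name. The caller is expected to strip the trailing newline first:

```c
#include <stdio.h>
#include <string.h>

/* Map one .config line to its CFG_/USE_ pair. Returns 1 and fills `out`
   for "CONFIG_X=y" or "# CONFIG_X is not set", else returns 0. */
int config_to_defines(const char *line, char *out, size_t len)
{
  const char *name, *end;
  int on;

  if (!strncmp(line, "CONFIG_", 7) && (end = strstr(line, "=y")) && !end[2]) {
    name = line + 7;  /* skip the CONFIG_ prefix */
    on = 1;
  } else if (!strncmp(line, "# CONFIG_", 9)
             && (end = strstr(line, " is not set")) && !end[11]) {
    name = line + 9;
    on = 0;
  } else return 0;    /* comment, blank, or some other value: ignore */

  int n = (int)(end - name);  /* length of the bare symbol name */
  if (on)
    snprintf(out, len, "#define CFG_%.*s 1\n#define USE_%.*s(...) __VA_ARGS__\n",
             n, name, n, name);
  else
    snprintf(out, len, "#define CFG_%.*s 0\n#define USE_%.*s(...)\n",
             n, name, n, name);
  return 1;
}
```

Because each line is mapped independently, a .config containing both "CONFIG_TOYBOX_ZHELP=y" and "# CONFIG_TOYBOX_ZHELP is not set" produces two conflicting CFG_TOYBOX_ZHELP definitions, which matches the warning.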
> >>
> >> Unless the sed hiccuped (unlikely), that says your .config has two instances of
> >> the same symbol. And given that they're emitted in pairs (CFG and USE macros for
> >> each symbol), there's another symbol between the redundant ones you've got. So
> >> you might have something like
> >>
> >> CONFIG_ZHELP=y
> >> CONFIG_WALRUS=y
> >> # CONFIG_ZHELP is not set
> >>
> >> In your .config? (Which I don't think the old 2.6.12 kconfig plumbing is
> >> _capable_ of emitting, but you might get if you manually patched your .config file?)
> >
> > yeah, that's almost certainly it. and, yes, luckily my /tmp is still
> > intact, so i can confirm that:
> > ```
> > /tmp/toybox-help$ grep ZHELP .config
> > CONFIG_TOYBOX_ZHELP=y
> > # CONFIG_TOYBOX_ZHELP is not set
> > /tmp/toybox-help$
> > ```
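The grep above checks one symbol by eye. A hypothetical lint pass that flags any symbol appearing more than once in a .config, in either form, could be as small as the sketch below. Both function names are invented, and the quadratic scan is assumed to be fine for a few hundred symbols:

```c
#include <string.h>

/* Find the symbol in "CONFIG_X=..." or "# CONFIG_X is not set".
   Returns its length and sets *name, or returns 0 for other lines. */
static size_t config_symbol(const char *line, const char **name)
{
  if (!strncmp(line, "# ", 2)) line += 2;
  if (strncmp(line, "CONFIG_", 7)) return 0;
  *name = line;
  return strcspn(line, "= ");
}

/* Return the index of the first line repeating an earlier symbol, or -1. */
int first_duplicate(const char *const lines[], int count)
{
  for (int i = 0; i < count; i++) {
    const char *a = 0, *b = 0;
    size_t alen = config_symbol(lines[i], &a);

    if (!alen) continue;
    for (int j = 0; j < i; j++) {
      size_t blen = config_symbol(lines[j], &b);
      if (blen == alen && !strncmp(a, b, alen)) return i;
    }
  }
  return -1;
}
```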
>
> See "always generating the headers", above.
>
> If the objection _is_ to compiling and running C code on the target, I've
> already been poking at moving help.h to sed because of the pending bug reports
> about that.
>
> While I'm there I'm tempted to strip the repeated "usage: $COMMAND " text out of
> the start of every single help entry, saving a dozen or so bytes times 200+
> commands, but then it doesn't show up right in the kconfig help.

(i'd be more interested in fixing the issues with repeated usage
lines/multiple toys that should share the same text --- they're much
more user-visible.)

> I wound up doing the gzip compression instead, because repeated text is the
> definition of compressible, but I still have the issue that shared
> implementations with the same help text (ala md5sum/sha1sum or chgrp/chown) have
> the same usage: line despite having different command names.
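Stripping the "usage: NAME " prefix would address both the byte count and the shared-text problem, since the name could then be filled in from whatever the command was invoked as. A minimal sketch follows, assuming a hypothetical storage format and helper. The option string is a placeholder and not md5sum's real synopsis:

```c
#include <stdio.h>

/* Hypothetical storage: only the text after "usage: NAME ", shared by
   every command built from the same implementation. */
const char hash_help_tail[] = "[OPTIONS] [FILE...]";

/* Rebuild the usage line for the name this command was invoked as. */
int format_usage(char *out, size_t len, const char *cmd, const char *tail)
{
  return snprintf(out, len, "usage: %s %s", cmd, tail);
}
```

With this approach, md5sum and sha1sum would print their own names from one stored string. The cost is the one already noted: the kconfig help would no longer show the usage line.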

(heh, yeah, you beat me to it :-) )

> Properly fixing this involves replacing kconfig, which is on the todo list
> anyway but WAY TOO BIG a digression right now. (Finish shell, build LFS, THEN
> worry about it.)
>
> Rob

