<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Jan 20, 2024 at 2:52 PM Rob Landley <<a href="mailto:rob@landley.net">rob@landley.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 1/19/24 19:49, enh wrote:<br>
> On Fri, Jan 19, 2024 at 10:13 AM Rob Landley <<a href="mailto:rob@landley.net" target="_blank">rob@landley.net</a>> wrote:<br>
>> Creating the binaries isn't a big deal, it's just two .config files. I couldn't<br>
>> speak to the selinux labels and whatever $PATH changes pull in the second<br>
>> directory of symlinks on the android side.<br>
>><br>
>> I'm assuming the problem here is Android's policy of snapshotting the<br>
>> "generated" directory instead of allowing a shell script to call sed to<br>
>> regenerate the files. What is the actual policy/objection?<br>
> <br>
> it's that i don't want to duplicate your build system in our build<br>
> system. i _can_ write a "genrule" that calls things (including c<br>
> programs we've compiled), and do so in cases like<br>
> <a href="https://cs.android.com/android/platform/superproject/main/+/main:external/one-true-awk/Android.bp" rel="noreferrer" target="_blank">https://cs.android.com/android/platform/superproject/main/+/main:external/one-true-awk/Android.bp</a><br>
> say, but the toybox stuff is orders of magnitude more complicated than<br>
> that.<br>
<br>
Would it help if I pulled out "mkconfig.sh", "mkflags.sh", "mkglobals.sh",<br>
"mkhelp.sh", and "mknewtoys.sh" from make.sh and had the top level script call<br>
those?<br></blockquote><div><br></div><div>tbh, it's the fact that stuff keeps moving around that makes it easier for me to just check in generated files. if/when it gets to the point where you haven't touched this stuff in a couple of years, _that's_ when it might make sense to move over :-)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
I've been considering having them build in parallel anyway. The header<br>
generation didn't USED to be a build bottleneck, but it's grown over the years<br>
and SMP levels have increased, so...<br></blockquote><div><br></div><div>(yeah, given that the majority of toybox builds i'm waiting for are just regular host builds in /tmp, that's really noticeable.)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Four of the 6 headers are honestly just echo+sed invocations. And given even the<br>
config.h generation is using $SED (I.E. gsed) instead of "sed", I should just:<br>
<br>
sed 's/^# CONFIG_\(.*\) is not set.*/#define CFG_\1 0\n#define<br>
USE_\1(...)/;T;s/CONFIG_\(.*\)=y.*/#define CFG_\1 1\n#define USE_\1(...)<br>
__VA_ARGS__\n/;T;d' .config<br>
<br>
Sigh, all right:<br>
<br>
sed -e 's/^# CONFIG_\(.*\) is not set.*/#define CFG_\1 0\n#define USE_\1(...)/;T' \<br>
-e 's/CONFIG_\(.*\)=y.*/#define CFG_\1 1\n#define USE_\1(...) __VA_ARGS__\n/;T' \<br>
-e d .config<br>
<br>
Wordwrap wasn't kind to that in email, but I assume it's more generally legible<br>
to those who haven't implemented their own sed twice...<br></blockquote><div><br></div><div>(despite having looked it up last time i tried to understand this stuff, i still don't remember what T means. that's probably something mostly known to people who've implemented their own sed twice. i mean, BSD/macOS seds don't even know what it means :-) )</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
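For reference, GNU sed documents "t" as branching when a substitution has succeeded since the last input line was read, and "T" as branching when none has; with no label, either one jumps to the end of the script. A minimal sketch of the config.h one-liner under that reading, using "t" so that a converted line skips the trailing "d". The gen_config_h name and the sample symbols are made up for illustration, and "\n" in the replacement is a GNU extension:<br>
<br>
<pre>
```shell
# gen_config_h (hypothetical name): read .config-style lines on stdin and
# write CFG_/USE_ macro pairs on stdout. Assumes GNU sed.
gen_config_h() {
  sed -e 's/^# CONFIG_\(.*\) is not set.*/#define CFG_\1 0\n#define USE_\1(...)/;t' \
      -e 's/CONFIG_\(.*\)=y.*/#define CFG_\1 1\n#define USE_\1(...) __VA_ARGS__\n/;t' \
      -e d
}

# Made-up sample input standing in for a real .config.
printf '%s\n' '# CONFIG_CAT is not set' '# just a comment' 'CONFIG_LS=y' | gen_config_h
```
</pre>
Anything that is neither "is not set" nor "=y" falls through to "d" and is dropped.<br>
<br>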
Weaning help.h off of C is something I've been working towards, because the<br>
design idea behind the current version was sub-options need to be stitched<br>
together, and now I'm going "maybe some sort of ${SUBOPT}" escape syntax to let<br>
one command know when/where to block copy in another command's help text?<br>
(Because the -Z aren't going away. They're not SELECTABLE, but they're THERE.)<br></blockquote><div><br></div><div>yeah, i'd wondered about that exact same idea. seems like it would help with the md5sum-type duplication too, if you could just "#include" another command's help in all the same-interface-different-name commands' help.</div><div><br></div><div>(fwiw, unless you're really anal about every last help byte -- which i don't think you are, plus you have compression now -- i personally quite like the coreutils option of just having -Z all the time, but on some systems it just prints an error message. similar to the old <a href="https://en.wikipedia.org/wiki/Bruce_Tognazzini">https://en.wikipedia.org/wiki/Bruce_Tognazzini</a> advice for GUIs about not hiding or even disabling invalid options --- have everything "doable" all the time, and explain to the user why it's not currently valid if they use it when [in most GUIs] it would have been greyed out.)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
The design questions of what the escapes should look like</blockquote><div><br></div><div>heh, the reason i don't think i'd mentioned this idea to you was that i thought it would be less likely to end up a bikeshed ... i'm happy to pretend to have a strong opinion if it gets you out of the <a href="https://en.wikipedia.org/wiki/Buridan%27s_ass">https://en.wikipedia.org/wiki/Buridan%27s_ass</a> problem :-)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> and whether to do it<br>
at build time or runtime remain unaddressed, but "don't require C at build time<br>
for this" is one of the design goals of that redo. (The gzip stuff shouldn't<br>
impact that because I can have a static "decompressed text" variable that gets<br>
populated so runtime can have a help4("command") that returns a null terminated<br>
char * without multiple decompressing; the theory is 1) you always exit right<br>
after showing help text, 2) you show help text INSTEAD of performing the normal<br>
operation of the command, so the memory bloat isn't a big deal even on embedded.<br>
(It's a high water mark, but a fixed size one.))<br>
<br>
Redoing mkflags with shell script is unlikely to happen soon, in part because<br>
that one DOES vary by .config. Although...<br>
<br>
#define FLAG_x ((FORCED_FLAG|CFG_COMMAND)<<1)<br>
<br>
Doing it that way would NOT vary (that part of) the header at compile time, and<br>
the compiler would still resolve it at compile time instead of runtime. And it's<br>
reasonably easy to generate because the COMMAND part is already used in the<br>
local block. (Modulo command sub-options, I still want dead code elimination to<br>
notice if FLAG(x) is constant zero for a disabled sub-option. So the CFG symbol<br>
should be for the sub-option, not the command, which means it needs to listen to<br>
the USE macros in that part of the string. Which if I'm doing it in shell means<br>
I need a sed that chops up each option string into individual lines of "string"<br>
and USE_BLAH("string") and then iterates through them back to front (pipe it<br>
through tac) and then... Hmmm.)<br>
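<br>
A rough sketch of that chop-and-reverse step, assuming a grep that supports -o (GNU and BSD both do) and a tac in the $PATH. The chop_optstr name and the USE_DEMO_Z symbol are hypothetical, and the pattern assumes no escaped quotes inside the strings:<br>
<br>
<pre>
```shell
# chop_optstr (hypothetical name): split an option string read from stdin
# into one chunk per line, each either "string" or USE_BLAH("string"),
# then reverse the order of the chunks with tac.
chop_optstr() {
  grep -oE 'USE_[A-Z0-9_]+\("[^"]*"\)|"[^"]*"' | tac
}

# Made-up option string with one USE macro in the middle.
printf '%s\n' '"abc"USE_DEMO_Z("Z")"de"' | chop_optstr
```
</pre>
Each output line is then a single chunk, last chunk first, ready for whatever per-chunk processing comes next.<br>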
<br>
Making the OPTSTR invariant is also tricksy, but I can pull out a very OLD trick<br>
which is that when I first added USE() macros to busybox<br>
(<a href="https://git.busybox.net/busybox/commit/?id=7bfa88f315d7" rel="noreferrer" target="_blank">https://git.busybox.net/busybox/commit/?id=7bfa88f315d7</a>) I also had SKIP()<br>
macros at the same time (well actually<br>
<a href="https://git.busybox.net/busybox/commit/?id=0d8766a3b13e" rel="noreferrer" target="_blank">https://git.busybox.net/busybox/commit/?id=0d8766a3b13e</a> and Denys objected to my<br>
names ala <a href="https://git.busybox.net/busybox/commit/?id=5e34ff29bcc8" rel="noreferrer" target="_blank">https://git.busybox.net/busybox/commit/?id=5e34ff29bcc8</a> but it DOES<br>
skip the contents of the macro when the config symbol is enabled...) So I can do<br>
a processed version of the symbols and an unprocessed version of the symbols,<br>
and then "xyz"USE_BLAH("abcde")SKIP_BLAH("^A^A^A^A^A")"fgh" and again it should<br>
resolve statically at compile time. And the processing on the sed output chunks<br>
would again be within the realm of what sed can do...<br>
<br>
Ahem, AFTER a release.<br>
<br>
>> My _theory_ is you don't want to compile external C code and run it on your<br>
>> build server for security reasons.<br>
> <br>
> no, there are rules against checking in "blobs" of any kind, but the<br>
> assumption is that imported source has been reviewed anyway. (that's<br>
> why two googlers have to +2 a change written and uploaded by an<br>
> outsider now --- you'd have to corrupt me _and_ someone else to get<br>
> your dodgy change in. it's also why only googlers can kick off a<br>
> presubmit build.)<br>
<br>
Ok. Makes sense.<br>
<br>
>> Which is understandable if so, but does that<br>
>> mean you patched all the "HOSTCC" calls out of the linux kernel build?<br>
> <br>
> unlike Android proper, which is no longer investigating bazel, the<br>
> kernel build fully switched to bazel, and doesn't use the upstream<br>
> build at all. (but there's a whole team working on the kernel, whereas<br>
<br>
Once upon a time, "cc *.c -o thingy" worked fine. Maybe a bit of setup, but you<br>
could end with "and now compile it all".<br>
<br>
And then "make" was invented as a premature optimization because "build all" was<br>
too slow, and they hadn't given Moore's Law time to work yet. Plus the whole<br>
<a href="https://en.wikipedia.org/wiki/Software_crisis" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/Software_crisis</a> thing hadn't yet switched over to<br>
scripting languages that just run the source code without needing to compile<br>
anything, again because REPEATED EXPONENTIAL DOUBLING wasn't happening fast<br>
enough. (Sigh. Unix's small tools connected via pipes ALSO ADDRESSED THIS. It<br>
avoided compiling large monolithic software. The shell is where we got the name<br>
"scripting language" from.)<br>
<br>
Alas, when SMP was invented, the compiler did NOT get extended to automatically<br>
fork off sub-processes for each .c so "cc *.c -o thingy" would naturally take<br>
advantage of SMP. Instead they taught MAKE to do it, which was just wrong.<br></blockquote><div><br></div><div>(i'm not sure what part of "do the easy thing" and "unix" you don't think go together. and to be fair, i've seen a lot of compilers for several different languages try to move parallelism into the compiler with relatively little success. it's harder than it sounds, especially if you're expecting speedups anywhere close to what you get from the external parallelism.)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Along the way C++ happened, with templates that are literally Turing complete at<br>
compile time so compilation is technically never guaranteed to finish (yes they<br>
added the 17 level recursion limit to try to squelch that but last I checked you<br>
could still fill up the drive with your .o file and run a build that if that<br>
limit wasn't hit would outlast the sun, in a couple hundred bytes of C++ source.<br>
Ask Gerrit Kajmowicz about that, he knows where ALL the bodies are buried, yet<br>
somehow still thought C++ and CORBA were good ideas last I talked to him.)<br>
<br>
My BIGGEST disappointment with llvm is that they didn't teach "cc *.c" to use<br>
multiple processors. I planned to do it in qcc but I gave up trying to do a<br>
tinycc fork _and_ toybox _and_ mkroot all at once. (I'm no longer a teenager<br>
staying up all night in a bedroom, and while my goals still get described as<br>
"boiling the ocean" I can only do one ocean at a time if I want visible progress...)<br>
<br>
Anyway, tangent.<br>
<br>
If the Android kernel team could actually document what the kernel build NEEDS,<br>
and keep such a document up to date, then "building it" theoretically wouldn't<br>
be a big deal. I built toybox with a shell script in part to demonstrate that<br>
the build isn't a big deal: the Makefile at the top is literally just a wrapper<br>
to provide a UI around scripts/make.sh and scripts/install.sh and<br>
scripts/test.sh (modulo the pile of kconfig I need to replace with some kind of<br>
scripts/configure.sh).<br>
<br>
But alas I let scripts/make.sh get cluttered and have been meaning to tidy it up<br>
ever since. I didn't want to touch it too much because I didn't want to break<br>
whatever subset of it you're using, but if you're NOT using it then I don't have<br>
to worry about that. :)<br>
<br>
Speaking of which, generated/build.sh _is_ the "cc *.c" version of the build I<br>
was talking about above, preserved for people to reproduce on target systems<br>
that haven't got the right $PATH tools to do the full build.<br>
<br>
Looking at it again, I should probably peel out $FILES into a variable as well,<br>
so the last line is just:<br>
<br>
$BUILD main.c lib/*.c $FILES $LINK -o toybox<br>
<br>
In theory it should be "cc $CFLAGS main.c lib/*.c toys/*/*.c -o toybox" but<br>
cross compiling changes the cc name, and I can't use toys/*/*.c because some<br>
commands don't compile on some targets so we can't just "build it all, let the<br>
linker's garbage collection sort it out". Not without adding #ifdefs to the<br>
command implementations. And thus selecting the list of files based on the .config.<br>
<br>
(Including the sed/grep that determines what files to include in the "simple"<br>
build.sh defeats the purpose of the simple one not depending on the host having<br>
usable tools yet. The use case is bootstrapping toybox natively on a system that<br>
scraped up a toolchain but hasn't got a capable $PATH...)<br>
<br>
> toybox is just me on friday afternoons, but only when i don't have<br>
> anything more urgent.)<br>
<br>
I try not to make more work for you than necessary. :)<br>
<br>
>> I note that config.h is _always_ rebuilt from .config by scripts/make.sh<br>
>> (presumably overwriting your snapshot version)<br>
> <br>
> we don't run scripts/make.sh --- we build everything directly via soong.<br>
<br>
Which means I can change it up without worrying about breaking you.<br>
<br>
AFTER cutting a release...<br>
<br>
>> What would the two pools be, anyway? It seems reminiscent of the /bin vs /sbin<br>
>> split.<br>
> <br>
> yeah, that's been the point where i've always been unconvinced. (the<br>
> canonical example is people whining that having /bin/netcat makes life<br>
> easier for bad guys, which i've never really believed. saves the<br>
> _white hat_ folks a tiny amount of effort when writing their PoCs,<br>
> maybe, but the black hats? (a) i'd like to see the specific example<br>
> where one did that, please and (b) i'd like some evidence that<br>
> _that's_ an expensive part of a modern exploit chain, rather than<br>
> something the malware factories just have on the shelf anyway.)<br>
<br>
I'd vaguely assumed each app had a whitelist of IPs it's allowed to dial out to<br>
as part of its install metadata, but then I remembered you use IPv6 for everything.<br>
<br>
(Wikipedia[citation needed] blocked the whole ipv6 domain from doing anonymous<br>
edits because trying to whack-a-mole it was basically impossible due to the<br>
terrible design of ipv6. I actually went to some ipv6 design meetings back in<br>
the 1990s when they were taking RFCs, but my takeaway was "oh goddess this isn't<br>
salvageable, stay with IPv4 as long as humanly possible". I still tether my<br>
phone using "dhclient -d -4 usb0".)<br>
<br>
Sticking with IPv4 while still covering the whole planet's population is<br>
ENTIRELY DOABLE, by the way:<br>
<br>
<a href="https://www.mail-archive.com/cerowrt-devel@lists.bufferbloat.net/msg05838.html" rel="noreferrer" target="_blank">https://www.mail-archive.com/cerowrt-devel@lists.bufferbloat.net/msg05838.html</a><br>
<br>
But unfortunately the ipv6 people treat any attempt to make ipv4 better as an<br>
existential threat. Nobody voluntarily uses IPv6 while IPv4 remains an option.<br>
<br>
(4 billion addresses means if the average household size were 2.1 people every<br>
household could have its own static IP, and we're already masquerading behind<br>
household routers. Sticking with IPv4 is not technically impractical, just<br>
politically so. Back when I got my first web-enabled phone it _was_ using a<br>
masqueraded IPv4 address behind a virtual router, and there's still not a lot of<br>
phone servers. But alas, that ship has sailed...)<br>
<br>
> personally, i'd be happier with the "apps get _no_ /bin, shell gets<br>
> everything" option, but that probably requires a time machine given<br>
> the app compat issues.<br>
<br>
The amount of C code I've seen with system("rm filename"); and friends...<br>
<br>
That said, posix environments and shell scripting aren't a bad thing. :)<br>
<br>
>> > i assume. i don't actually have any idea, or any good way of knowing,<br>
>> > what apps are calling what toys.<br>
>><br>
>> I've done this already for system bootstrapping, mkroot/record-commands is a bit<br>
>> overkill for this, but the technique could presumably be scaled down to set a<br>
>> bit in a scoreboard or something. (I needed to know the command line so I could<br>
>> reproduce/debug behavior divergences, if you just want to know which files got<br>
>> execed...)<br>
> <br>
> oh, it's perfectly doable. but -- as you'd imagine and hope -- there's<br>
> a _lot_ of paperwork and legal signoff for anything like that, and i<br>
> don't think anyone's interested enough in the results to do that work.<br>
<br>
And I still want a posix container to do builds in, so I'm personally looking in<br>
the other direction. See also <a href="https://mstdn.jp/@landley/111763534546802525" rel="noreferrer" target="_blank">https://mstdn.jp/@landley/111763534546802525</a><br>
<br>
>> Or if I get the strong/weak symbol changes in, a wrapper around toy_singleinit()<br>
>> or similar could live in lib/portability.c and do extra setup before/after<br>
>> calling the original. Although the more logical thing to do THERE might be to<br>
>> have bionic's dynamic linker do it so you could log ALL executable launches.<br>
>> (Fire off a thread to record it and it shouldn't add measurable latency on an<br>
>> SMP system, plus exec isn't _that_ common and already fairly expensive as<br>
>> operations go. You zygote everything already to avoid it coming up much...)<br>
>><br>
>> > if i had my time again, i'd be<br>
>> > tempted to make everything in /bin only accessible to the shell,<br>
>> > because tbh most of what i've seen apps do is very stupid! although<br>
>> > there's selection bias there: "why would i even be looking at what an<br>
>> > app's doing if it isn't doing something wrong/stupid?".)<br>
>><br>
>> A more posix-like programming environment doesn't strike me as a bad thing, but<br>
>> I'm biased. :)<br>
> <br>
> me too. my only interest in "apps get nothing" would be minimizing the<br>
> app compat issues of _toybox_ changes. though "luckily" most of the<br>
> uses i've seen are stupid enough to be unlikely to be affected by any<br>
> plausible toybox behavioral/syntax changes.<br>
<br>
I'm trying to avoid breaking API changes, and also treating "delta between what<br>
debian does and what toybox does" as "thing to scrutinize and at least<br>
document". (And posix/toybox. And to a lesser extent busybox/toybox which would<br>
still surprise Alpine Linux users...)<br>
<br>
>> Debian not having /sbin in non-root users' $PATH is something I find personally<br>
>> annoying, but also a reasonably strong precedent for saying "these commands<br>
>> normal users are not expected to touch".<br>
>><br>
>> >> Which admittedly has a giant "apple(tm) version skew" warning in the middle but<br>
>> >> I honestly have no idea how to fix that: mknodat() is a posix-2008 function:<br>
>> >><br>
>> >> <a href="https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/functions/mknodat.html" rel="noreferrer" target="_blank">https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/functions/mknodat.html</a><br>
>> >><br>
>> >> Which Apple is now claiming it only added October of 2022?<br>
>> >><br>
>> >> <a href="https://en.wikipedia.org/wiki/MacOS_Ventura" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/MacOS_Ventura</a><br>
>> >><br>
>> >> I mean... Really? They didn't catch up to posix-2008 for FIFTEEN YEARS? Steve<br>
>> >> Jobs was still alive for almost four years after that came out...<br>
>> ><br>
>> > if that surprises you ... "obviously you're not a golfer".<br>
>> ><br>
>> > don't get me started on how long we had to wait for clock_gettime().<br>
>> > that alone has to be responsible for half the macos #ifdefery on the<br>
>> > entire internet!<br>
>><br>
>> The seven year time horizon does not apply to mac, because I haven't got the<br>
>> domain expertise.<br>
> <br>
> well, the nice thing about mac users is that 90% of them will be<br>
> running the shiniest thing within 6 months. (and not just OS versions:<br>
> i've been amazed how quickly they upgraded to arm64 machines too.) the<br>
> trouble is how long it takes before Apple adds a thing.<br>
<br>
So their installed base is fickle and not tied to their existing infrastructure<br>
investment, is what you're saying. (Bitch about the people still running<br>
XP/kitkat all you want, but it means they're unlikely to switch to a competitor<br>
any time soon.)<br>
<br>
>> I wound up doing the gzip compression instead, because repeated text is the<br>
>> definition of compressible,<br>
<br>
The phrase for this is "avoidance productivity", by the way. Bog standard ADHD<br>
behavior...<br>
<br>
>> but I still have the issue that shared<br>
>> implementations with the same help text (ala md5sum/sha1sum or chgrp/chown) have<br>
>> the same usage: line despite having different command names.<br>
> <br>
> (heh, yeah, you beat me to it :-) )<br>
<br>
The hitch is generating it for --help without generating it for kconfig help.<br>
<br>
That said... if I took the command name out of the usage: line in the help text,<br>
I could still have the ls menuconfig help start with:<br>
<br>
usage: [-1ACFHLNRSUXZabcdfghilmnopqrstuwx] [--color[=auto]] [FILE...]<br>
<br>
And then:<br>
<br>
A) have the build break generating help.h if there's no usage: line for a config<br>
symbol that has a command_main() function.<br>
<br>
B) strip "usage:[ ]*" off the start when saving help string in the header.<br>
<br>
C) programmatically insert "usage: command " at the start of help text.<br>
<br>
This would make the kconfig help text slightly awkward, but not unreadable?<br>
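<br>
Steps A through C could look something like this in shell; a sketch only, with strip_usage, add_usage and the sample help text all invented for illustration rather than taken from the toybox scripts:<br>
<br>
<pre>
```shell
# Steps A and B: fail if the help text does not start with "usage:",
# otherwise strip "usage:" and any following spaces from the first line.
strip_usage() {
  help=$(cat)
  case "$help" in
    usage:*) printf '%s\n' "$help" | sed '1s/^usage:[ ]*//' ;;
    *) return 1 ;;
  esac
}

# Step C: put "usage: NAME " back at the front of the first line.
# NAME must not contain "/" or other sed-special characters.
add_usage() {
  sed "1s/^/usage: $1 /"
}

# Made-up help text that two command names could share.
printf '%s\n' 'usage: [-c] [FILE...]' '' 'Hash the files.' | strip_usage | add_usage md5sum
```
</pre>
The same stored text can be piped through add_usage with a different name for each command that shares it.<br>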
<br>
Rob<br>
</blockquote></div></div>