[Toybox] Any interest in patches to make the build process friendlier to hermetic builds?
Rob Landley
rob at landley.net
Tue Oct 15 23:13:48 PDT 2024
On 10/13/24 23:04, John Millikin wrote:
> On 2024-10-13 01:28, Rob Landley <rob at landley.net> wrote:
>> I need to write a FAQ entry about scripts/prereq/build.sh and maybe a
>> section of the README. It's regenerated each release by
>> scripts/recreate-prereq.sh (yes I'm checking in a generated file) and
>> the current documentation is the commit messages on the initial commit:
>
> That's a very useful pointer, thank you! The `scripts/prereq/generated/*.h'
> files were exactly what I was looking for.
Updating that is part of my release checklist.
(Yes the release is late. Between selling my house and moving, job
hunting that wound up with me going back to a previous employer, and an
overenthusiastic developer taking the fun out of reading the mailing
list, reinstalling my build environment, yet another round of covid, the
looming election... I seem to have burned out fairly hard this year.
Trying to spin back up to speed...)
>>> My specific proposal is to convert some or all of the $SED processing into
>>> C code and either put it in its own binary, or unify it with
>>> `scripts/mkflags.c' / `scripts/config2help.c' / etc.
>>
>> I was actually looking to get away from those and move it _more_ towards
>> sed. (And maybe awk now that we have one of those but I have to read
>> through it and promote it out of pending first.)
>>
>> That said, I'm working on replacing kconfig/* with a new
>> scripts/config.c so maybe it will inherit some of those functions, not
>> sure yet...
>
> Ah, in that case, would an acceptable middle ground be to move the sed
> invocations into a separate `.sh' script that can be invoked directly?
I tried to do that some years ago (make header generation a separate
script), and it complicated the build because both the header generation
and the compile part needed to populate the list of files and libraries.
(And no you can't just write them into generated/ because the lifetimes
are wrong. Plus at best you're splitting 80% of the code off from the
remaining 20%: you're actually factoring out the _build_, not breaking
up the header generation. And reading multiple different files instead
of one didn't make what it was doing clearer. The only reason
portability.sh is a separate file is that multiple things like single.sh
and install.sh read it. I was actually looking at folding genconfig.sh
BACK in because there's much less of it now that __has_include() (long a
gcc/clang extension, standardized in C23) let me move most of the probes
it was doing to preprocessor directives...)
That said, the compile loop itself is fairly generic and might be of use
to other projects. And it only exists at all because "cc -j $(nproc)"
isn't recognized by compilers, which you'd THINK would be providing
their own SMP internally in 2024. Alas even _that_ needs the
scripts/portability.sh nonsense because MacOS hasn't got "nproc" and so
on, so it's not REALLY generic. It's got dependencies...
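As a rough sketch of that kind of dependency (this is not the actual
contents of scripts/portability.sh), a portable CPU count probe might try
nproc first and fall back to other tools. The function name cpu_count is
hypothetical, and the fallbacks (getconf _NPROCESSORS_ONLN, sysctl -n
hw.ncpu) are assumptions about what a given system provides:

```shell
# Hypothetical helper, not taken from toybox: print the number of online
# CPUs, falling back to 1 when no probe works.
cpu_count() {
  n=$(nproc 2>/dev/null) ||
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
    n=$(sysctl -n hw.ncpu 2>/dev/null) ||
    n=1
  # Guard against empty, zero, or non-numeric output from an odd environment
  case "$n" in
    ''|0|*[!0-9]*) n=1 ;;
  esac
  echo "$n"
}
```

A compile loop could then cap its background jobs at $(cpu_count) instead
of calling nproc directly.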
What would the advantage be? It's not _all_ sed invocations (despite
some cleanup passes recently ala 1e3708a91268 and 567f8daac6e7 and
d52e93c94784 and the various library probe redo commits). A relevant
blog entry would be https://landley.net/notes-2024.html#08-02-2024
It's still building and running mkflags.c and config2help.c because the
processing those do is a bit beyond what I could beat out of sed. The
help text processing probably gets lumped into the kconfig rewrite. The
flags thing is mostly about getting code to drop out when you switch off
config symbols: the build depends pretty heavily on the compiler doing
dead code elimination for if (0) {blah;} which even Turbo C for DOS did
fine (and the -ffunction-sections -fdata-sections -Wl,--gc-sections
compiler directives tell the linker to strip out unreferenced functions
and global variables from lib, so I don't have to #ifdef around them),
but it helps if FLAG(F) macros turn into 0 and that requires
USE_CFGSYM("blah") wrappers and some plumbing. Happy to do it in a more
elegant way if someone can think of one...
(And yes, MacOS compiler doesn't support --gc-sections despite it being
in gcc for TWENTY YEARS and LLVM's lld supporting it from early on. The
problem is they don't use ELF, they use their own proprietary Mach-O
object format, so they're stuck with a linker from the dawn of time.
They DO have a thing that does this, -dead_strip, it's just
gratuitously incompatibly renamed and is another reason portability.sh
exists...)
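A minimal sketch of that sort of probe, assuming the only distinction
needed is Darwin versus everything else (Apple's linker spells the option
-dead_strip). The function name gc_ldflags is hypothetical and this is not
how portability.sh is actually written:

```shell
# Hypothetical helper: choose the linker's dead-code-stripping flag by OS.
# Takes the OS name as an optional argument (defaults to `uname -s`).
gc_ldflags() {
  case "${1:-$(uname -s)}" in
    Darwin) echo "-Wl,-dead_strip" ;;
    *)      echo "-Wl,--gc-sections" ;;
  esac
}
```

The result would be appended to the link line alongside
-ffunction-sections -fdata-sections, e.g. LDFLAGS="$LDFLAGS $(gc_ldflags)".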
> For example, `scripts/make.sh' currently contains this code to produce
> `generated/newtoys.h':
>
> if isnewer newtoys.h toys
> then
> # The multiplexer is the first element in the array
> echo "USE_TOYBOX(NEWTOY(toybox, 0, TOYFLAG_STAYROOT|TOYFLAG_NOHELP))" \
> > "$GENDIR"/newtoys.h
> # Sort rest by name for binary search (copy name to front, sort, remove copy)
> $SED -n 's/^\(USE_[^(]*(.*TOY(\)\([^,]*\)\(,.*\)/\2 \1\2\3/p' toys/*/*.c \
> | sort -s -k 1,1 | $SED 's/[^ ]* //' >> "$GENDIR"/newtoys.h
> [ $? -ne 0 ] && exit 1
> fi
>
> It's difficult to get there without having the rest of `make.sh' tag along
> (the hostcmp and environment probing), but if the code were adjusted to
> something like this:
>
> # make-generated.sh
> gen_newtoys_h() {
> # The multiplexer is the first element in the array
> echo "USE_TOYBOX(NEWTOY(toybox, 0, TOYFLAG_STAYROOT|TOYFLAG_NOHELP))" \
> > "$GENDIR"/newtoys.h
> # Sort rest by name for binary search (copy name to front, sort, remove copy)
> $SED -n 's/^\(USE_[^(]*(.*TOY(\)\([^,]*\)\(,.*\)/\2 \1\2\3/p' toys/*/*.c \
> | sort -s -k 1,1 | $SED 's/[^ ]* //' >> "$GENDIR"/newtoys.h
> [ $? -ne 0 ] && exit 1
> }
>
> # scripts/make.sh
> source scripts/make-generated.sh
> # [...]
> if isnewer newtoys.h toys
> then
> gen_newtoys_h
> fi
>
> This would let a hermetic build system handle the C compilation of the helper
> tools, then call into `make-generated.sh' for the sedding.
So you propose building a parallel build system that sources subsets of
my scripts, broken down into yet more files that import each other. And
both files hardwire the name "newtoys.h", so farther from the general
"single point of truth" concept...
The only advantage I can see here is granularity: you want to be able to
reproduce some scripts but not others. Is there a downside to creating
them all and cherry picking what you need? Or if some don't build at all
in a given environment, anything doing isnewer can be skipped via "touch".
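To illustrate the "touch" trick: a timestamp check along these lines
treats a target as up to date once its mtime is at least as new as its
inputs. is_stale below is a hypothetical stand-in, not the isnewer
function from scripts/make.sh, whose exact behavior may differ:

```shell
# Hypothetical stand-in for a make-style freshness check.
# Usage: is_stale TARGET INPUT...
# Returns success if TARGET is missing or any file under INPUT... is newer.
is_stale() {
  target=$1
  shift
  [ -e "$target" ] || return 0
  [ -n "$(find "$@" -newer "$target" 2>/dev/null | head -n 1)" ]
}

# Skip regenerating a header by marking it fresh first:
#   touch generated/newtoys.h
#   is_stale generated/newtoys.h toys || echo "skipping newtoys.h"
```

An outside build system that produces a header some other way could touch
it afterwards so a check like this leaves it alone.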
I'm all for simplifying the build, but when I figure out how, I tend to
do it. Making gears that need to interlock for the benefit of
environments I will never personally regression test isn't always
simplifying...
>> Alas the one thing the build still needs is /bin/bash because toysh
>> isn't quite ready yet. I'm working on that too, but this year's kind of
>> gotten away from me. (I sold my house and moved, my wife graduated and
>> got a full time job, I went back to work for the j-core guys...)
>
> I did some light testing and found that the generated code portions of
> `scripts/make.sh' are mostly portable. There were two minor Bash-isms that
> were easy to replace with POSIX equivalents,
Mostly because MacOS is using an ancient version of bash (last GPLv2
release, from 2007) which doesn't understand things like wait -n, so
the build has a probe and workaround.
> and I could successfully run the
> build using either dash-0.5.12
Sigh, breaking the Defective Annoying SHell strikes me as a bonus but
I'm biased there.
http://lists.landley.net/pipermail/toybox-landley.net/2020-March/027641.html
> or mksh-R59c (which are both much easier to
> build in an isolated chroot than Bash).
Android's using mksh, the test plumbing at least gets a workout over
there. In the test suite, I apply patches removing bashisms because I
haven't finished toysh yet and they have to run with mksh on device. The
build does not yet run on device (but I'm working on that, including
building the kernel).
Personally, I'd really like to finish toysh, but see "burnout" above.
(And the whole red queen's race thing where I really don't WANT to
maintain kernel rust removal patches the way I did perl removal patches,
and wasn't planning to implement my own crypt() but glibc decided posix
schmozix they were yanking it, and upgrading from devuan bronchitis to
devuan dermatitis broke a bunch of "TEST_HOST=1 make tests" I still
haven't entirely sorted through...)
> Do you have any interest in patches to make `scripts/make.sh' (and/or an
> extracted `make-generated.sh') POSIX-compatible-er?
I'm writing a bash compatible shell. Dogfooding it (the toybox build
working under toybox's shell) is probably my goal for a 0.9 release.
(0.8.twodigits is _embarrassing_, and doesn't sort right in directories).
Removing bash-isms I intend to implement just makes me go "I need to do
more shell work". That said, I can see an argument for running the build
under mksh. What specific features is mksh missing?
Rob