[Toybox] Any interest in patches to make the build process friendlier to hermetic builds?

Rob Landley rob at landley.net
Tue Oct 15 23:13:48 PDT 2024


On 10/13/24 23:04, John Millikin wrote:
> On 2024-10-13 01:28, Rob Landley <rob at landley.net> wrote:
>> I need to write a FAQ entry about scripts/prereq/build.sh and maybe a
>> section of the README. It's regenerated each release by
>> scripts/recreate-prereq.sh (yes I'm checking in a generated file) and
>> the current documentation is the commit messages on the initial commit:
> 
> That's a very useful pointer, thank you! The `scripts/prereq/generated/*.h'
> files were exactly what I was looking for.

Updating that is part of my release checklist.

(Yes the release is late. Between selling my house and moving, job 
hunting that wound up with me going back to a previous employer, and an 
overenthusiastic developer taking the fun out of reading the mailing 
list, reinstalling my build environment, yet another round of covid, the 
looming election... I seem to have burned out fairly hard this year. 
Trying to spin back up to speed...)

>>> My specific proposal is to convert some or all of the $SED processing into
>>> C code and either put it in its own binary, or unify it with
>>> `scripts/mkflags.c' / `scripts/config2help.c' / etc.
>>
>> I was actually looking to get away from those and move it _more_ towards
>> sed. (And maybe awk now that we have one of those but I have to read
>> through it and promote it out of pending first.)
>>
>> That said, I'm working on replacing kconfig/* with a new
>> scripts/config.c so maybe it will inherit some of those functions, not
>> sure yet...
> 
> Ah, in that case, would an acceptable middle ground be to move the sed
> invocations into a separate `.sh' script that can be invoked directly?

I tried to do that some years ago (make header generation a separate 
script), and it complicated the build because both the header generation 
and the compile part needed to populate the list of files and libraries. 
(And no you can't just write them into generated/ because the lifetimes 
are wrong. Plus at best you're splitting 80% of the code off from the 
remaining 20%: you're actually factoring out the _build_, not breaking 
up the header generation. And reading multiple different files instead 
of one didn't make what it was doing clearer. The only reason 
portability.sh is a separate file is multiple things like single.sh and 
install.sh read it. I was actually looking at folding genconfig.sh BACK 
in because there's much less of it now that C11's __has_include() let me 
move most of the probes it was doing to preprocessor directives...)

That said, the compile loop itself is fairly generic and might be of use 
to other projects. And it only exists at all because "cc -j $(nproc)" 
isn't recognized by compilers, which you'd THINK would be providing 
their own SMP internally in 2024. Alas even _that_ needs the 
scripts/portability.sh nonsense because MacOS hasn't got "nproc" and so 
on, so it's not REALLY generic. It's got dependencies...
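
A minimal sketch of what such a generic loop could look like, assuming only 
POSIX shell features. The names cpu_count and run_batched are hypothetical 
and are not taken from toybox's scripts. It waits for a whole batch at a 
time, which is simpler but less efficient than a loop that refills a slot 
as soon as any one job exits.

```shell
# Hypothetical helpers, not taken from toybox's scripts.

# Print the number of online CPUs, falling back to 1 if no probe works.
cpu_count() (
  n="$(nproc 2>/dev/null)" ||
    n="$(getconf _NPROCESSORS_ONLN 2>/dev/null)" ||
    n="$(sysctl -n hw.ncpu 2>/dev/null)" ||
    n=1
  # Reject empty or non-numeric answers such as "undefined".
  case "$n" in
    ''|*[!0-9]*) n=1 ;;
  esac
  echo "$n"
)

# usage: run_batched MAXJOBS COMMAND ARG...
# Runs COMMAND once per ARG, at most MAXJOBS at a time, and exits nonzero
# if any invocation failed.
run_batched() (
  max="$1"; cmd="$2"; shift 2
  pids=""; count=0; rc=0
  for arg in "$@"; do
    "$cmd" "$arg" &
    pids="$pids $!"
    count=$((count + 1))
    if [ "$count" -ge "$max" ]; then
      # Batch is full: collect every job in it before starting more.
      for p in $pids; do wait "$p" || rc=1; done
      pids=""; count=0
    fi
  done
  # Collect the final, possibly partial, batch.
  for p in $pids; do wait "$p" || rc=1; done
  exit "$rc"
)

# Example (compile_one is a hypothetical per-file compile function):
#   run_batched "$(cpu_count)" compile_one toys/*/*.c
```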

What would the advantage be? It's not _all_ sed invocations (despite 
some cleanup passes recently ala 1e3708a91268 and 567f8daac6e7 and 
d52e93c94784 and the various library probe redo commits). A relevant 
blog entry would be https://landley.net/notes-2024.html#08-02-2024

It's still building and running mkflags.c and config2help.c because the 
processing those do is a bit beyond what I could beat out of sed. The 
help text processing probably gets lumped into the kconfig rewrite. The 
flags thing is mostly about getting code to drop out when you switch off 
config symbols: the build depends pretty heavily on the compiler doing 
dead code elimination for if (0) {blah;} which even Turbo C for DOS did 
fine (and the -ffunction-sections -fdata-sections -Wl,--gc-sections 
compiler directives tell the linker to strip out unreferenced functions 
and global variables from lib, so I don't have to #ifdef around them), 
but it helps if FLAG(F) macros turn into 0 and that requires 
USE_CFGSYM("blah") wrappers and some plumbing. Happy to do it in a more 
elegant way if someone can think of one...

(And yes, the MacOS toolchain doesn't support --gc-sections despite it 
being in the gcc toolchain for TWENTY YEARS and LLVM's lld supporting it 
from early on. The problem is they don't use ELF, they use their own 
proprietary Mach-O object format, so they're stuck with a linker from 
the dawn of time. They DO have a thing that does this, -dead_strip, it's 
just gratuitously and incompatibly renamed, and is another reason 
portability.sh exists...)
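
As an illustration of the kind of per-platform switch this implies, here is 
a minimal sketch. The function name gc_link_flag and the variable LINK_GC 
are hypothetical, not the names used in portability.sh. It relies on 
Apple's ld documenting the option as -dead_strip.

```shell
# Hypothetical helper: map a kernel name (as printed by "uname -s") to the
# linker option that discards unreferenced functions and data.
gc_link_flag() {
  case "$1" in
    Darwin) echo "-Wl,-dead_strip" ;;    # Mach-O linker spelling
    *)      echo "-Wl,--gc-sections" ;;  # GNU ld and lld on ELF targets
  esac
}

# Pick the flag for the machine running the build.
LINK_GC="$(gc_link_flag "$(uname -s)")"
```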

> For example, `scripts/make.sh' currently contains this code to produce
> `generated/newtoys.h':
> 
>      if isnewer newtoys.h toys
>      then
>        # The multiplexer is the first element in the array
>        echo "USE_TOYBOX(NEWTOY(toybox, 0, TOYFLAG_STAYROOT|TOYFLAG_NOHELP))" \
>          > "$GENDIR"/newtoys.h
>        # Sort rest by name for binary search (copy name to front, sort, remove copy)
>        $SED -n 's/^\(USE_[^(]*(.*TOY(\)\([^,]*\)\(,.*\)/\2 \1\2\3/p' toys/*/*.c \
>          | sort -s -k 1,1 | $SED 's/[^ ]* //'  >> "$GENDIR"/newtoys.h
>        [ $? -ne 0 ] && exit 1
>      fi
> 
> It's difficult to get there without having the rest of `make.sh' tag along
> (the hostcmp and environment probing), but if the code were adjusted to
> something like this:
> 
>      # make-generated.sh
>      gen_newtoys_h() {
>        # The multiplexer is the first element in the array
>        echo "USE_TOYBOX(NEWTOY(toybox, 0, TOYFLAG_STAYROOT|TOYFLAG_NOHELP))" \
>          > "$GENDIR"/newtoys.h
>        # Sort rest by name for binary search (copy name to front, sort, remove copy)
>        $SED -n 's/^\(USE_[^(]*(.*TOY(\)\([^,]*\)\(,.*\)/\2 \1\2\3/p' toys/*/*.c \
>          | sort -s -k 1,1 | $SED 's/[^ ]* //'  >> "$GENDIR"/newtoys.h
>        [ $? -ne 0 ] && exit 1
>      }
> 
>      # scripts/make.sh
>      source scripts/make-generated.sh
>      # [...]
>      if isnewer newtoys.h toys
>      then
>        gen_newtoys_h
>      fi
> 
> This would let a hermetic build system handle the C compilation of the helper
> tools, then call into `make-generated.sh' for the sedding.
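
For readers unfamiliar with the pipeline in the quoted snippets, the sketch 
below wraps the same sed/sort/sed sequence in a function so it can be tried 
on a sample file. The name sort_newtoys is hypothetical, plain "sed" stands 
in for $SED, and LC_ALL=C is an assumption made here to keep the sort order 
byte-wise.

```shell
# Hypothetical wrapper around the quoted pipeline. Arguments are the source
# files to scan; the sorted NEWTOY lines go to stdout.
sort_newtoys() {
  # 1. Copy each command name to the front of its line: "name USE_...".
  # 2. Stable-sort on that first field only.
  # 3. Strip the copied name again.
  LC_ALL=C sed -n 's/^\(USE_[^(]*(.*TOY(\)\([^,]*\)\(,.*\)/\2 \1\2\3/p' "$@" |
    LC_ALL=C sort -s -k 1,1 | sed 's/[^ ]* //'
}

# Given a file containing, in this order,
#   USE_LS(NEWTOY(ls, "a", TOYFLAG_BIN))
#   USE_CAT(NEWTOY(cat, "u", TOYFLAG_BIN))
# the first sed emits "ls USE_LS(...)" and "cat USE_CAT(...)", sort puts the
# "cat" line first, and the last sed removes the leading name and space.
```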

So you propose building a parallel build system that sources subsets of 
my scripts, broken down into yet more files that import each other. And 
both files hardwire the name "newtoys.h", so farther from the general 
"single point of truth" concept...

The only advantage I can see here is granularity: you want to be able to 
reproduce some scripts but not others. Is there a downside to creating 
them all and cherry picking what you need? Or if some don't build at all 
in a given environment, anything doing isnewer can be skipped via "touch".
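
A minimal sketch of why "touch" is enough to skip a step guarded by a 
timestamp check. The function is_stale is a hypothetical stand-in written 
for this illustration; it is not the isnewer function from scripts/make.sh 
and may differ from it in detail.

```shell
# Hypothetical stand-in for an isnewer-style check.
# usage: is_stale TARGET SOURCE...
# Succeeds when TARGET is missing, or when any SOURCE (searched recursively
# if it is a directory) has a newer modification time than TARGET.
is_stale() {
  target="$1"; shift
  [ -e "$target" ] || return 0
  [ -n "$(find "$@" -newer "$target" 2>/dev/null)" ]
}

# Example: the generation step only runs while the target is stale.
#   if is_stale generated/newtoys.h toys; then regenerate_newtoys; fi
# Running "touch generated/newtoys.h" first makes the target at least as new
# as every source, so the check fails and the step is skipped.
# (regenerate_newtoys is a hypothetical placeholder.)
```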

I'm all for simplifying the build, but when I figure out how, I tend to 
do it. Making gears that need to interlock for the benefit of 
environments I will never personally regression test isn't always 
simplifying...

>> Alas the one thing the build still needs is /bin/bash because toysh
>> isn't quite ready yet. I'm working on that too, but this year's kind of
>> gotten away from me. (I sold my house and moved, my wife graduated and
>> got a full time job, I went back to work for the j-core guys...)
> 
> I did some light testing and found that the generated code portions of
> `scripts/make.sh' are mostly portable. There were two minor Bash-isms that
> were easy to replace with POSIX equivalents,

Mostly because MacOS is using an ancient version of bash (last GPLv2 
release, from 2007) which doesn't understand things like wait -n, so 
the build has a probe and workaround for it.
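
One way such a probe could be written is sketched below. The names 
have_wait_n and WAIT_FLAG are hypothetical. The sketch assumes that a shell 
which rejects the option fails with status 2 (the usual builtin usage-error 
status), while a shell that accepts it but has no children to wait for 
returns some other status. That assumption has not been checked against 
every shell.

```shell
# Hypothetical probe: succeed if the running shell accepts "wait -n".
# The subshell keeps a fatal option error from killing the caller.
have_wait_n() {
  rc=0
  ( wait -n ) 2>/dev/null || rc=$?
  [ "$rc" -ne 2 ]
}

# Workaround: fall back to a plain "wait" (all children) when -n is missing.
if have_wait_n; then WAIT_FLAG=-n; else WAIT_FLAG=; fi

# Later, in a job loop (unquoted on purpose so an empty value vanishes):
#   wait $WAIT_FLAG
```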

> and I could successfully run the
> build using either dash-0.5.12

Sigh, breaking the Defective Annoying SHell strikes me as a bonus but 
I'm biased there.

http://lists.landley.net/pipermail/toybox-landley.net/2020-March/027641.html

> or mksh-R59c (which are both much easier to
> build in an isolated chroot than Bash).

Android's using mksh, the test plumbing at least gets a workout over 
there. In the test suite, I apply patches removing bashisms because I 
haven't finished toysh yet and they have to run with mksh on device. The 
build does not yet run on device (but I'm working on that, including 
building the kernel).

Personally, I'd really like to finish toysh, but see "burnout" above. 
(And the whole red queen's race thing where I really don't WANT to 
maintain kernel rust removal patches the way I did perl removal patches, 
and wasn't planning to implement my own crypt() but glibc decided posix 
schmozix they were yanking it, and upgrading from devuan bronchitis to 
devuan dermatitis broke a bunch of "TEST_HOST=1 make tests" I still 
haven't entirely sorted through...)

> Do you have any interest in patches to make `scripts/make.sh' (and/or an
> extracted `make-generated.sh') POSIX-compatible-er?

I'm writing a bash compatible shell. Dogfooding it (the toybox build 
working under toybox's shell) is probably my goal for a 0.9 release. 
(0.8.twodigits is _embarrassing_, and doesn't sort right in directories).

Removing bash-isms I intend to implement just makes me go "I need to do 
more shell work". That said, I can see an argument for running the build 
under mksh. What specific features is mksh missing?

Rob

