[Toybox] Linux From Scratch build.

Sun Apr 16 16:54:04 PDT 2023

On 3/24/23 12:11, Rob Landley wrote:
> Here's a script that automates the first part of the Linux From Scratch 11.3
> build (chapters 5 and 6, setting up the chroot) from
> https://linuxfromscratch.org/lfs/downloads/11.3/LFS-BOOK-11.3-NOCHUNKS.html
> using the source packages available in one big tarball at
> http://ftp.osuosl.org/pub/lfs/lfs-packages/lfs-packages-11.3.tar
> 
> Way back when, https://landley.net/aboriginal/about.html drove my busybox
> development because I substituted in one busybox command at a time into the
> $PATH used by the build until it was providing everything. Each time I could
> compare the result (including each build's config.log file) to identify
> everywhere it diverged from the previous build and fix it so the result was the
> same.
> 
> Last time I converted the LFS build from glibc to uClibc first, and this time
> I'm doing musl, because glibc is just DEEPLY insane (and requires perl as a
> build prerequisite).
> 
> This doesn't replace the test suite, but will probably add a lot of entries to it.
> 
> One thing I'm currently frowning at is that earlier LFS builds (back when all
> this was chapter 5) put the entire temporary system in /tools to isolate it from
> the host (so you could rm -rf /tools when you were done and know nothing built
> on the host remained on the target), but the new chapter 6 (still before the
> chroot) is writing its output to the rest of the chroot already? (Why are they
> doing _less_ isolation now?)

So I got asked off list about this, and thought I'd elaborate a bit more:

The plan is to start by automating the existing LFS x86 host build, so I have a
reproducible output where I know what success looks like. That's not just the
resulting binaries, that's also the log output: what decisions did each
package's configure and the build make along the way. (Alas, to make it
deterministic it has to be single processor, or else the order of the log
entries varies and it gets really hard to compare runs.)
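
A minimal sketch of that capture-and-compare step, assuming the build is driven
by GNU make. The function names logged_build and compare_logs are made up for
illustration, as are any log file names you pass in:

```shell
#!/bin/sh
# Hypothetical helper: run a build command single-threaded and capture
# stdout and stderr interleaved into one log file.
# usage: logged_build LOGFILE COMMAND [ARGS...]
logged_build() {
  log="$1"; shift
  # MAKEFLAGS=-j1 asks GNU make (and its sub-makes) not to run jobs in
  # parallel. An explicit -j on the make command line may still override it.
  MAKEFLAGS=-j1 "$@" > "$log" 2>&1
}

# Show where two logs diverge. Exit status 0 means they are identical.
# usage: compare_logs OLDLOG NEWLOG
compare_logs() {
  diff -u "$1" "$2"
}
```

Something like "logged_build run1.log make" followed later by
"logged_build run2.log make" and "compare_logs run1.log run2.log" would then
show the first place two runs disagree.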

Then I try to reproduce it in a container or vm, still with host tools. This
pretty much means "debootstrap", but that's a known thing I've done before. The
half-assed container setup and invocation is some variant of:

https://github.com/j-core/openlane-vhdl-build/blob/master/debootstrap.sh
https://github.com/j-core/openlane-vhdl-build/blob/master/launch.sh

And THEN what I do is A) convert the toolchain from glibc to musl. That, in
isolation, is likely to be a whole thing, but people have done it before:

  https://kanj.github.io/elfs/book/
  https://github.com/dslm4515/Musl-LFS
  https://github.com/dslm4515/CMLFS

I need to get comfortable with those (and document the delta for musl) before
proceeding. That produces a second "I now know what success looks like" milestone.

Then I use my record-commands script (which needs updating again, so it works
when the wrapper is already in the path without trying to build it) to get a
list of all the command lines called out of the $PATH, and turn that into a list
of commands. For example, digesting the command log from mkroot on the
mipsel target is:

$ toybox cut -DF 1 root/build/log/mipsel-commands.txt  | sort -u | \
  grep -v linux-musl- | xargs
as awk basename bison cat cc chmod cmp cp cpio cut date dirname echo egrep env
expr find flex getconf git grep gzip head ld ln ls make mkdir mktemp mv nproc
readlink realpath rm rmdir sed sh sha1sum sort tail tee touch tr true uname uniq
wc which whoami xargs yes
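
On a host without toybox installed, roughly the same digest can be sketched
with awk. This assumes the log has one invocation per line with the command
name as the first whitespace-separated field, which is what the cut invocation
above implies. The function name list_commands is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the unique command names from a command log,
# skipping the cross compiler's prefixed tools.
# Assumed log format: one invocation per line, command name first.
# usage: list_commands LOGFILE
list_commands() {
  awk '{print $1}' "$1" | sort -u | grep -v linux-musl- | xargs
}
```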

The full command log lets you know what the arguments to each command were so
you have a better chance of figuring out what happened when command behavior
differs. (Alas, commands run in a context, so often you have to stop the build
and capture stdin/stdout and whatever files were listed on the command line to
be able to actually get the test case...)

Then once you've got that command list, add a /toybox dir at the start of the
$PATH and add one binary at a time to it, running the build and checking if the
behavior changed. (Keep in mind bash builtins are going to nerf a lot of this
until it's time to switch the shell over, so inserting "echo" and "true" is a
probably-going-to-work test without necessarily proving much just yet.)
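
A minimal sketch of the one-binary-at-a-time step, assuming a toybox binary
that picks its command from the name it was called as (so a symlink per
command is enough). The function names add_override and with_override are
made up for illustration, and the toybox path should be absolute so the
symlink resolves from anywhere:

```shell
#!/bin/sh
# Hypothetical helper: add one command to the override directory by
# symlinking the toybox binary under that command's name.
# usage: add_override DIR /absolute/path/to/toybox COMMAND
add_override() {
  mkdir -p "$1" && ln -sf "$2" "$1/$3"
}

# Run a command with the override directory first in $PATH.
# The subshell keeps the caller's own $PATH unchanged.
# usage: with_override DIR COMMAND [ARGS...]
with_override() (
  dir="$1"; shift
  PATH="$dir:$PATH"
  export PATH
  "$@"
)
```

After each add_override, rerun the build under with_override and compare its
log against the known-good one before adding the next command.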

Once you reach a certain point, you can try repotting to a chroot with less than
a full debian install in it, and instead cherry pick just the files you think
you still need. (Some of which may live in /etc, but most are probably $PATH
binaries.) Given that they're dynamically linked, the ldd loop I used to use to
copy binary-and-libraries into a chroot may be useful:

  https://github.com/landley/toybox/commit/e70126eabef8
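
For illustration, here is a rough sketch of what such an ldd loop can look
like. It is not the code from that commit. The function name copy_with_libs
is made up, and it assumes ldd prints each library's absolute path the way
the glibc and musl versions do:

```shell
#!/bin/sh
# Hypothetical helper: copy each named binary, plus every library path
# ldd reports for it, into the same location under a chroot directory.
# usage: copy_with_libs CHROOT BINARY...
copy_with_libs() {
  root="$1"; shift
  for bin in "$@"; do
    path=$(command -v "$bin") || { echo "not found: $bin" >&2; return 1; }
    # Shell builtins resolve to a bare name, not a file we can copy.
    case "$path" in
      /*) ;;
      *) echo "not a file in \$PATH: $bin" >&2; return 1 ;;
    esac
    # The binary itself, then every absolute path in ldd's output.
    # (Static binaries make ldd fail, leaving just the binary.)
    for file in "$path" $(ldd "$path" 2>/dev/null | grep -o '/[^ ]*'); do
      mkdir -p "$root$(dirname "$file")" &&
        cp -L "$file" "$root$file" || return 1
    done
  done
}
```

Something like "copy_with_libs /path/to/chroot sh make sed" would then
populate the chroot with those three commands and the shared libraries they
link against.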

The cherry-pick lets you know exactly what's left to be replaced, and eventually
you get it down to zero.

(Then there's the whole "making multiple targets work/build the same way" thing,
which you can get a head start on by using debian's qemu-debootstrap stuff...)

Rob