[Toybox] [landley/mkroot] WIP: add gcc module (help wanted) (#3)

Rob Landley rob at landley.net
Wed Jul 4 19:16:15 PDT 2018


Since the github post is public I'm cc-ing my reply to the toybox mailing list,
for reasons explained in the body:

On 07/04/2018 03:38 PM, Carl Dong wrote:
> @landley <https://github.com/landley> I've been experimenting with mcm all day
> yesterday and have a preliminary module for it (that I can open a PR for as soon
> as you merge the checkout functionality).

I've been treating the musl-cross-make toolchains (cross and native) as build
dependencies of mkroot, I.E. already installed prerequisites.

You seem to want to put the toolchain build back under the mkroot build. That's
a design issue we need to work out.

> I'm quite new to this cross compiling thing, so I want to validate a few of my
> observations and assumptions on running |mcm-buildall.sh| so I don't go down the
> wrong path...

Way back when I wrote an "intro to cross compiling" that really should have been
called "why cross compiling sucks", but I was trying to be polite:

http://landley.net/writing/docs/cross-compiling.html

Then I did Aboriginal Linux, with the motto "we cross compile so you don't have
to", and wrote a big page of documentation there explaining what it was trying
to accomplish:

http://landley.net/aboriginal/about.html

(Before that page, I did training sessions based on
https://speakerdeck.com/landley/developing-for-non-x86-targets-using-qemu and if
you _really_ want the full context of what I was trying to do I reminisced at
http://landley.net/aboriginal/history.html .)

tl;dr the point of Aboriginal Linux was "simplest Linux system capable of
rebuilding itself from source code and building Linux From Scratch under the
result". I got it down to 7 packages: busybox, uClibc, linux, gcc, binutils,
make, and bash. But I did so much work extending busybox to replace the 20+ gnu
packages from LFS that I wound up maintaining that project for a bit.

Then I rebased to toybox and musl-libc (and looked for a replacement toolchain
for gcc when it went gplv3), but the main design change between aboriginal and
mkroot is that aboriginal built its own toolchain and mkroot does not.

By moving the toolchain build out to an external project somebody else
maintains, 2/3 of the complexity of aboriginal linux went away, and what was
left could be greatly simplified. (I hadn't done so before because nobody who
produced cross compilers was willing/able to produce _native_ compilers as well,
but Rich Felker was willing to be talked into it when he did mcm.)

Since doing mkroot, I've realized that mkroot doesn't really _need_ to be a
standalone project: I can merge the kernel module into the main mkroot.sh file,
merge it into the toybox repository, have it build the copy of toybox it's part
of, and point to kernel source with a command line argument or an environment
variable, so "kernel source" is an environmental prerequisite just like cross
compiler toolchain is.

Toybox needs a qemu-based bootable test environment to run root tests in its
test suite, automated regression testing on multiple targets is nice, and a
builtin simple root filesystem builder in a single file under 1000 lines of
shell script isn't a bad thing for toybox to have. Plus my 2013 toybox talk
(http://landley.net/talks/celf-2013.txt I.E. http://youtu.be/SGmtP5Lg_t0 ) was
about turning AOSP into a self-hosting development environment, and there's AOSP
build work to do there (breaking it into orthogonal layers, providing it with a
hermetic/reproducible build environment, etc). I designed mkroot with all those
goals in mind.

The resulting usage pattern might look something like:

  cd ~/dir
  git clone toybox
  git clone musl-cross-make
  git clone linux
  cd musl-cross-make
  ../toybox/scripts/mcm-buildall.sh
  cd ../toybox
  ln -s ../musl-cross-make/output mcm
  scripts/cross.sh all scripts/mkroot.sh LINUX=~/dir/linux NATIVE=y

(I'm still waffling on how musl-cross-make specific it should be. The "mcm"
symlink isn't an ideal UI. And NATIVE=y implies scripts/mkroot.sh in toybox
would also be aware of the mcm symlink and look for native compilers under it,
which seems wrong. Really that's more a "cross.sh -n" option setting
NATIVE_COMPILER to a path the same way it sets CROSS_COMPILE, and then _only_
cross.sh cares about that symlink. As I said, there's design work to do. :)
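
A minimal sketch of what that "cross.sh -n" behavior could look like, assuming
the toolchain directory holds the $ARCH-linux-musl-cross and
$ARCH-linux-musl-native subdirectories discussed in this thread. The function
name toolchain_env and the idea that NATIVE_COMPILER is just a directory path
are hypothetical, not what cross.sh currently does:

```shell
# Hypothetical helper: print CROSS_COMPILE (and optionally NATIVE_COMPILER)
# settings for one target, looking only under the given toolchain directory.
# Usage: toolchain_env MCM_DIR TARGET [native]
toolchain_env()
{
  mcm="$1" target="$2" native="$3"

  # Cross compiler prefix: the compiler's path with the trailing "gcc" removed.
  for cc in "$mcm/$target-cross/bin/"*-gcc
  do
    [ -x "$cc" ] || { echo "no cross compiler for $target" >&2; return 1; }
    echo "CROSS_COMPILE=${cc%gcc}"
    break
  done

  # Native compiler: assumed here to be a directory to install on the target.
  if [ -n "$native" ]
  then
    [ -d "$mcm/$target-native" ] ||
      { echo "no native compiler for $target" >&2; return 1; }
    echo "NATIVE_COMPILER=$mcm/$target-native"
  fi
}
```

Something like "toolchain_env mcm i686-linux-musl native" would then be the only
code that knows about the symlink, and mkroot.sh would just consume the two
variables.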

However, getting even that far implies that I:

A) add usable versions of the two remaining busybox commands (route and sh) to
toybox, so I can yank the busybox download. (I'm not merging something into
toybox that depends on busybox.)

B) Add a "make" implementation to toybox (or convince musl-cross-make to build
it as part of their build, but android builds with LLVM and will never install
GPL tools into its image, so I need to write a new make anyway if the kernel
build depends on it.)

My limiting factor in all this has been lack of time: $DAYJOB eats all my
energy, no big company's wanted to sponsor me, and "take a year off and live off
my savings" is less compelling in one's 40s with a 6 figure mortgage and maybe
20 years to retirement than in one's 30s with a 5 figure mortgage and 30 years
to retirement.

> My observations:
> 
>  1. When we have a directory that says $ARCH-linux-musl-cross, that means the
>     gcc under this directory is an executable runnable on whatever architecture
>     the host compiler was (in mcm-buildall.sh's case, i686), that will in turn
>     produce executables runnable on $ARCH

Close: mcm-buildall.sh is actually currently hardwired to i686 host for the
cross compilers. (They run faster, it's sort of a poor man's x32.)

It's easy enough to change: two instances of the tuple in the script, plus the
i686-host.txt log name tee writes to, then move the new host arch to the start
of the list in the for loop at the end.

(I'd make it a variable you can set except for the part about moving the
appropriate static/native build to the start of the for loop. Alas the dynamic
-host toolchain has some architecture assumptions that easily confuse it, so we
do a proper static build with it and then use that for the other architectures.
Easy way to do that is build that target first. :)
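
The reordering itself is simple to sketch in shell. This is an illustration
only: host_first is a hypothetical helper, and the target names are placeholders
rather than the actual entries in mcm-buildall.sh's for loop:

```shell
# Hypothetical helper: print the host target first, then the remaining targets
# in their original order, skipping the host's own entry in the list.
# Usage: host_first HOST TARGET...
host_first()
{
  host="$1"
  shift
  printf '%s' "$host"
  for t in "$@"
  do
    [ "$t" = "$host" ] || printf ' %s' "$t"
  done
  echo
}

# Example: build the x86_64 cross/native pair before everything else.
for target in $(host_first x86_64 i686 sh4 x86_64 mips)
do
  echo "would build $target"
done
```

The other two changes (the tuple in the script and the log file name) would
still need to follow the same host variable.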

I've made puppy eyes at Rich about taking mcm-buildall.sh into his
musl-cross-make repo (it's not really appropriate for mkroot, and full of
_exactly_ the kind of black magic I'm trying to foist off on him anyway), but
haven't done so _loudly_ yet. :)

>  2. When we have a directory that says $ARCH-linux-musl-native, that means the
>     gcc under this directory is an executable runnable on |$ARCH| that was
>     produced using $ARCH-linux-musl-native

It was produced using $ARCH-linux-musl-cross. It runs on target, and produces
binaries for the target. You should be able to extract that tarball on pretty
much any system and use it, just like you can with the cross compilers. (In fact
i686-linux-cross and i686-linux-native should be pretty similar.)

In _practice_:

$ strace -F ./gcc --sysroot $(readlink -f ..) hello.c 2>&1 | grep stdio.h
[pid 29064] read(3, "#include <stdio.h>\n\nint main(int"..., 97) = 97
[pid 29064]
open("/home/landley/musl-cross-make/bin/i686-linux-musl-native/bin/../lib/gcc/i686-linux-musl/7.2.0/include/stdio.h",
O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_CLOEXEC|0x200000) = -1 ENOENT (No such file or
directory)
[pid 29064]
stat64("/home/landley/musl-cross-make/bin/i686-linux-musl-native/bin/../lib/gcc/i686-linux-musl/7.2.0/include/stdio.h.gch",
0xff9f9840) = -1 ENOENT (No such file or directory)
[pid 29064]
open("/home/landley/musl-cross-make/bin/i686-linux-musl-native/bin/../lib/gcc/i686-linux-musl/7.2.0/include/stdio.h",
O_RDONLY|O_NOCTTY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
[pid 29064] readv(4, [{"#include <stdio.h>\n\nint main(int"..., 4095},
{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024}],
2) = 97
[pid 29064] writev(2, [{"", 0}, {"hello.c:1:10: fatal error: stdio"..., 102}],
2hello.c:1:10: fatal error: stdio.h: No such file or directory
 #include <stdio.h>

Looks like I need to make more puppy eyes at Rich. I'm pretty sure this worked
at one point, and if I add "-I include" it still does. (By default it's only
searching the directory where the compiler headers provided by gcc are
installed, not the directory where the libc headers from musl are installed.
And of course the resulting hello world only runs if I --static link it because
this isn't a musl host.)
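
A quicker way to see the same thing than reading strace output is to ask the
compiler for its search list: gcc prints it to stderr when run with -E -v. The
sketch below wraps that; the function names parse_include_dirs and
list_include_dirs are made up for illustration, and the parser assumes gcc's
usual "search starts here:" / "End of search list." markers:

```shell
# Read "gcc -E -v" diagnostics on stdin and print only the <...> include
# search directories, one per line, with the leading space stripped.
parse_include_dirs()
{
  awk '
    /^End of search list/ { show = 0 }
    show { sub(/^ +/, ""); print }
    /^#include <\.\.\.> search starts here:/ { show = 1 }
  '
}

# Run the given compiler command line on an empty C file and list where it
# would look for <stdio.h> and friends.
# Usage: list_include_dirs COMPILER [ARGS...]
list_include_dirs()
{
  "$@" -E -v -x c /dev/null 2>&1 >/dev/null | parse_include_dirs
}
```

For the case above that would be something like
list_include_dirs ./gcc --sysroot "$(readlink -f ..)" and if the musl include
directory is missing from the output, that's the glitch.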

> My mental model of |mcm-buildall.sh| is that it works like so:
> 
>   * Create i686-linux-musl bootstrap compiler linked against host libc
>       o Create i686-linux-musl-cross from parent
>           + Create i686-linux-musl-native from parent
>           + Create *-linux-musl-cross from parent
>               # Create *-linux-musl-native from parent
> 
> Is the above correct?

More or less, yes.

> Questions:
> 
>  1. Are both *-cross and *-native compilers portable and statically linked? As
>     in, can I copy them to a machine with their runnable architecture and just
>     run them?

Yes, modulo the header search path glitch I just noticed above.

(There's always some weird regression with new gcc versions. This is probably
because I'm building 7.2 instead of 6.4. Back in aboriginal linux I had ccwrap.c
that parsed the gcc command line and rewrote it starting with --nostdinc
--nostdlib and then added back all the search paths manually, because it was the
ONLY WAY to beat gcc into submission. Rich has more faith in the gcc developers.
Or possibly more patience.)

>  2. For the "i686-linux-musl bootstrap compiler linked against host libc," does
>     this mean that this bootstrap compiler produces musl executables, BUT this
>     compiler itself was compiled using host libc?

Yes.

My old rant about the 6 paths and how a compiler is conceptually no different
from a docbook to pdf converter was recorded at a conference 10 years ago,
starting at almost exactly the 10 minute mark in
http://free-electrons.com/pub/video/2008/ols/ols2008-rob-landley-linux-compiler.ogg
. (There's probably a written version somewhere but I can't find it just now.)

The GCC developers have been insanely self-important forever, and do stuff
terribly. (That's why it's a rant.)

>  3. Why do we need the "i686-linux-musl bootstrap compiler linked against host
>     libc"? Why not go straight to "i686-linux-musl-cross"?

There's a reason I refer to it as my "compiler rant". The short answer is "the
gcc developers are insane".

>  4. If I only wanted one tuple (say x86_64), I could change the script to do:
> 
>   * Create x86_64-linux-musl bootstrap compiler linked against host libc
>       o Create x86_64-linux-musl-cross from parent
>           + Create x86_64-linux-musl-native from parent

In theory, yes.

(As long as the cross/native pair for the host is the first one you build, it
should work. If it's the only one you build, that's the first one. :)

Rob


