[Aboriginal] Using aboriginal as a bare-bones linux install

Rob Landley rob at landley.net
Mon Nov 2 23:53:34 PST 2015


On 11/02/2015 09:31 PM, Avery Payne wrote:
> 
> 
> On Sun, Nov 1, 2015 at 3:14 AM, Rob Landley <rob at landley.net
> <mailto:rob at landley.net>> wrote:
> 
>     Well, there's still some decisions:
> 
>     Does your "absolute minimum"
>     software include the toolchain, or just boot to a shell prompt?
> 
> 
> I want the toolchain.  In this case, aboriginal will be used as a way to
> create a blank-slate no-frills install that I can fit to my desires,
> minus the "vision" of a distribution maintainer.  Also, if something
> breaks, I get to keep both pieces.  :)

Aboriginal Linux has a vision, it's just based on knowing where we
_stop_ and hand off to other projects. :)

>     If it does should the packages be statically or dynamically linked?
> 
> The base install from aboriginal + the minimum rescue tools needed, all
> statically linked.  This means I can either (a) recover a botched boot
> or (b) rebuild something from source.  Either way the system can be
> fixed and set up right with a bit of effort.

I should explain a design change I made a year or so back. (I think it's
in the release notes but should probably be in the FAQ or something.)

There used to be two ways to build aboriginal: with and without the
native compiler. The simple-root-filesystem.sh build was busybox+toybox
(plus it copies uClibc's shared libraries out of the cross compiler if
you were building anything dynamically, since those would be needed on
the target at runtime).

The native-compiler.sh step built gcc+binutils, uClibc, and three other
packages (make, bash, and distcc). This step did NOT run if you selected
NO_NATIVE_COMPILER=1.

Then a third script, root-filesystem.sh, would combine the two into a
single squashfs image. This only ran if you _didn't_ set
NO_NATIVE_COMPILER, otherwise the later stages would use
build/simple-root-filesystem directly. One of the things this script did
was skip the simple-root-filesystem shared libraries and replace them
with the native-compiler shared libraries. (They didn't _quite_ match
because they were built with slightly different options.)

Then I switched to using initramfs, put what used to be
simple-root-filesystem into initramfs (either built into the kernel
binary for SYSIMAGE_TYPE=initramfs or as a separate cpio.gz file fed in
through the external initrd mechanism if you did SYSIMAGE_TYPE=cpio, the
second is the default because it's more useful to people who want to
build their own kernel with a different config after the fact, they can
reuse the cpio.gz image as-is.)

This meant that run-emulator.sh just ran the kernel and didn't mount
_any_ partitions (it ran from initramfs), and that dev-environment.sh
mounted the native compiler on /dev/hda and had to splice it into the
root filesystem at runtime (using a new init script at the top of the
native compiler squashfs image and a hook from /sbin/init to check for
and call it). (It didn't particularly change native-build.sh.)

So instead of selecting a "simple" or "full" environment at compile
time, this design change moved the selection to runtime. And made it
possible to run the minimal system on a board or emulator that had _no_
block devices. (Meaning you could set up a network mount to access the
native compiler if you wanted to, although you don't want to build on
nfs. Seriously, even samba is better than nfs for building. Virtfs
works. I added nbd-client to busybox and toybox because that works fine...)

However, it also means I don't ship a "unified" filesystem image
anymore. If you want the root-filesystem and native-compiler combined,
you have to do that yourself and package up the result. The
more/chroot-splice.sh script gives an example of combining them, but not
packaging them. And I believe that runs the script to create the various
symlinks to /usr/overlay (where the native-compiler filesystem gets
mounted), meaning it expects / to be writeable and runs the script at
runtime to write to it. If you wanted a combined hda squashfs like we used
to have, one _without_ a bunch of symlinks, you'd have to fiddle with it
a bit.
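The combining step itself is mostly directory copies plus symlink fixups. Here's a hypothetical sketch of the general shape, with made-up directory names and scratch files standing in for the real trees (see more/chroot-splice.sh for the actual logic):

```shell
#!/bin/sh
# Scratch stand-ins for the two filesystem trees (illustrative only).
mkdir -p root-filesystem/bin native-compiler/bin
touch root-filesystem/bin/hush native-compiler/bin/make

# Splice: base root filesystem first, native compiler under the
# mount point it expects at runtime.
mkdir -p combined/usr/overlay
cp -a root-filesystem/. combined/
cp -a native-compiler/. combined/usr/overlay/

# The real script also writes symlinks from / into the overlay so the
# toolchain lands on the default $PATH, roughly like:
ln -sf /usr/overlay/bin/make combined/bin/make
```

Packaging `combined` up (squashfs, tarball, or copying to a disk) would then be a separate step.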

That's why I'm going "er..." when you say you want to avoid maintainer
design decisions. There are still several decisions I've made within my
highly circumscribed design goals. I know where the project _stops_ and
what I _don't_ do, but within what I do provide there's a bunch of ways
of packaging it:

1) Just root-filesystem or root-filesystem + native compiler.

2) Shipped on two partitions (initramfs and squashfs) or one. If using
initramfs, built into the kernel or loaded as initrd=cpio.gz? If one
partition, a read-only root filesystem or a writeable one?

3) Each package (busybox, toybox, gcc, binutils, make, bash, distcc)
statically or dynamically linked.

You seem mostly happy with my defaults (and I did choose them for a
reason), but when repackaging for use on something other than qemu, you
should be aware of what the options are.

(I'm elaborating on this point because I don't feel I've documented it
sufficiently yet on the website. I should do that...)

> Beyond those two requirements, anything else fetched and built will
> probably be dynamic due to the laptop's limited 2GB memory.
>  
>     Is your idea of minimal "busybox defconfig" (minimum number of
>     configuration
>     decisions) or the combination of busybox+toybox I've worked out as the
>     minimal set of commands needed to build linux from scratch (plus ones
>     like "ps" that you kinda miss if they're not there on the command line)?
> 
> More the latter than the former.  The goal is to pull improvements in
> as-needed.  If the existing toybox/busybox command is adequate, then
> there's no need to build a replacement.  I probably want a full shell;
> possibly ash, but I'm open to others.

It's shipping hush and bash. One of the things that /usr/overlay/init
does when splicing in the native-compiler.sqf is delete the /bin/sh
symlink that points to hush and replace it with one that points to
/usr/overlay/bash.
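The symlink swap itself is small; simulated here in a scratch tree rather than on a live root (a paraphrase of the idea, not the init script verbatim):

```shell
#!/bin/sh
# Simulate the /bin/sh swap in a scratch directory instead of /.
mkdir -p root/bin root/usr/overlay
touch root/bin/hush root/usr/overlay/bash
ln -s hush root/bin/sh           # initramfs default: sh -> hush

# What happens when the native-compiler squashfs gets spliced in:
rm root/bin/sh
ln -s /usr/overlay/bash root/bin/sh
readlink root/bin/sh             # now points at the overlay's bash
```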

Well, in the upcoming release anyway. Last release the initramfs didn't
have a /bin/sh. (My bad. The switch to hush screwed up the config
slightly. In dev-environment.sh you didn't notice, but in
run-emulator.sh there wasn't a /bin/sh, just /bin/hush.)

> If ps is missing, I'll want that too.

Last release was using busybox ps, this upcoming release is using toybox
ps. Either way there is a ps. :)

> But the built-ins for toybox/busybox would have to be sorely
> lacking for me to replace them.

I take bug reports for the toybox commands, if there's something
specific you need it to do, let me know. (Busybox I mostly leave alone
these days, although I did just upgrade to the current busybox version
for this release.)

>     More pragmatically, packaging-wise do you want root-filesystem running
>     out of initramfs, or do you want it on the sd card filesystem too?
> 
> I have not had to deal with initramfs, root pivots, etc. but I have read
> briefly about them.  I need to think about this aspect a bit more.  I'm
> not wild about using a ramdisk - hard to debug, and all that - but if I
> need to, I will.

Initramfs is not a ramdisk. I wrote some documentation about initramfs
years ago which explained the difference between it and a ramdisk:

https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt

Much later I added initmpfs (using tmpfs instead of ramfs) so you never
have to pivot off of it, it can continue to _be_ your real root
filesystem, as explained in the patch series introductory message:

http://lkml.iu.edu/hypermail/linux/kernel/1306.3/04204.html

Presumably I should document _this_ on the website too. :)

>     > I really, really want to be able to build aboriginal in such a way that
>     > it builds direct to a SSD, filesystem and all, and install a boot
>     > loader, and bamo-presto, my very own bare-bones Linux install.
> 
>     The system-image.sh stage builds the kernel and packages it into a
>     tarball, but there's nothing to say you can't add another script at the
>     end that does the above two cp to a target directory and adds your
>     bootloader.
> 
> 
> I could probably write the follow-on script you're talking about, minus
> the bootloader portion.  It shouldn't be too hard.  More like
> mind-numbing, trying to get everything copied to the right location with
> the right settings.

Or you could use the existing script as a template. :)

Seriously, most of it's cp -a. I made the layout of the overlay and the
layout of the root filesystem it goes on top of line up for a reason. :)

> Having an add-on script to automate all of this would be my ideal goal,
> as it means I could re-run the process anytime, wait about 15 minutes,
> and get a freshly-baked install on a physical hard drive.  I know that
> runs opposite of what you are trying to achieve with
> phones/tablets/whatever, but there's lots of used "obsolete" laptops out
> there that could use a 2nd chance and a good home...

Oh no, it's fine. Keep in mind I've been doing aboriginal linux for
about 15 years now, loooong predating the rise of smartphones.

> Given how busy I have been in my personal life lately, I'm not going to
> promise that this will happen.  But if I do manage to both carve out the
> time needed, and actually finish the script, I'll forward a copy to you.

Yay!

I may merge some stuff upstream if it helps. I note that the sh2eb
target in the repo is aimed at real hardware (the numato jcore boards)
rather than qemu, and previous things (such as the nail board target)
were also real hardware. QEMU is the default packaging choice, but by no
means the only one. (Mostly because _installing_ on real hardware and
dealing with bootloaders and such is a can of worms I didn't want to
open in this project, it's out of scope.)

>     For chroots I do those fixups manually, see more/setup-chroot.sh for
>     example.
> 
> Will do.
> 
>     You'll want a boot USB stick or similar standing by when you play with
>     bootloaders.
> 
> Good point; and noted.
> 
>     Really, make a $PATH that only has tinycc as the cc and try it and see
>     what breaks. That's what I did. (No binutils, no nothing. Aboriginal's
>     host-tools.sh and more/record-commands.sh and
>     more/report-recorded-commands.sh give you an example.)
> 
> If I can find the time, I'll try.  It would be interesting to see how
> far it will go, and what issues it uncovers.

Been there, done that. It ate about 3 years of my life and then I backed
out and did other things.

I'm vaguely pondering trying to do _just_ a preprocessor, and having
aboriginal's sources/toys/ccwrap.c call that instead of gcc -E, although
I have to teach ccwrap to be its own distcc first.

Backing up: distcc mode preprocesses every file before it sends it
across the network to the distccd compile server instance. It runs gcc
-E, copies the output across the network, copies the resulting .o file
back across the network, and writes it to the local filesystem to be
linked locally later. This means that headers and libraries are all
resolved locally, and there's only one set of each, and only one type of
code being generated both locally and remotely, so this does _not_
reintroduce most of the complexity of cross compiling even though it's
using a cross compiler to do part of the work.

The failure mode of distcc is to _not_ distribute a lot of stuff that it
could easily distribute, for example any compile that doesn't have -c
won't get distributed, so compile-and-link happens natively inside the
emulator even though it would be faster to break it into two commands.
The thing is distcc doesn't have a very deep understanding of the gcc
command line. But ccwrap.c DOES have a deep understanding, it has to in
order to rewrite all the $PATH logic and take path searching decisions
away from gcc's broken insane code (which is its main job).

Meanwhile the normal gcc front-end is mostly just a wrapper: it calls
"cpp" to do preprocessing, calls "cc1" to do compiling, and calls either
collect2 or ld to do linking. In theory, I could expand ccwrap to
completely replace this front end and call the other tools directly.
(I'm over halfway there already, most of what ccwrap.c does is take
decisions AWAY from gcc, making them explicit.)

Anyway, I have a vague goal of teaching ccwrap to always call cpp (or
gcc -E) to preprocess a file and then feed that file to cc1 (or pipe it
directly between processes). This would let me break off the
preprocessor and treat it as an entirely separate process, and then I
could use what I learned from tinycc to write a new preprocessor AND
compare its output against gcc -E's output for every single file
building the whole of linux from scratch. (Kind of like
record-commands.sh does now for command lines.) If nothing else, I know
this can run FASTER than gcc -E does because tinycc's preprocessor is
already way way faster than gcc's, and preprocessing is one of the big
bottlenecks of building under qemu with distcc. (CPU time in the
emulator is at much more of a premium than CPU time outside the
emulator: not only is emulated code maybe 1/5 the speed of native
execution on a good day, but QEMU runs on a single host processor.
SMP is emulated in software; emulating the semantics of barriers and
IPIs between multiple threads on a different hardware architecture is
something the qemu guys just didn't feel they could do.)

Anyway, once I got a cpp I was happy with, then I could tackle the
linker. And if I got THAT done, I could poke at an assembler, objdump,
readelf, nm... And if I got all of those working, I could revisit the
actual compiler part of qcc. (Meanwhile, llvm+my linker and assembler
instead of binutils is a reasonable goal...)

But as I said, all this is _after_ toybox 1.0. And it competes with the
system bootstrapping stuff (natively building distros on an arbitrary
target under aboriginal). And toybox 1.0 isn't going to STOP demanding
time, it just goes down to a lower level (like aboriginal has: not
abandoned, just not demanding to the partial exclusion of all else...)

>     Busybox has wget and it's on the toybox roadmap. I also plan to add
>     rsync to toybox, and you can network mount filesystems
> 
> Wget will suffice to fetch a tarball.  From there I'm sure I can get
> builds going.  I was only looking at curl because of libcurl; I'm making
> a bad assumption that other software packages might actually use it (and
> reduce the install footprint in the process).

You might want to try taking apart the existing build control images.
The static-tools one is fairly simple, the lfs-bootstrap one has a lot
of common native build infrastructure shared between package builds.

>     > Is there a ftp client in toybox?
> 
>     Not yet, but there's one in busybox. ftpget/ftpput. The various
>     control-image files use that to upload results to the host.
> 
> Perfect!

The script to launch the host ftp server is in native-build.sh, by the way.

>     I note that 90% of the sites on the net you go to these days need https,
>     which means you'll need openssl built so toybox can call the openssl
>     command line program to pipe the wget through. (Not implemented yet but
>     there was a message about it from... Isaac Dunham I think? posted to the
>     toybox mailing list a month or two back.)
> 
> Just how insane would I be to attempt to shim libressl in place of
> openssl, if only because of openssl's problems?

Presumably fairly straightforward?

"openssl s_client -quiet -connect" is basically encrypted netcat, that's
the part I need installed on the system for toybox wget to do https.
Then to implement wget I take the information from
http://podium.unsupported.io/#/ and
https://en.wikipedia.org/wiki/List_of_HTTP_header_fields and maybe
https://en.wikipedia.org/wiki/Server_Name_Indication and/or
https://en.wikipedia.org/wiki/HTTP/1.1_Upgrade_header ...

Anyway: todo item.

Rob
