[mkroot] The repo is back on github.

Sun May 6 18:41:43 PDT 2018

On 05/06/2018 04:44 PM, Carl Dong wrote:
> Would love to see mkroot in a public repo again and to learn enough to
> contribute to it :-)

Last post was sort of a "depth" thing, talking about refining the design to
produce output with less input and plumbing.

But there's also a "breadth" part, where I'm trying to support as many different
architectures as I can. In the latter it's easier to find a part nobody else is
working on.

Ideally I'd like mkroot to support every target qemu has system emulations for,
and right now that's:

qemu-system-aarch64       qemu-system-mips64        qemu-system-s390x
qemu-system-alpha         qemu-system-mips64el      qemu-system-sh4
qemu-system-arm           qemu-system-mipsel        qemu-system-sh4eb
qemu-system-cris          qemu-system-moxie         qemu-system-sparc
qemu-system-i386          qemu-system-nios2         qemu-system-sparc64
qemu-system-lm32          qemu-system-or1k          qemu-system-tricore
qemu-system-m68k          qemu-system-or32          qemu-system-unicore32
qemu-system-microblaze    qemu-system-ppc           qemu-system-x86_64
qemu-system-microblazeel  qemu-system-ppc64         qemu-system-xtensa
qemu-system-mips          qemu-system-ppcemb        qemu-system-xtensaeb

For each architecture, we need a matching toolchain, kernel config, and qemu
invocation.

1) The toolchain has to produce executable output qemu can run, for the right
cpu and packaged into the right executable format. I've been using
musl-cross-make for this but you could use toolchains from anywhere as long as
they work. If you can get a statically linked "hello world" binary to run under
qemu application emulation, your toolchain's probably usable.

Application emulation is what the qemu-$ARCH binaries without -system- in the
middle do. Application emulation runs a userspace program and intercepts and
emulates its system calls, a la strace. System emulation runs an operating system
kernel and intercepts/emulates its attempts to talk to hardware I/O devices.

You can also run dynamically linked binaries: the -L option gives a prefix (sort
of a chroot) where qemu looks for the target's dynamic linker and shared
libraries. Statically linked is easier though. Long ago I submitted a patch to
teach qemu to chroot into a directory,
https://lists.gnu.org/archive/html/qemu-devel/2008-03/msg00031.html but as with
so many things it never got merged and I moved on...
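A minimal sketch of that "hello world" test as a bash function, assuming a
gcc-style cross compiler named ${CROSS_COMPILE}gcc and a matching qemu-$ARCH
application emulator on the $PATH. The function name, its arguments, and the
example prefixes are made up for illustration; the optional third argument
covers the dynamically linked case by handing a sysroot to qemu's -L.

```shell
# Hypothetical helper: can this cross toolchain build binaries that qemu
# application emulation runs?
#   $1 = CROSS_COMPILE-style prefix (example: sh4-linux-)
#   $2 = qemu application emulator suffix (example: sh4 for qemu-sh4)
#   $3 = optional sysroot: build dynamically and pass it to qemu's -L
# Returns 0 if "hello world" comes back, 2 if the tools aren't installed.
check_toolchain() {
  local cross="$1" qarch="$2" sysroot="$3" dir rc
  if ! command -v "${cross}gcc" >/dev/null || ! command -v "qemu-$qarch" >/dev/null
  then
    echo "skip: ${cross}gcc or qemu-$qarch not found" >&2
    return 2
  fi
  dir="$(mktemp -d)" || return 1
  printf '#include <stdio.h>\nint main(void) { puts("hello world"); return 0; }\n' \
    > "$dir/hello.c"
  if [ -z "$sysroot" ]; then
    # Static is the easy case: no shared libraries to find at runtime.
    "${cross}gcc" -static "$dir/hello.c" -o "$dir/hello" &&
      [ "$(qemu-"$qarch" "$dir/hello")" = "hello world" ]
  else
    # Dynamic: -L points qemu at the target's dynamic linker and libraries.
    "${cross}gcc" "$dir/hello.c" -o "$dir/hello" &&
      [ "$(qemu-"$qarch" -L "$sysroot" "$dir/hello")" = "hello world" ]
  fi
  rc=$?
  rm -rf "$dir"
  return $rc
}
```

Usage would be something like "check_toolchain sh4-linux- sh4 && echo usable".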

2) A kernel .config that matches a board QEMU knows how to emulate has to have
the right ARCH= (it's literally selecting a subdirectory name under arch/ in the
kernel source). It also has to select the right cpu type when there are multiple
options, the right board layout (possibly using device tree), and the right
executable format (it should be ELF for mmu systems and FDPIC or bFLT for
nommu systems).

There are a bunch of defconfig files in the kernel source:

$ ls -d arch/*/configs
arch/arc/configs      arch/microblaze/configs  arch/s390/configs
arch/arm64/configs    arch/mips/configs        arch/sh/configs
arch/arm/configs      arch/nds32/configs       arch/sparc/configs
arch/c6x/configs      arch/nios2/configs       arch/um/configs
arch/h8300/configs    arch/openrisc/configs    arch/unicore32/configs
arch/hexagon/configs  arch/parisc/configs      arch/x86/configs
arch/ia64/configs     arch/powerpc/configs     arch/xtensa/configs
arch/m68k/configs     arch/riscv/configs

You go "make ARCH=sh microdev_defconfig" and that selects your .config.
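A small sketch for seeing which defconfigs an architecture offers before
picking one. The function name is hypothetical; it just lists the files under
arch/$ARCH/configs, most of which are usable directly as make target names.

```shell
# Hypothetical helper: list the config files an architecture ships.
#   $1 = kernel source directory, $2 = ARCH= value (a directory under arch/)
list_defconfigs() {
  local src="$1" arch="$2" f
  for f in "$src/arch/$arch/configs/"*; do
    # An unmatched glob stays literal, so only print real files.
    if [ -f "$f" ]; then basename "$f"; fi
  done
}
```

Then, in the kernel source directory, "list_defconfigs . sh" followed by
"make ARCH=sh microdev_defconfig" picks one.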

(Should I do a kernel building walkthrough? I tried to put all this in my
simplest possible linux talk but I was sooooo jetlagged it was incoherent. Need
to do it again but I always feel silly resubmitting the same talk because I want
to do a better version.)

Then I usually digest it into a miniconfig and work out what subset of the
config options are actually necessary. I walked through that process at:

  https://landley.net/notes-2017.html#01-05-2017

(This works out the subset of config options I need to add to the modules/kernel
stanza for that architecture, but first I just build a kernel separately and
get it working. Note that the first 3 lines of each output/$ARCH/$ARCH.miniconf
file are the configure and make invocations, plus the resulting file that qemu
expects to get as the -kernel argument, except they don't mention CROSS_COMPILE=
and I should fix that.) You set CROSS_COMPILE= to the prefix of your cross
compiler: if it's "/usr/bin/sh4-linux-gcc", add
CROSS_COMPILE=/usr/bin/sh4-linux- as an argument to make to tell it to use a
cross compiler instead of just "cc" out of the $PATH. (If your cross compiler is
in your $PATH you don't need to specify an absolute path to it, but it's handy
to be able to if you're building lots of architectures and don't want to add ALL
the cross compilers to your $PATH.)
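Here's a sketch of that build with CROSS_COMPILE supplied explicitly, assuming
the miniconfig gets expanded with kbuild's allnoconfig plus KCONFIG_ALLCONFIG
mechanism (a stock kernel feature; whether this matches mkroot's miniconf
header lines exactly is an assumption). The function name and the MAKE dry-run
hook are illustrative, not part of mkroot.

```shell
# Hypothetical helper, run from the top of the kernel source directory.
#   $1 = ARCH, $2 = CROSS_COMPILE prefix, $3 = miniconfig file
# Set MAKE=echo to print the commands instead of running them.
build_kernel() {
  local arch="$1" cross="$2" mini="$3" make="${MAKE:-make}"
  # allnoconfig switches everything off except the symbols the miniconfig
  # (passed via KCONFIG_ALLCONFIG) asks for; then do the actual build.
  "$make" ARCH="$arch" CROSS_COMPILE="$cross" KCONFIG_ALLCONFIG="$mini" allnoconfig &&
    "$make" ARCH="$arch" CROSS_COMPILE="$cross" -j "$(nproc)"
}
```

For example "build_kernel sh sh4-linux- /path/to/sh4.miniconf", or prefix it
with MAKE=echo to see the two make invocations first.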

Or you can export it as an environment variable instead. You may see me do:

  CROSS_COMPILE=sh4-linux- make ARCH=sh

a lot, because that syntax is the shell exporting environment variables for just
this build (they become part of the child's environment, but not the parent's
environment). The ARCH=sh is a make override: it's a make variable (rather than
an environment variable), and it's set read-only so the makefile's attempts to
override it are ignored. Environment variables, on the other hand, just become
default values for make variables when make first runs, which the makefile can
override. And variables the makefile sets itself aren't exported to child
processes unless it says so explicitly: make doing $(VARNAME) sees them, but if
it launches gcc and gcc looks for VARNAME in its environment, it won't find them.
(GNU make does pass along variables that came in from the environment or the
command line.) So sometimes the difference matters, but feeding CROSS_COMPILE to
the kernel is not one of those times: the kernel's makefile explicitly re-exports
it, and it specifies _which_ child processes to launch (which compiler tools to
run), so make is the main consumer of it anyway.

</tangent>
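A tiny demonstration of the shell half of that: a VAR=value prefix lands in
the environment of that one child process and doesn't stick around in the
parent shell. DEMO_CROSS is a made-up variable name chosen so it won't collide
with anything real.

```shell
# Hypothetical demo: the prefix assignment is visible only to the child.
show_env_scope() {
  unset DEMO_CROSS
  DEMO_CROSS=sh4-linux- sh -c 'echo "child: $DEMO_CROSS"'   # child: sh4-linux-
  echo "parent: ${DEMO_CROSS:-<unset>}"                      # parent: <unset>
}
show_env_scope
```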

3) QEMU invocation.

module/kernel has architecture-specific qemu invocation bits; here's what arm64
is currently setting them to:

  QEMU="qemu-system-aarch64 -M virt -cpu cortex-a57"
  KARGS="console=ttyAMA0"
  VMLINUX=arch/arm64/boot/Image

Which plug into the generic:

echo "$QEMU -nographic -no-reboot -m 256" \
     "-append \"panic=1 HOST=$TARGET $KARGS\"" \
     "-kernel $(basename "$VMLINUX") -initrd ${CROSS_BASE}root.cpio.gz" \
     ${DTB:+-dtb "$(basename "$DTB")"} '"$@"'
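To see what the generic stanza actually produces, here it is wrapped in a
function with example arm64 values filled in. TARGET, CROSS_BASE, and the
function name are placeholders for illustration, not necessarily mkroot's real
values; passing a path as the first argument fills in DTB.

```shell
# Hypothetical wrapper around the generic echo stanza, with example values.
gen_run_qemu() {
  local QEMU="qemu-system-aarch64 -M virt -cpu cortex-a57"
  local KARGS="console=ttyAMA0" VMLINUX=arch/arm64/boot/Image
  local TARGET=aarch64 CROSS_BASE=aarch64- DTB="$1"
  # '"$@"' is single quoted so it lands in the output literally.
  echo "$QEMU -nographic -no-reboot -m 256" \
       "-append \"panic=1 HOST=$TARGET $KARGS\"" \
       "-kernel $(basename "$VMLINUX") -initrd ${CROSS_BASE}root.cpio.gz" \
       ${DTB:+-dtb "$(basename "$DTB")"} '"$@"'
}
gen_run_qemu
```

With no DTB that prints the single line
qemu-system-aarch64 -M virt -cpu cortex-a57 -nographic -no-reboot -m 256 -append "panic=1 HOST=aarch64 console=ttyAMA0" -kernel Image -initrd aarch64-root.cpio.gz "$@"
which you'd redirect into a file (gen_run_qemu > run-qemu.sh) to get a launcher.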

The arch-specific $QEMU part is saying:

1) Which qemu-system binary to run. For arm64 qemu uses arm's preferred
aarrcchh6644 stutter, because arm's marketing department tries very very hard to
make sure to say arm is "the 64 bit architecture" (one and only, pay no
attention to that x86-64 behind the curtain, arch64 means _us_ and us only, oh
and we're not really arm anymore) at every opportunity. It's kind of sad,
really. I mock them when I can.

2) -M is "machine", I.E. what board to emulate. Some boards can have multiple CPU
variants in them, so -cpu says what processor variant to stick in this board.
(There's always a default, but sometimes it's not what we want, and it can change
with new qemu versions anyway. There's usually backwards compatibility of the
"i686 could run i486 and i386 code" type, but sometimes we want to be specific
and want qemu to throw a fault if any newer instructions have leaked in because
we screwed up.)

With most qemu options that take an argument you can pass in "help" and it'll
say what you can feed in there. So "qemu-system-aarch64 -M help" lists the board
emulations it's got. In this case we're using the "virt" board, which isn't a
model of any real hardware: qemu plugs in a set of virtual devices and generates
a device tree describing them for the kernel to read. It would be nice if this
was hooked up to more targets, but so far this is the only one.
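A quick way to survey what a given qemu-system binary supports, as a sketch:
it just runs the -M help and -cpu help queries described above and skips
binaries that aren't installed. The function name is made up.

```shell
# Hypothetical helper: print the boards and CPUs a qemu-system binary knows.
qemu_options() {
  local bin="$1"
  command -v "$bin" >/dev/null || { echo "$bin: not installed" >&2; return 1; }
  echo "== $bin -M help"
  "$bin" -M help
  echo "== $bin -cpu help"
  "$bin" -cpu help
}
```

For example: for q in qemu-system-aarch64 qemu-system-arm; do qemu_options "$q"; done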

(A device tree is a data structure listing the hardware available on the board
in a machine-readable format. Way back when the PC let you do BIOS callbacks to
query this, and the Solaris guys invented their own version called "open
firmware" (and as with any project with "open" in the title it was a single
vendor attempting to turn its proprietary technology into a de-facto standard,
it's one of those methinks the lady doth protest too much things). Later the
powerpc guys copied the solaris version (because IBM respected Sun's quarterly
financial statements, therefore their technology must be great if it was so
profitable, or some such). Later still the Linux guys genericized powerpc's copy
of the old solaris format, which is why the documentation for device tree is
https://www.kernel.org/doc/Documentation/devicetree/booting-without-of.txt

Meanwhile Intel did Itanic (Merced, Itanium, ia64, whatever they're calling it
this week) and were _embarrassed_ to have 16 bit x86 assembly from 1979 booting
up the hardware at power on, so they pulled something out of their ass and
overcomplicated the HELL out of it and called it ACPI. And because the Linux
kernel guys licensed all the device tree format descriptions they'd done under
the GPL, preventing the BSD guys from freely using them, this let Intel and
Microsoft team up to pay AMD to port ACPI to arm a few years back. So we could
have had a clean victory here, but they lost by sticking copyleft where it
didn't belong. Bravo.

</tangent>

(There is _so_much_editing_ to wind up with coherent talks...)

Dropping down to the generic bit (the "echo" statement above), we have:

3) The -nographic option says we're emulating a "headless" system, I.E. with no
graphics display. This option disables qemu's graphical output window and
instead hooks up qemu's stdin and stdout to the first serial device in the
emulated board, so you have a serial console. So the kernel boot messages go to
stdout, and you can type at the serial console from the command line you ran
qemu in.

4) The -no-reboot option says that if the emulated system tries to reboot, exit
qemu instead. (Turn it into a "halt".) This is generally what we want.

5) -m 256 says install 256 megabytes of memory into the virtual system. This is
enough to natively compile stuff. (The default's often lower, sometimes 64 megs.)

6) The -append option provides the kernel command line. Yes the kernel takes
command line arguments, just like userspace programs! That's a whole writeup in
itself, but the full list is at
https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt

It's called "append" for historical reasons; it's generally the whole kernel
command line, and qemu won't let you repeat the -append argument (the last one
wins). Yeah, I know, ask them not me:
https://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg03214.html

The most important thing to specify here is console=, which says which serial
device the kernel's /dev/console should attach to. (We're doing it in the
arch-specific $KARGS.) Most targets won't default to a serial console even when
there are no other console devices available, they'll pine for a nonexistent
graphics card unless you SPECIFY this. (Maybe this'll get fixed someday, don't
hold your breath.) You can provide a /dev/ prefix if you like, it'll get chopped
off. 2/3 of the time this is "ttyS0" but the remaining third it's something
crazy and arch-specific. You can usually look up the serial device you've got
enabled in the config and look in its source file. The kernel config usually
needs a second config option to say "you know how this is a serial port? Well
let's _also_ allow this to be used as a console device!" I have no idea why this
isn't automatic. In this case, our kernel config has:

CONFIG_SERIAL_AMBA_PL011=y
CONFIG_SERIAL_AMBA_PL011_CONSOLE=y

So in the kernel source if we do:

$ find . -name 'Makefile*' | xargs grep SERIAL_AMBA_PL011
drivers/tty/serial/Makefile:obj-$(CONFIG_SERIAL_AMBA_PL011) += amba-pl011.o
$ grep '"tty' drivers/tty/serial/amba-pl011.c
	.name		= "ttyAMA",
	.driver_name		= "ttyAMA",
	.dev_name		= "ttyAMA",

And 0 is usually appended to indicate the first one. (You can also often fish it
out of the relevant defconfig, since they may have a CONFIG_CMDLINE that provides
default kernel command line arguments: "grep ttyAMA arch/arm/configs/*" shows a
lot of ttyAMA0 on arm, because the PL011 is ARM's own PrimeCell UART design and
lots of arm boards include it.) And yes, qemu's -append usually completely
replaces these default command lines rather than appending to them. Ask the qemu
devs why; I can't get straight answers out of them ever since the IBM and Red Hat
enterprise/mainframe developers took over after Mentor Graphics bought Code
Sourcery and smashed the re-coalesced half of Cygnus that had fled Red Hat's
acquisition...
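The Makefile-then-source lookup above can be scripted roughly like this. It's a
heuristic sketch with a made-up function name: it assumes the driver is built
from a single foo.o whose foo.c sets .dev_name to a string literal, which holds
for amba-pl011.c but not necessarily for every serial driver.

```shell
# Hypothetical helper: guess the console= device name for a serial driver.
#   $1 = kernel source directory, $2 = config symbol without CONFIG_
console_name() {
  local src="$1" sym="$2" line dir obj
  # Find the Makefile line that builds it: obj-$(CONFIG_FOO) += foo.o
  line="$(cd "$src" && grep -r --include='Makefile*' -F "\$(CONFIG_$sym)" . | head -n 1)"
  [ -n "$line" ] || return 1
  dir="$(dirname "${line%%:*}")"
  obj="$(printf '%s\n' "${line#*+=}" | awk '{print $1}')"
  # Pull the tty name out of the driver's .dev_name = "ttyXXX" line.
  sed -n 's/.*\.dev_name[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' \
    "$src/$dir/${obj%.o}.c" | head -n 1
}
```

Against a real kernel tree, echo "console=$(console_name . SERIAL_AMBA_PL011)0"
should print console=ttyAMA0.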

Historically people would also say root= here, which is still relevant sometimes,
but you do _not_ need it for initramfs. People have muscle memory and can't stop
themselves, though, and get really incensed when it doesn't do what they
expect: http://lkml.iu.edu/hypermail/linux/kernel/1801.3/05785.html

The "panic=1" argument tells the kernel that if it ever has a kernel panic
(spitting out a stack dump and all that), it should reboot 1 second later. We
already told qemu to exit when the system reboots (the -no-reboot option), so a
panic ends the qemu process instead of leaving it hung.

The HOST=blah thing sets an environment variable in the new system's first
process (PID 1). Any unrecognized kernel argument in "name=value" format will
fall through to set an environment variable in init.
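To make the routing rule concrete, here's a toy model of where unrecognized
words on the kernel command line end up, following the rules in the kernel's
kernel-parameters documentation: words with "=" go into init's environment,
words without one become init's arguments, names containing a "." are treated
as module parameters, and everything after "--" goes to init as arguments. It's
not the kernel's code, and KNOWN is a hypothetical stand-in for the many
parameters the kernel consumes itself.

```shell
# Simplified model, not the kernel's parser. KNOWN is a hypothetical list.
KNOWN="panic console root quiet"
classify_kargs() {
  local word name dashdash=0
  local -a words
  read -r -a words <<< "$1"
  for word in "${words[@]}"; do
    if [ "$dashdash" = 1 ]; then echo "arg $word"; continue; fi
    if [ "$word" = "--" ]; then dashdash=1; continue; fi
    name="${word%%=*}"
    case " $KNOWN " in *" $name "*) continue ;; esac   # the kernel uses it
    case "$name" in *.*) continue ;; esac              # module.param style
    case "$word" in
      *=*) echo "env $word" ;;   # lands in init's environment
      *)   echo "arg $word" ;;   # lands in init's argv
    esac
  done
}
```

So classify_kargs 'panic=1 HOST=aarch64 console=ttyAMA0' reports just
"env HOST=aarch64", which is the HOST variable init sees.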

7) The -kernel option triggers qemu's built-in bootloader. Qemu will load the
file into memory, tell it where the kernel command line arguments are (and its
external initrd file and device tree binary .dtb file if it has those) and then
start executing at the relevant start point, based on the format of what it
loaded. Alas, although qemu has an ELF loader capable of loading the vmlinux
file produced in the top directory of the kernel source, and even though that's
a generic format that should work on every single architecture, the qemu
developers haven't hooked it up on most architectures, and instead expect
various random architecture-specific formats. (Which is why we're copying zImage
and bzImage files out of subdirectories; no, it's not gzip and bzip2, that would
be too simple. The b stands for "big" and once again is there for historical
reasons that stopped mattering 20 years ago.)
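A sketch of picking the -kernel file per architecture, covering only a few
well-known cases. The function name is made up, and the vmlinux fallback is
just the generic ELF file, which qemu may or may not accept for a given board.

```shell
# Hypothetical lookup: ARCH= value -> boot image path in the kernel build tree.
kernel_image() {
  case "$1" in
    x86|x86_64|i386) echo arch/x86/boot/bzImage ;;
    arm)             echo arch/arm/boot/zImage ;;
    arm64)           echo arch/arm64/boot/Image ;;
    *)               echo vmlinux ;;   # generic ELF; check what qemu wants
  esac
}
```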

8) The -initrd argument tells qemu where to find an archive (or filesystem
image) containing the initial ramdisk contents. It loads this blob of data into
memory and informs the -kernel bootloader it needs to feed it into the "here is
your initrd file" slot in the kernel's setup data before running the kernel.

The kernel examines this image and determines its file type about like the
"file" command does, and if it's a cpio archive (possibly compressed, in this
case with gzip) it extracts it into the initramfs filesystem (an instance of
ramfs or tmpfs). Then if there's an /init executable it runs that and PID 1
takes over and the kernel never gets to the fallback filesystem mount stuff that
would use the old root= kernel command line argument. (Which is how mkroot is
doing things, so that's the codepath of interest here.) See
https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt
for the full story on that. (Maybe even the three part
http://landley.net/writing/rootfs-intro.html
http://landley.net/writing/rootfs-howto.html
http://landley.net/writing/rootfs-programming.html series I wrote way back when,
which I really really really should update for initmpfs. No I didn't write
initramfs, I just wrote the documentation on it because nobody had #*%(&%#
explained how it _worked_ when they wrote it, so I had to go digging and guess a
lot.)
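Building that cpio.gz from a directory is the usual find | cpio | gzip
pipeline; here it is as a sketch with a made-up function name, assuming a cpio
that supports -H newc (GNU cpio does) and gzip on the $PATH.

```shell
# Hypothetical helper: pack a root directory into a gzipped newc cpio archive.
#   $1 = root directory, $2 = output file (e.g. root.cpio.gz)
mkcpio() {
  local dir="$1" out="$2"
  command -v cpio >/dev/null || { echo "mkcpio: cpio not found" >&2; return 1; }
  # Without an executable /init the kernel falls back to the root= mount path.
  [ -x "$dir/init" ] || echo "mkcpio: warning: no executable $dir/init" >&2
  # Archive from inside the directory so paths are relative to the new /.
  (cd "$dir" && find . | cpio -o -H newc | gzip -9) > "$out"
}
```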

Remember how a default kernel command line can be statically linked into the
kernel binary with that CONFIG_CMDLINE thing mentioned above? Well a static
initrd can be linked into the kernel too, at least the cpio.gz format for
initramfs can. (It's the CONFIG_INITRAMFS_SOURCE option in the kernel .config.
If you run "make ARCH=blah menuconfig" you can use the "/" key to search for a
config symbol by name, which says both where it lives in the menu and what its
dependencies are that enable it. Then navigate to where it says the symbol lives
in the menu (you may have to enable those prerequisites for it to be visible)
and pull up the "help" for it. No, I don't know why you can't pull up the help
from the / search page directly, ask the kernel guys.)
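As a sketch, the statically linked variants are just a few config symbols
appended to a miniconfig. The function name, directory, and command line are
examples. CONFIG_INITRAMFS_SOURCE needs CONFIG_BLK_DEV_INITRD, and here it
points at a directory (kbuild packs that into a cpio itself). CONFIG_CMDLINE
only takes effect on architectures that support a built-in command line, and
some need an extra symbol to enable it (x86 wants CONFIG_CMDLINE_BOOL=y), so
check each one with the "/" search described above.

```shell
# Hypothetical helper: bake an initramfs and a default command line in.
#   $1 = miniconfig file, $2 = root directory, $3 = kernel command line
# Arch-specific enabling symbols for CONFIG_CMDLINE are NOT added here.
add_builtin_boot() {
  local mini="$1" rootdir="$2" cmdline="$3"
  cat >> "$mini" << EOF
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE="$rootdir"
CONFIG_CMDLINE="$cmdline"
EOF
}
```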

We're not statically linking in an initramfs archive in mkroot, but you see that
a lot. (If both static and external archives are supplied, the external one is
extracted over the internal one, replacing existing files in the internal one
with conflicting names. Although it's not a very smart extractor so if you try
to replace directories with a file of the same name, or vice versa, I expect
you'll either get error messages or silent failure. Haven't tried it recently.)

9) The dtb thing on the last line of the above "echo" stanza is in case you need
to specify a dtb (device tree binary, compiled from .dts files, which are human
readable/editable ascii device tree source, by the device tree compiler "dtc").

That's one of those funky bash syntaxes for conditional expansion: ${DTB:+...}
expands to the -dtb argument only if $DTB is set and non-empty, and to nothing
otherwise. Pull up the bash man page ("man bash"), hit forward slash and search
for "parameter expansion" if you're curious. (It's the second hit in my copy;
forward slash and enter will repeat the last search.)
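A side-by-side sketch of the two parameter expansions you'll most often meet
in scripts like this: ${VAR:+word} (use word only if VAR is set and non-empty),
which the -dtb line relies on, and ${VAR:-word} (use word if VAR is unset or
empty). The function name and paths are made up.

```shell
# Hypothetical demo of :+ (alternate value) versus :- (default value).
show_expansions() {
  local DTB="$1"
  # Unquoted on purpose, so an empty result adds no argument at all.
  echo "args:" ${DTB:+-dtb "$(basename "$DTB")"}
  echo "dtb: ${DTB:-<none>}"
}
show_expansions ""                     # args:  /  dtb: <none>
show_expansions out/versatile-pb.dtb   # args: -dtb versatile-pb.dtb  /  dtb: out/versatile-pb.dtb
```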

As with the kernel command line and initrd archive, you can statically link one
of these in to the kernel sometimes. Or the bootloader can provide one itself
(which is actually what qemu's -M "virt" board is doing). But I'm doing it for
the versatilepb board to demonstrate how it's done, and possibly because the
static way's broken on that target, I don't remember. (The kernel's versatile
board support was switched to use device tree, and the non-device-tree board
definition file deleted, a year or two back. But qemu doesn't know or care about
that, so we need to feed the kernel
a dtb file describing the emulated hardware. I expect I ranted about that in my
blog at some point... Back around https://landley.net/notes-2017.html#04-05-2017
or thereabouts. There's an in passing reference in the April 25 entry, I may
have gone into more detail somewhere around there...)
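If you need to look inside one of these, dtc goes both directions. Minimal
wrappers as a sketch (the function names are made up; dtc's -I, -O and -o
options are real):

```shell
# Hypothetical wrappers around the device tree compiler.
dts_to_dtb() { dtc -I dts -O dtb -o "$2" "$1"; }   # source -> binary
dtb_to_dts() { dtc -I dtb -O dts "$1"; }           # binary -> source on stdout
```

For the virt board, qemu's dumpdtb machine option should write out the tree it
generates (for example qemu-system-aarch64 -M virt,dumpdtb=virt.dtb), which
dtb_to_dts virt.dtb then turns back into readable source.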

And then the "$@" at the end is just a shell "if they provided any command line
arguments, expand them here" thing. That way you can go:

  ./qemu-armv5l.sh -hda blah.img

So the scripts this is making let you feed extra arguments to qemu. :)
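The same trick in miniature, as a sketch: the generated script contains a
literal "$@", so arguments given to the script at run time pass through
intact, even ones containing spaces. printf stands in for the real qemu binary,
and make_wrapper is a made-up name.

```shell
# Hypothetical generator: the quoted 'EOF' keeps "$@" unexpanded in the file.
make_wrapper() {
  cat > "$1" << 'EOF'
printf '%s\n' qemu-system-arm -M versatilepb "$@"
EOF
}
```

After make_wrapper ./qemu-demo.sh, running sh ./qemu-demo.sh -hda "my disk.img"
prints each argument on its own line, with "my disk.img" still one argument.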

Rob