[Toybox] Countering trusting trust.

Rob Landley rob at landley.net
Sat Dec 5 21:11:19 PST 2020


P.S. If you're wondering why I harp on this so much, and why I vanish from
toybox dev from time to time to do j-core stuff at $DAYJOB:

  https://pluralistic.net/2020/12/05/trusting-trust/#thompsons-devil

(And yes, our model involves decapping chips that come back from the fab to
confirm that what we sent them is what they produced, and yes this means doing
almost all of the low-level fab stuff with _our_ tools, and no the skywater 130
guys haven't even figured out what questions to ask in about 2/3 of the problem
space yet. And that's all _before_ you get into the design of the actually
secure parts...)

Rob

On 7/24/20 11:48 PM, Rob Landley wrote:
> This keeps coming up and I should have a writeup I can just point people at, so:
> 
> 15 years ago when I was maintaining Busybox somebody told me the big NORAD
> display at Cheyenne Mountain (as recreated in the movie Wargames) ran busybox,
> which surprised me: I didn't think my code was good enough to defend the country
> from nuclear attack. But they explained they're required to audit every line of
> source for anything running on such highly secure systems, and they'd much
> rather audit a few hundred thousand lines of busybox code than tens of millions
> of lines of corresponding GNU code. This, I understood.
> 
> But it doesn't matter how secure your code is if it's running in a system that's
> already been compromised. The solution is to get a minimal secure base system,
> audit it (have experts read every line of source), and build up from there. At
> the root of any package management tree the dependencies go circular (everything
> depends on everything else), so there's a base set of packages you have to start
> with as a lump or nothing can run. These days, the minimal system to boot to a
> shell prompt is 3 packages (kernel, libc, and application: if you're booting
> linux to a shell prompt your kernel is linux, your application is toybox, and
> your libc is probably either musl or bionic).
> 
> Of course auditing the output isn't enough because your development tools could
> have been compromised. Creating a new chroot from a machine that's running
> spyware is not very useful. So you make a tiny self-hosting system, which can
> rebuild itself from source code under itself. This is conceptually FOUR
> packages: the kernel, libc, and toybox above, plus a compiler toolchain (which CAN
> be a single package if you upgrade Fabrice Bellard's tinycc, as I proposed doing
> in my qcc project but have never found time to do).
> 
> My first implementation of this concept was aboriginal linux
> (https://landley.net/aboriginal/about.html) where I got the self-hosting system
> (capable of building Linux From Scratch under the result as proof it could
> natively bootstrap up to arbitrary complexity by downloading and compiling
> source code) down to 7 packages: the kernel was linux, libc was uclibc, the set
> of command line utilities was busybox, the toolchain was 2 packages (just gcc
> and binutils, it hadn't yet metastasized into 5 packages, gone gplv3, and
> rewritten itself in C++), and then I needed 2 more packages (make and bash)
> because the corresponding busybox commands were missing or not yet good enough.
> 
> My new one is based on mkroot (https://landley.net/toybox/faq.html#mkroot) with
> cross and native compilers from musl-cross-make (via scripts/mcm-buildall.sh in
> this source ala https://landley.net/toybox/faq.html#cross). Eventually I'd like
> to implement https://landley.net/qcc and get it down to the theoretical 4
> packages, but it's a work in progress and nobody ever wants to fund this stuff
> (ala https://elinux.org/CELF_Project_Proposal/Combine_tcg_with_tcc) so I can
> only throw scraps of hobby time/energy at it.
> 
> But then the NEXT step of paranoia is Ken Thompson's "trusting trust" attack,
> where the creator of unix modified the early BSD compiler to recognize and hack
> the login program (so the login binary contained an exploit the login.c source
> didn't, a hardwired "ken" account with a fixed password), and then he added a
> SECOND part so the compiler would recognize and hack itself (inserting the
> original exploit for login and the new one for cc) so now the COMPILER binary
> would contain an exploit even when it wasn't in the compiler source. Then he
> removed the changes from the compiler source, rebuilt it with the modified
> binary to make sure the exploit propagated from compiler binary to compiler
> binary without being in the source code, and sent it to berkeley so he could
> always log into his students' system. Years later, when the ACM gave him the
> Turing Award, he told this story in his acceptance lecture:
> https://dl.acm.org/doi/pdf/10.1145/358198.358210
> 
> The first defense against this (presented in a PhD thesis
> https://dwheeler.com/trusting-trust/) is "countering trusting trust through
> diverse double compiling", i.e. compile your compiler's source with a DIFFERENT
> compiler, then rebuild it with the resulting output, to wash away any
> binary-only hacks that can't propagate through code they don't recognize.
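
A toy sketch of the diverse double compiling check described above, written in
Python purely as a model: nothing here runs a real compiler. The names Binary,
run_compiler, and ddc_matches are hypothetical, and a "binary" is reduced to
three fields instead of actual bytes. It assumes the compiler is deterministic,
which a real check also needs so the final comparison can be bit-for-bit.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for real source files.
COMPILER_SRC = "cc.c (audited compiler source)"
LOGIN_SRC = "login.c"
OTHER_COMPILER_SRC = "othercc.c (unrelated second compiler)"


@dataclass(frozen=True)
class Binary:
    program: str    # source this binary was built from (defines its behavior)
    codegen: str    # source of the compiler that emitted these bits
    backdoor: bool  # hidden payload that appears in no source code


def run_compiler(compiler: Binary, source: str) -> Binary:
    """Model of executing a compiler binary on some source."""
    # A trojaned compiler re-inserts its payload only into the programs it
    # recognizes: login, and the compiler it expects to be rebuilding.
    infected = compiler.backdoor and source in (COMPILER_SRC, LOGIN_SRC)
    return Binary(program=source, codegen=compiler.program, backdoor=infected)


def ddc_matches(suspect: Binary, trusted: Binary) -> bool:
    """Rebuild the compiler via a different compiler and compare results."""
    # Stage 1: the unrelated compiler builds the audited compiler source.
    stage1 = run_compiler(trusted, COMPILER_SRC)
    # Stage 2: that result rebuilds the same source, washing out stage 1's
    # different code generation.
    stage2 = run_compiler(stage1, COMPILER_SRC)
    # An honest self-built compiler should be identical to stage 2.
    return stage2 == suspect


trusted = Binary(OTHER_COMPILER_SRC, "vendor toolchain", backdoor=False)
clean = Binary(COMPILER_SRC, COMPILER_SRC, backdoor=False)
trojaned = Binary(COMPILER_SRC, COMPILER_SRC, backdoor=True)

# The attack: the trojaned binary reproduces itself from clean source.
print(run_compiler(trojaned, COMPILER_SRC) == trojaned)
# The check: only the clean binary matches the diverse rebuild.
print(ddc_matches(clean, trusted), ddc_matches(trojaned, trusted))
```

The model only catches the backdoor because the second compiler does not
recognize the code it is compiling, which is the same assumption the real
technique rests on.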
> 
> But the only definitive defense is to audit the binaries of your minimal native
> development environment, not just their source. Due to the prevalence of viruses
> on windows, an entire industry of binary auditors has grown up reverse
> engineering the exploit du jour, with companies like Veracode that employ them. Most
> of the good ones seem to be women, presumably because they've been guarding
> themselves from asshole men spiking their drinks for their entire careers, and
> seem to wind up specializing in security out of self defense. On twitter I
> followed @0xabad1dea @aloria @hacks4pancakes @malwareunicorn @fox0x01 and so on...
> 
> Presumably packages you add while "building up" only require source auditing,
> when they can be built from audited source using your reproducible environment
> of known good binaries.
> 
> This is why the minimal native development environment needs to be small,
> simple, and understandable. It needs to be maintainable, but it also needs to be
> auditable. It should require no external dependencies because they add to the
> pile of things that need not just _source_ auditing but _binary_ auditing. (And
> it can't be a one-time thing, you need to periodically re-audit it to make sure
> nobody's pulled something funny.) Having the entire minimal base system written
> in the same language helps simplify the auditing process, and since linux and
> most libc implementations are already written in C that was the logical language
> to write toybox in (before working that out I was considering lua). Tinycc is
> written in C and qcc (tinycc+qemu's tcg) should be written in C. There's a plan
> to use llvm-cbe as an improved cfront, but how much a source audit of llvm-cbe's
> output of clang differs from a binary audit is an open question. And C was
> initially designed as a "portable assembly language", with reverse compilation
> tools and a flourishing community of reverse engineers that recreate lost source
> code for fun (https://www.youtube.com/watch?v=5tADL_fmsHQ).
> 
> One other note: if you can't reproduce it, what you're doing is not science. If
> you can't recreate an experiment from first principles under laboratory
> conditions, it's just alchemy. The ability to regularly reproduce the minimal
> native development environment and bootstrap your way up to arbitrary complexity
> in an automated fashion is an important regression test.
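
One way to turn that regression test into something mechanical is to build the
system twice and compare the output trees. The sketch below is an assumption
about how that could look, not part of mkroot or toybox: tree_digest and
builds_match are hypothetical helpers, and they deliberately compare only
relative file names and file contents, ignoring timestamps, ownership, and
permissions.

```python
import hashlib
import os


def tree_digest(root: str) -> str:
    """Return one SHA-256 over every file name and file body under root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                data = f.read()
            # Name, separator, length, then contents, so that two different
            # trees cannot feed the hash the same byte stream.
            digest.update(os.fsencode(rel) + b"\0")
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()


def builds_match(build_a: str, build_b: str) -> bool:
    """True when two build output directories hold identical files."""
    return tree_digest(build_a) == tree_digest(build_b)
```

Calling builds_match() on the output directories of two independent builds
gives a yes/no answer. A mismatch does not by itself mean compromise, since
embedded timestamps or build paths also break reproducibility, but it does say
where to start looking.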
> 
> Oh, and the native builds I'm doing are architecture-agnostic: the build of the
> native system targets x86 or arm or superh or powerpc, and then the package
> builds within that system do the normal configure/make/install dance as native
> builds, by default not caring what architecture they're on. (That's the
> "portable" part of C being a "portable assembly language". Most scripting
> languages care even less, except for the #ifdef staircase in jit code generators...)
> 
> Rob
> 
> P.S. Doing the same for hardware is a whole second set of fun I'm working on
> over in the https://j-core.org side of things. You need open designs and open
> tools for netlist generation and place and route and such, and you have to fab
> on a fully open process with no black box libraries for pads or srams, you need
> to do the low-level layout yourself, which most fabs won't give you the spec
> sheets for, and then you have to decap the chips when they come back to you and
> compare them with what you sent out, and even THEN there are some interesting
> papers on compromising chips purely through selective doping.
> 

