[Toybox] Back from Japan, checked in some nommu stuff.

Sun Sep 27 17:02:44 PDT 2015

I checked in some nommu changes I've been wrestling with forever. It's
about 1/3 of the pending architectural stuff, but was one of the bits
blocking me figuring out how to get it to work right.

What I did was 1) replace toys.recursion with toys.stacktop, 2) add an
xvfork() wrapper that forces any later xexec() call to exec() rather
than recursively calling toy_exec() in the same process.

Yanking the toys.recursion variable and replacing it with
toys.stacktop means I can do subtraction and approximate the amount of
stack used, and use that as a proxy for leaked heap and stuff. It's
currently forcing an exec when we hit 6k of stack used, which is small
but the default nommu stack size is 8k. This also lets me set
toys.stacktop to NULL to signal abnormal vfork-related states. (I played
around with setting toys.recursion to -1 but it was awkward.)
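
A rough sketch of the stacktop idea, not the actual toybox code: the
names record_stacktop(), stack_used(), can_recurse() and STACK_LIMIT are
made up for illustration, and it assumes the stack grows down (true on
most architectures, not all).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for toys.stacktop: an address near the top of
 * the stack, recorded once at startup. NULL means "never recurse". */
static char *stacktop;

/* Assumed threshold, taken from the 6k figure above. */
#define STACK_LIMIT 6144

/* Call early in main() with the address of one of main's locals. */
void record_stacktop(void *near_top)
{
  stacktop = near_top;
}

/* Approximate bytes of stack used since record_stacktop().
 * Goes through uintptr_t because subtracting pointers to unrelated
 * objects isn't something portable C promises to support. */
long stack_used(void)
{
  char here;

  return (long)((uintptr_t)stacktop - (uintptr_t)&here);
}

/* Nonzero if running the next command in this process looks safe.
 * A NULL stacktop always says no, which forces a real exec(). */
int can_recurse(void)
{
  return stacktop && stack_used() < STACK_LIMIT;
}
```

The caller would check can_recurse() before running a command in the
current process, and fall back to exec() when it returns 0.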

I also added an xvfork() to lib.h, which is a static inline because you
can't wrap vfork() with a normal function. (Because vfork's child shares
a stack with its parent, so if it returns from a function and then calls
another one the parent gets confused trying to return from _its_ version
of that function, because the return pointer on the stack got stomped. Normally
vfork() produces an inline system call, not a call to a function in
libc. Possibly I should update nommu.org to explain that?)

But the reason I need to wrap vfork() is A) to catch failures (fork
can fail! Returning nonzero doesn't always mean you launched a child, -1
means we're out of PIDs!) and B) to zero out stacktop which tells
xexec() never to recurse. Once we vfork(), we need to exec() to unblock
the parent (and _until_ we do that, we're stomping on shared resources),
so xexec() needs to know we called vfork().
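
Roughly what such a wrapper has to do, as a sketch under assumptions
rather than the code I checked in. This version uses a macro instead of
a static inline so that vfork() is guaranteed to be called from the
caller's own stack frame even with inlining disabled; vfork_check() and
XVFORK() are hypothetical names, and stacktop stands in for
toys.stacktop.

```c
#define _DEFAULT_SOURCE /* glibc: make sure vfork() gets declared */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for toys.stacktop; NULL means "must exec". */
static void *stacktop;

/* Runs after vfork() returns, in both the child and the parent.
 * vfork() shares memory, so the store the child makes here is still
 * visible once the parent resumes. */
static pid_t vfork_check(pid_t pid)
{
  /* -1 is nonzero but it is not a child PID: nothing was launched. */
  if (pid == -1) {
    perror("vfork");
    exit(1);
  }
  stacktop = NULL;

  return pid;
}

/* A macro, so the vfork() call itself stays in the caller's frame and
 * the child never returns out of the frame that called vfork(). */
#define XVFORK() vfork_check(vfork())
```

A caller would do "pid = XVFORK();" and then, in the child (pid == 0),
go straight to an exec or _exit().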

I'm not quite happy with this new infrastructure because:

A) some callers want to handle their own errors rather than
error_exit(), so they have to zero out toys.stacktop themselves. (But I
don't have a standard prefix for "I wrapped this function but not to
error_exit() on failure". This is actually a persistent problem I may
need to fix someday, but it balances against "how many wrappers do I
_need_...")

B) once a parent has zeroed stacktop it'll never recurse again. (Not
until you exec, anyway.)

The recursion is _mostly_ an optimization. When you fork and exec, the
exec is something like 95% of the overhead and fork is really cheap in
comparison. (Ok, there are pathological parents you can fork from where
fork itself gets expensive, but if toybox ever winds up behaving like
mozilla we've already failed.) However, the main time this comes up is
running shell scripts (lots and lots of commands so it adds up), and
toysh needs to do this manually rather than calling xexec() because it
needs to run nofork commands in the current process context (stuff like
"cd" or "export" that's a NOP if a child does it for you).

On the other hand, sometimes the $PATH hasn't got stuff, because you
threw a static toybox on a broken system or you just chrooted into a
container or some such. In which case recursing is the only way to call
other commands. (Example: mount may call losetup internally, and if
we're using that to set up a container...)

But I'm leaning towards not caring about that case because the _main_
user of it is "I dropped an exploit binary on a system and am gonna p0wn
it now" which... isn't interesting. If you boot from recovery media, you
can have an initramfs. If you're setting up a container, you can mount a
tmpfs and then umount it again so you have something to set your $PATH to.

Anyway, that's the stuff I've been working on. The remaining hard case
is when you have to re-exec _yourself_ (if you vfork, you have to exit
or exec to unblock the parent and stop stomping their stack and heap!),
and the test case I'm using is cpio's passthrough mode. In those cases,
I'm hijacking xpopen_both() so when you feed it a NULL for argv[] it
execs /proc/self/exe with the existing toys.argv.
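
The NULL-argv convention could look something like this. It's an
illustrative sketch, not xpopen_both() itself: struct exec_target,
pick_exec_target() and spawn_or_respawn() are invented names, and
self_argv stands in for toys.argv. It also assumes /proc is mounted.

```c
#define _DEFAULT_SOURCE /* glibc: make sure vfork() gets declared */
#include <sys/types.h>
#include <unistd.h>

/* What to hand to exec: a path plus the argument vector. */
struct exec_target {
  const char *path;
  char **argv;
};

/* NULL argv means "run another copy of myself with my own arguments". */
struct exec_target pick_exec_target(char **argv, char **self_argv)
{
  struct exec_target t;

  if (argv) {
    t.path = argv[0];
    t.argv = argv;
  } else {
    t.path = "/proc/self/exe";
    t.argv = self_argv;
  }

  return t;
}

/* vfork() a child and exec the chosen target. Returns the child's PID
 * to the parent, or -1 if vfork() failed. */
pid_t spawn_or_respawn(char **argv, char **self_argv)
{
  struct exec_target t = pick_exec_target(argv, self_argv);
  pid_t pid = vfork();

  if (!pid) {
    /* execvp() only searches $PATH when the name has no slash in it,
     * so "/proc/self/exe" is used as-is. */
    execvp(t.path, t.argv);
    _exit(127); /* exec failed: exit right away to unblock the parent */
  }

  return pid;
}
```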

I _was_ looking at "find your binary again even if proc isn't mounted!"
which meant preserving the /path/to/argv0 in toys.argv and making sure
we did the vfork() before we ever did a chdir, and it _still_ wouldn't work
if there was a chroot in there, and I basically went "screw it: in the
nommu support case, require /proc to be mounted to re-exec yourself. In
the with-mmu case, just use fork()."

So still banging on that. Sorry this has been blocking everything else
but when I've got pending patches to main() it's hard to test anything
else, and git is really annoying about pulling one tree into another
tree with changes because it will NEVER MERGE, it just says "you have
pending changes to this file which this pull will squash" and it being
way at the other end of the file means nothing...

Grrr. Working on it...

Rob
