[Toybox] [Nommu] Week ending June 27ish.

Sun Jun 28 16:15:26 PDT 2015

On 06/28/2015 05:06 PM, Rich Felker wrote:
> On Sun, Jun 28, 2015 at 03:57:34PM -0500, Rob Landley wrote:
>> On 06/27/2015 04:23 PM, Rich Felker wrote:
>>> On Sat, Jun 27, 2015 at 12:51:22PM -0500, Rob Landley wrote:
>>>> I should probably check in here at least weekly to give a general idea
>>>> of what I'm working on and plans going forward.
>>>>
>>>> I'm adjusting Aboriginal Linux to build for nommu targets, which means:
>>>>
>>>> 1) I yanked busybox ash and replaced it with a combination of hush and
>>>> bash. (This broke stuff, and I'm fixing it.)
>>>
>>> Does bash actually have nommu support?
>>
>> Sigh. You're right, it doesn't. (I thought my old 2.05b version did, but
>> nope.)
>>
>>> I asked about mksh a while back
>>> and sadly it doesn't. I've got hush working fine at the moment but I
>>> haven't tried much in the way of scripts, so I'm not sure what its
>>> coverage is like.
>>
>> Alas, hush is very limited.
>>
>> Sigh. Time to restart work on toysh I guess.
> 
> Are there perhaps any other shells that work on nommu? It would be
> nice to be able to put off toysh until you have time to do a really
> good job on it rather than rushing it because there's nothing else you
> can use...

Eh, I can do it in stages. And setting up another shell is effort (and
natural test/use cases) that's _not_ going into toysh development, so
I'd rather do it right than do other things that get thrown away.

That said, sure, http://www.cod5.org/archive/ links to es and pdksh for
example, and uclibc had like 5 of them (all kinda crappy if I recall).
None really an improvement on hush in terms of actually building stuff.

>>>> 2) I'm teaching toybox to probe for the existence of fork(), set a nommu
>>>> symbol if it's not there, and use sort of a fakeroot version of fork
>>>> (re-exec yourself with some way of transferring state to the child,
>>>> maybe an environment variable, maybe a pipe...).
>>>
>>> BTW toybox does a lot of expensive probing at startup already (before
>>> even entering the applet's main function) that makes simple commands
>>> take about twice the time of their busybox versions.
>>
>> Could you please be a little more vague?
>>
>> I'd prefer to respond to this on the toybox list, but since you felt it
>> should be asked here instead...
> 
> Sorry, it just came up as part of the overhead for self-exec topic. I
> agree the toybox list would have been a better place.
> 
>> Or do you mean something other by "probing" than "system calls"? (Memory
>> usage? CPU usage? Could I have some sort of metric here, or at least the
>> axis of interest?)
> 
> I meant system calls. I have:
> 
> # strace toybox true
> set_tid_address(0x15c9eeb4)             = 161
> rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER|SA_RESTART, 0x15c4cb48}, {SIG_DFL, [], 0}, 8) = 0
> getuid32()                              = 0
> geteuid32()                             = 0
> umask(0)                                = 0755
> umask(0755)                             = 0
> getuid32()                              = 0
> geteuid32()                             = 0
> umask(0)                                = 0755
> umask(0755)                             = 0

Ah, that's two instances of setup. The setup for toybox_main() and the
setup for true.

It can normally skip one of those, but you're explicitly calling
"toybox" so it treats it as a command and does setup for it.

> exit_group(0)                           = ?
> +++ exited with 0 +++
> 
> It looks like they're all repeated twice when the toybox command is
> used to invoke a command by name (rather than with symlinks).

Yes. If you call toybox as a command name, it gets the normal command
setup done for it. I'm not optimizing for that case. Pilot error.

I note that toysh already does manual init stuff because it has to
implement nofork and nothing else does. We know what context we're in
there, and I could optimize stuff. But since it's already dropped
privileges in the suid case, we _have_ to re-exec to re-acquire that for
TOYBOX_STAYROOT commands. (I was thinking about this at the design level
when I implemented it...)

As I said, I can put a TOYBOX_UMASK flag check around the umask() calls
but I didn't bother because A) they're really trivial, B) toysh is an
example of a command that wants to use toys.old_umask without having set
TOYFLAG_UMASK in its newtoy(). (Admittedly it's re-applying the old
value because it's trying to do longjmp(rebound) cleanup for that nofork
stuff that I said probably isn't worth doing, so... I need to revisit
that and work out what tradeoffs I want to make. There's pending design
work to be done in toysh.)
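
For reference, each umask(0)/umask(0755) pair in the strace above is the
usual way to read the current umask: the syscall only reports the old
mask as a side effect of setting a new one. A minimal sketch, assuming
nothing about the real toybox source (read_umask is a made-up name):

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper, not toybox's actual code: umask() can only be
   read by setting it, so fetch the old mask and put it straight back. */
mode_t read_umask(void)
{
  mode_t old = umask(0);  /* returns the previous mask, sets mask to 0 */

  umask(old);             /* restore the caller's mask */

  return old;
}
```

That is two syscalls per read, which is where the paired calls in the
trace come from.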

> It's not
> as bad without the duplication, but that's still 4 unnecessary
> round-trips to kernelspace for a lot of commands. Maybe the get[e]uid
> stuff is hard to remove when the suid support is compiled-in, but
> getauxval() could be used instead on systems that have it.

There was talk a few years back of putting getuid and friends in vdso
and the result was nobody bothered because the overhead wasn't high
enough for anybody to care. (People tend to call gettimeofday() in tight
loops because it constantly changes.)
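
For what the getauxval() suggestion would look like, here is a rough
Linux-only sketch. It assumes a libc that provides getauxval() (glibc
and musl both do), and started_suid is an illustrative name rather than
anything in toybox:

```c
#include <sys/auxv.h>
#include <elf.h>
#include <unistd.h>

/* Hypothetical helper: nonzero when the real and effective uid the
   kernel recorded at exec time differ, i.e. we were started suid.
   The auxiliary vector is a snapshot from exec and never changes, so
   this is only meaningful before any setuid()/seteuid() call. */
int started_suid(void)
{
  unsigned long uid = getauxval(AT_UID), euid = getauxval(AT_EUID);

  return uid != euid;
}
```

One wrinkle: getauxval() also returns 0 when an entry is missing, which
is indistinguishable from uid 0 unless you check errno on a libc that
sets it.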

>>> Can you make the
>>> probe lazy (at first fork) or so it only happens in toys that need
>>> fork, rather than unconditionally at start?
>>
>> I don't understand the request here?  The vfork/exec probe I was
>> referring to is after exec, to see if toybox is re-execing itself. The
>> test _is_ to see if we're the first exec or not...
> 
> OK, I misunderstood then. I thought you were probing to see if you can
> fork, and otherwise switching to nommu mode.

Yes, at compile time.

> Obviously such a probe
> would be very expensive (relative to what you do need to do) if
> performed when you won't need it. That was probably the start of my
> misunderstanding.

No, it's a compile time probe to see if toysh can call fork() or has to
do something elaborately silly instead to make & and job control work in
toysh. (Let alone making cpio -p work, and the task pool stuff if I did
decide to have a compressor work in parallel or similar.)
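
To make "compile time probe" concrete, here is a minimal sketch of the
shape of it. HAVE_FORK and run_command are placeholder names I made up
for illustration, not the real toybox config symbol or function, and
this only covers the simple "run a child and wait" case, not job
control:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* HAVE_FORK is a hypothetical symbol a build-time probe would define
   as 0 on nommu targets; it is not toybox's real config name. */
#ifndef HAVE_FORK
#define HAVE_FORK 1
#endif

/* Run argv[0] with its arguments and wait for it. Returns the exit
   status, or -1 if the spawn failed or the child died from a signal. */
int run_command(char **argv)
{
  int status;
#if HAVE_FORK
  pid_t pid = fork();
#else
  pid_t pid = vfork();   /* parent is suspended until exec or _exit */
#endif

  if (pid < 0) return -1;
  if (!pid) {
    /* Child: after vfork() only exec or _exit are safe here. */
    execvp(argv[0], argv);
    _exit(127);
  }
  if (waitpid(pid, &status, 0) < 0) return -1;

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child side is written to be legal under vfork() either way, so the
only difference between the two builds is which call creates the child.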

>>>> I don't want to replace fork _entirely_ because fork takes about 5% as
>>>> long as exec on systems that _do_ support it, so the toybox shell being
>>>> able to fork and run commands internally is a big potential speedup for
>>>> shell scripts. But codepaths that are _not_ performance critical should
>>>> use the long way round so it gets testing.
>>>
>>> This seems like a good approach, but I have a suggestion that will
>>> perform even better: use pthread_create instead of fork for
>>> implementing shell builtins
>>
>> Suppressing the gag reflex for involving pthreads in something for no
>> reason, I actually looked at that a long time ago (using clone()
>> directly to create shared process contexts), and didn't like it. Random
>> blog post (one of several over a multi-month period):
>>
>> http://landley.net/notes-2007.html#20-01-2007
> 
> The difference with using clone() directly is that __thread variables
> are not going to work and it might not even be safe to call any libc
> functions. glibc doesn't document what is or isn't safe; I could go
> into detail on the topic with musl if anyone cares to hear.

Feel free, but if threads need to do something non-obvious that clone()
doesn't, I don't want to get threading on me.

>> Commands that _can't_ run without fork() generally have a reason, such
>> as they don't free all their memory on error paths, they've changed
>> signal handlers, they may leave memory mappings, there may be opendir()
>> state, there could be exit handlers... And no, beefing up the error and
>> exit handling to make TOYBOX_CLEANUP_TO_HUMOR_VALGRIND be mandatory
>> doesn't fix it because ctrl-c can come in anywhere.
> 
> The obvious case where you need fork not to work around implementation
> limitations like the above, but for an actual semantic reason, is for
> pipelines and () subshells.

You can't intercept sigstop so you'd have to write some sort of filter
intercepting I/O and faking your own PTY to implement ctrl-z and I am
so, so, so not going there. Plus fg, bg, &.

> For many (but not all) such usages, if
> there is a builtin version of the commands involved, they could be
> implemented in threads without forking.

I did threading for about 5 years under OS/2 back in the 90's. I got the
desire to work with it out of my system.

> I could probably come up with
> some nice examples if you want to see them, but since the toys don't
> seem to be written to support this kind of usage, it's probably not
> practical anyway for toybox.

I can make a surprising number of things work if I decide to, but
convincing me opening that can of worms is a good idea in the first
place is a different matter.

One of my early toybox todo items was to parallelize bunzip2 (see
http://landley.net/notes-2007.html##26-12-2007 and commit d3236c1fd785
for example) and I did seriously look at -lpthread for this... and then
decided _so_ not to go there. (If I wanted to parallelize a compressor I
could use a task pool and pipes, but said compressor wouldn't be bunzip2
at this point because the algorithm's essentially deprecated. Gzip is a
streaming compressor, and lzma/xz are the Giant Horrible Algorithm to
make things as small as reasonably achievable. This leaves bunzip2
without much of a niche.)

I could probably actually parallelize gzip (compression of large data
sets is trivial if you don't mind inserting dictionary resets,
decompression is sort of heuristic but still doable, or decompression in
parallel could rely on the compression in parallel to determine the
reset stride... :)

>> I looked into having xexit() do a longjmp and then cleaning stuff up
>> reliably, inspecting our own heap and everything, and the answer is
>> reimplementing that much of the operating system is giant bloat and
>> complexity and a great big Not Going There.
> 
> I agree this is a bad approach -- things like that should be relegated
> to busybox and not carried into toybox.
> 
>> Probably not that easy, no. I've been poking at this for years:
>>
>> http://lists.busybox.net/pipermail/busybox/2006-March/053270.html
>> http://lists.busybox.net/pipermail/busybox/2009-January/068158.html
>> http://lists.busybox.net/pipermail/busybox/2011-February/074626.html
> 
> OK.
> 
>> And then I benchmarked it and noticed that exec is the expensive part by
>> an order of magnitude, and decided not to bother. Yes vfork() makes that
>> problematic again, but penalizing android to make nommu work would not
>> be my first choice, and exec() from a thread has always been
>> problematic. (Did this change recently?)
> 
> exec replaces the calling _process_, not thread, but it works fine if
> that's what you want to do.

Which means that a toysh that wanted to use threads for builtins would
need to use vfork for non-builtins meaning it would need two codepaths
to do largely the same thing meaning so not going there...

(I put two codepaths into "tail" because one can't work on pipes and the
other is pathological on large seekable files. I agonized about it, and
am still unhappy, and it's one of the few command config suboptions
remaining. That's about my threshold and comfort level for this sort of
thing.)

> Rich

Rob
