[Toybox] toybox build first-world problems

Rob Landley rob at landley.net
Sun Mar 5 11:01:36 PST 2023


On 3/4/23 14:03, enh via Toybox wrote:
> i don't think the DASHN stuff is working right? it's particularly
> noticeable on a MacBook (with 8 performance cores and 2 efficiency
> cores), but i see it on my desktop too (with 128 cores).

Sigh.

So "wait -n" is a "new" bash option (since the GPLv3 transition, so mac's old
bash hasn't got it) that waits for the "next" job that finishes. Without -n, you
either wait for all pending jobs to complete, or wait for a specific PID, and
doing FIFO on PID waiting leaves CPUs idle when there are short and long jobs.
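
A minimal sketch of that difference, assuming bash 4.3 or later. The variable
names are made up for illustration and none of this comes from the toybox
scripts:

```shell
#!/bin/bash
# Hypothetical demo, not from the toybox build scripts.
# Start a long job first and a short job second.
sleep 3 & LONG=$!
sleep 0.1 & SHORT=$!

# FIFO waiting would do: wait "$LONG"; wait "$SHORT"
# That blocks on the long job even though the short one exited long ago,
# so the slot the short job freed sits idle.

# wait -n returns as soon as any one pending job exits.
START=$SECONDS
wait -n
FIRST_WAIT=$((SECONDS-START))
echo "first job reaped after about ${FIRST_WAIT}s"

# Clean up the long job instead of sitting through it.
kill "$LONG"
wait "$LONG" 2>/dev/null
```

With -n the first wait comes back when the short job finishes, which is the
point where a build loop could launch the next compile.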

I converted make to -n at the 7 year horizon, but added the -n check because
centos (with its 10 year horizon) broke. But I didn't put that MUCH effort into
it, I just made it not break. Alas, looks like mac falls in that bucket too.

My hope was I could eventually get toybox building under toysh and toybox sed
and so on, and ship a generated/build.sh snapshot that would just build an
absolute minimal "toybox with sh and sed and eventually make" so you could "make
setup" or something on platforms that were grumpy, but that (probably) comes
_after_ toybox building under mkroot which comes after "make tests" working
under mkroot...

8-core mac is a new use case. It probably needs to fall back to the old pre-n
behavior (like 80% efficiency on average on my 4x processor laptop I'd guess?)
instead of just not using -n when "wait -n" errors (which has side effects).
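
One way such a fallback could look, as a sketch only: keep a FIFO of PIDs and
wait on the oldest one whenever the limit is reached. The run_job helper and
the PIDS and FAILED variables are hypothetical, and this is not necessarily
what the old pre-n code did:

```shell
#!/bin/bash
# Hypothetical fallback sketch, not the actual toybox code.
CPUS=3
PIDS=()      # FIFO of running job PIDs, oldest first
FAILED=0

run_job() {
  "$@" &
  PIDS+=($!)
  # At the limit: block on the oldest job and record a nonzero exit status.
  if [ ${#PIDS[@]} -ge $CPUS ]
  then
    wait "${PIDS[0]}" || FAILED=1
    PIDS=("${PIDS[@]:1}")
  fi
}

for i in 1 2 3 4 5
do
  run_job sleep 0.1
done
run_job false   # stand-in for a compile that fails

# Drain whatever is still queued, checking each status.
for PID in "${PIDS[@]}"
do
  wait "$PID" || FAILED=1
done
echo "FAILED=$FAILED"
```

Waiting on the oldest PID still idles a slot when a younger job finishes
first, but every job's exit status gets collected.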

> you see that initially $CPUS jobs are started, but anything over that
> is serialized.

Because wait -n and wait $PID each consume one pending job, but "wait" consumes
all pending jobs, so COUNT gets off. The easiest change would be to zero COUNT
in the no-argument case instead of decrementing it, but that doesn't fix the
OTHER problem.
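
A sketch of the "zero COUNT in the no-argument case" change, assuming a
throttle loop shaped roughly like this. The loop body and the WAITS counter
are hypothetical stand-ins, not the real build script:

```shell
#!/bin/bash
# Hypothetical throttle loop, not the actual toybox build code.
CPUS=3
DASHN=        # pretend the "wait -n" probe failed, so plain "wait" is used
COUNT=0
WAITS=0       # how many times the loop had to block

for i in 1 2 3 4 5 6 7 8
do
  sleep 0.1 &                       # stand-in for one compile job
  if [ $((++COUNT)) -ge $CPUS ]
  then
    wait $DASHN
    WAITS=$((WAITS+1))
    if [ -n "$DASHN" ]
    then
      COUNT=$((COUNT-1))            # wait -n reaped exactly one job
    else
      COUNT=0                       # plain wait reaped all of them
    fi
  fi
done
wait
echo "blocked $WAITS times for 8 jobs"
```

Decrementing in the plain "wait" case instead would leave COUNT at CPUS-1
after the first batch, so every later job would hit the limit and block by
itself, which matches the serialization described above.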

The other problem is that "wait" with no arguments always returns zero, and thus
doesn't report the exiting process's error status. Meaning the build can't tell
anything broke before the link stage. Both the $PID and the -n versions report
errors correctly.
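
A quick way to see the difference, as a hypothetical demo (BYPID and NOARGS
are names invented here):

```shell
#!/bin/bash
# Hypothetical demo of how two forms of wait report a failed job.
false & PID=$!
wait "$PID"
BYPID=$?       # exit status of that specific job

false &
wait
NOARGS=$?      # plain wait returns 0 once every job has been reaped

echo "wait \$PID: $BYPID, plain wait: $NOARGS"
```

The first status is the failing job's own exit code, while the second is zero
even though the job failed, so a build loop using plain "wait" has nothing to
test.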

Sigh, I didn't really care about _properly_ fixing 10 year old centos. From
scripts/portability.sh:

# Probe number of available processors, and add one.
: ${CPUS:=$(($(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null)+1))}

# Centos 7 bug workaround, EOL June 30 2024. TODO
DASHN=-n; wait -n 2>/dev/null; [ $? -eq 2 ] && unset DASHN

But mac needs a better fix. Hmmm...

> this is a lot less noticeable when you have 128 cores,
> because that's "most of defconfig" anyway. but when you only [world's
> tiniest violin] have 10 cores, it's really noticeable [even without
> that patch] the first 10 dots appear instantaneously, but the
> remaining ones appear one-by-one [which is what made me ask ps "what's
> actually going on?"]. there's basically just one compile job running
> at any given time after the first batch on the Mac. it's not _quite_
> serial though, because on the 128-core box i do see ~4 cc jobs at a
> time after the initial 128 job goldrush.

And that's a separate issue. I admit I have 4 cores here so wouldn't notice a
limit of more than that, but... what?

> so even if "macOS bash doesn't have -n so strict serialization is
> expected after the first few jobs" is intended,

only on centos 7, not on mac.

> i think there's wasted
> build time on linux too. (this doesn't affect Android's build system,
> of course, but this really stands out on a current Mac --- everything
> _else_ is super fast on these machines, but toybox builds are
> comically slow.

  $ time taskset 1 make clean defconfig toybox
  ...
  real	0m25.864s

Yes I need to fix it, but 26 seconds on a 10 year old laptop is "comically"?

> and it looks odd on a 128 core linux box too, as it
> screams out of the gate and then dribbles over the finish line :-) )
> 
> i'd poke deeper, but i don't even understand _why_ this is done in the
> shell rather than in make (even if the shell generates the relevant
> chunk of makefile), so i'm probably not the right person for it :-)

Multiple reasons, one is dependency reduction: should be able to build on a
system that hasn't got gnu/dammit make.

I'm poking deeper...

Rob