[Toybox] toybox build first-world problems

Rob Landley rob at landley.net
Tue Mar 7 14:45:16 PST 2023


On 3/7/23 10:46, enh wrote:
> On Mon, Mar 6, 2023 at 8:15 PM Rob Landley <rob at landley.net> wrote:
>>
>> On 3/6/23 11:43, enh wrote:
>> > (is there a reason this stuff isn't just `| xargs -P` though? macOS'
>> > xargs does support -P, as does toybox's. might be worth a comment if
>> > so.)
>>
>> I need to fetch the error return codes to see if any of the parallel build
>> instances errored out, and the easy way to do that is for them to be shell jobs
>> that I query the status of via wait.
> 
> ah, right. that makes sense. (at least if you ignore the fact that we
> can't be the first people to want to run a bunch of jobs _and_ check
> whether they succeeded or not!)

wait $PID existed for a reason and wait -n was added for a reason. Also,
widespread availability of SMP only dates back to around 2005, so serialization
didn't carry a significant penalty until recent-ish-ly. (Beowulf clusters were
something like 1997, but that's a whole different programming model.)

If I assume the shell has wait -n then it's not even that hard. Something like:

  # Launch background jobs with rate limiting
  COUNT=0 ERR=
  for i in $STUFF
  do
    launch $i &
    ((++COUNT>$CPUS)) && { ((--COUNT)); wait -n || { ERR=1; break; } }
  done
  # Wait for pending jobs
  while ((COUNT-->0)); do wait -n || ERR=1; done
  # Were they all happy?
  [ -n "$ERR" ] && die "something had an error"

That's off the top of my head. The complication is supporting the old "wait's
default behavior is stupid and manually tracking PIDs is tedious" codepath: the
ratelimit() function exists for backwards compatibility, and I wouldn't need it
otherwise.
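For shells without wait -n, that fallback codepath looks something like the
following sketch. This is hedged and illustrative, not actual toybox build code:
launch and STUFF are stand-ins, and it omits the rate limiting for brevity.

```shell
#!/bin/bash
# Hypothetical sketch of the "manually tracking PIDs is tedious" codepath:
# remember each background PID, then harvest every exit status via wait $PID.
STUFF="a b c"
launch() { [ "$1" != b ]; }  # stand-in job: fails for "b"

PIDS=
for i in $STUFF
do
  launch $i &
  PIDS="$PIDS $!"   # record each child's PID
done

ERR=
for pid in $PIDS
do
  wait $pid || ERR=1   # collect each exit status individually
done
[ -n "$ERR" ] && echo "something had an error"
```

The real version would also cap the number of concurrent jobs, which is where
the tedium comes in: you have to wait for a specific PID rather than "whichever
finishes first", which is exactly what wait -n fixes.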

(Ok, technically there's a complication: bash doesn't clear its background job
table until you do a wait without arguments, which means if you "wait $PID"
multiple times with the same PID you get the same result, meaning it's got the
job equivalent of zombies accumulating. But I don't think those are actual OS
level zombie processes, just internal data structures. Don't ask me what bash
would do if a PID cycled all the way around and got reused; even with a 32768
PID limit on an otherwise busy system it never came up. Yes I needed to research
this to do bash-compatible job control for toysh, yes I blogged about it at the
time, no it's not fully in toysh yet; that was one of the forks I implemented
2/3 of and then got pulled away from long enough I'd need to redo it. But I'm
pretty sure I know _how_ now. :)

>> Which expands to "make sed" uses .singleconfig and redoes generated/* from that,
>> and then if you do a normal toybox build next and it checks if generated/* is
>> newer than ".config" the answer is "yes" so it doesn't rebuild it, which is
>> wrong. scripts/make.sh doesn't know if KCONFIG_CONFIG changed.
> 
> yes, incorrect is definitely a lot more annoying than slow (and that
> one used to bite me a lot).

The problem is each time I fixed a new one I'd make it further into the
dependency minefield and hit the NEXT corner case. The PROPER dependencies
almost always result in a full rebuild, because toys.h -> generated/*.h ->
.config -> Config.in -> generated/Config.{in,probed} -> toys/*/*.c. The
practical upshot is that <strike>if you stick one in your ear</strike> touching
any command's file SHOULD rebuild everything every time, according to make-style
"file is newer than file" logic.

To avoid "changing anything rebuilds everything" you have to understand what the
changes MEAN and break down to smaller granularity WITHIN files (did anything I
_care_ about change?), which skips past the timestamp comparisons, at which
point rebuilding the headers is almost as fast as figuring out whether you NEED
to rebuild the headers.
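One standard way to get that "did anything I care about change" granularity
(a sketch under assumptions, not what scripts/make.sh actually does; gen_header
and the file names are made up) is to regenerate into a temp file and only
replace the target when the bytes differ, so the timestamp only advances on
real changes:

```shell
#!/bin/bash
# Sketch of content-aware regeneration: only install the new file (and thus
# bump its timestamp) when the regenerated contents actually differ.
gen_header() { echo "#define VALUE $1"; }  # stand-in for a header generator

update_if_changed() {
  local target="$1"; shift
  "$@" > "$target".tmp
  if cmp -s "$target".tmp "$target"
  then rm "$target".tmp             # identical: keep the old timestamp
  else mv "$target".tmp "$target"   # different: install the new version
  fi
}

update_if_changed out.h gen_header 42   # first run: creates out.h
touch marker
sleep 1
update_if_changed out.h gen_header 42   # same content: out.h left untouched
```

Downstream "is the header newer than my input" checks then stop firing for
no-op regenerations, which is the smaller-granularity check the timestamp
comparison alone can't express.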

I can rule-of-thumb my way past it to get faster builds, but then the build
breaks when the rule of thumb gets broken, because $KCONFIG points to a
different file or I added a flag to a NEWTOY() line or...

In the end I just put it back to "rebuild the headers most of the time" because
debugging something subtle that was unnecessarily build related even ONCE is a
net loss. (A build _break_ because we didn't clean is less of an issue.
Potential "behavior did not update" bugs I waste time on: nope.)

That said, saving $PATH/$CC/$LD/$CROSS_COMPILE so we don't redo the library
probing if none of them changed was a nice speedup. And turning most of the
compile-time header probes into __has_include() was also a nice speedup. And
there IS more stuff I can do, it just hasn't been a priority...
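The environment-stamp trick is roughly the following. This is a sketch with
made-up names (probe_libs, maybe_probe, .probestamp), not the actual
scripts/make.sh logic: record the variables the probes depend on in a stamp
file and only re-probe when the stamp no longer matches.

```shell
#!/bin/bash
# Sketch of skipping slow library probing when the relevant environment
# ($PATH/$CC/$LD/$CROSS_COMPILE) hasn't changed since the last build.
probe_libs() { PROBES=$((PROBES+1)); }  # stand-in for the slow probing

STAMP=.probestamp
rm -f "$STAMP"
PROBES=0

maybe_probe() {
  local NEW="PATH=$PATH CC=$CC LD=$LD CROSS_COMPILE=$CROSS_COMPILE"
  [ "$(cat "$STAMP" 2>/dev/null)" = "$NEW" ] && return  # cache hit: skip
  probe_libs
  echo "$NEW" > "$STAMP"
}

maybe_probe   # first run: probes and writes the stamp
maybe_probe   # unchanged environment: cache hit, no re-probe
CC=newcc
maybe_probe   # changed toolchain: probes again
```

The point is that the stamp captures inputs make can't see: none of those
variables are files, so "file is newer than file" logic can never notice them
changing.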

Rob
