[Toybox] [toybox] 0.7.0: scripts/make.sh: line 270: wait: pid XYZ is not a child of this shell (#24)

Rob Landley rob at landley.net
Fri Feb 26 10:12:57 PST 2016


On 02/26/2016 12:31 AM, Nicolas Boichat wrote:
> On Fri, Feb 26, 2016 at 1:53 PM, Rob Landley <notifications at github.com> wrote:
>     On 02/25/2016 01:31 AM, drinkcat wrote:
>     > We use toybox-0.7.0 as part of the Chromium OS project,

P.S. Yay!

>     > and sometimes hit an issue when building it on our automated
>     > builders (see this issue
>     > <https://bugs.chromium.org/p/chromium/issues/detail?id=584542>):
>     >
>     > toybox-0.7.0: armv7a-cros-linux-gnueabi-gcc -O2 -O2 -pipe
>     > -march=armv7-a -mtune=cortex-a15 -mfpu=neon -mfloat-abi=hard -g
>     > -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables
>     > -clang-syntax -funsigned-char -Wno-string-plus-int -I . -Os
>     > -ffunction-sections -fdata-sections -fno-asynchronous-unwind-tables
>     > -fno-strict-aliasing -c toys/posix/tail.c -o generated/obj/tail.o
>     > toybox-0.7.0: scripts/make.sh: line 270: wait: pid 8477 is not a child of this shell
>     > toybox-0.7.0:
> 
>     Hmmm... PID wrap, maybe?
> 
> That's what we were wondering about... The builder is building a lot of
> other packages at the same time, including Chromium, so it's not
> unlikely that the PID space is saturated... Also, the builder retries
> after the first failure, and the second try always works (probably when
> the builder is less busy...)

Possibly the OS is killing zombies if it wants to reuse that PID before
the zombie is reaped? (Which would be a horrible heuristic because
process exit could happen after a long runtime but right before a new fork.)

Or maybe it's doing so if there _are_ no more free PIDs, instead of
failing the fork?

In either case, moving to $! wouldn't fix it. But that also wouldn't
explain why only bash was seeing the problem...
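
For reference, the message in the build log comes from bash's wait builtin, which can only reap processes that this shell itself forked and still has in its own table. A minimal sketch reproducing the message (PID 1 is used only because it is never a child of a script):

```bash
#!/bin/bash
# bash's wait builtin can only reap processes this shell itself forked.
# PID 1 (init) is never a child of a script, so this wait fails at once,
# printing "wait: pid 1 is not a child of this shell" (silenced here).
wait 1 2>/dev/null
status=$?
echo "wait returned $status"   # → wait returned 127
```

Per the bash manual, wait returns 127 when the given ID is not a process or job it knows about, which is the same failure make.sh hit at line 270.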

It's an interesting bug, and I'd be interested in tracking it down if I
were willing to get sucked into debugging GPLv3 bash. With GPLv2 bash I
spent days tracking down weirdness, à la:

The initial problem:
  http://landley.net/notes-2011.html#24-08-2011

Mentioned in passing:
  http://landley.net/notes-2011.html#26-08-2011
  http://landley.net/notes-2011.html#28-08-2011

Deep dig:
  http://landley.net/notes-2011.html#02-09-2011
  http://landley.net/notes-2011.html#03-09-2011
  http://landley.net/notes-2011.html#04-09-2011

And finally finding it:
  http://landley.net/notes-2011.html#05-09-2011

Yes, that's me happily digging through libc, kernel, and back into a
userspace program to find a problem. But if a GPLv3 program is involved,
"it's broken, let's replace it".

>     > Looking at the code (scripts/make.sh), we are wondering about your
>     > use of $(jobs -rp). Wouldn't it be more correct to add jobs to
>     > PENDING using $! right after you launch the job (do_loudly)?
> 
>     If you think that'll help, I'm happy to give it a try, sure.
> 
> 
> I have a commit ready here, that appears to fix the problem:
> https://github.com/drinkcat/toybox/commit/4c705620d73e3e9c12a3be54dc5d2efda939241a
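
The $! suggestion boils down to recording each PID at launch time instead of asking the shell for its running jobs later. A minimal sketch of that pattern, not the actual make.sh code; job() is a hypothetical stand-in for do_loudly plus the compiler call:

```bash
#!/bin/bash
# Capture each background job's PID with $! at launch time,
# rather than asking the shell afterwards via $(jobs -rp).
# job() is hypothetical: it just exits with the status it is given.
job() { return "$1"; }

PENDING=""   # space-separated PIDs, each followed by a space
FAILED=0

for status in 0 1 0; do
  job "$status" &
  PENDING="$PENDING$! "   # $! = PID of the job just backgrounded
done

# Every PID in PENDING was forked by this shell, so wait can reap it
# and hand back that job's exit status.
for pid in $PENDING; do
  wait "$pid" || FAILED=$((FAILED + 1))
done
echo "$FAILED job(s) failed"   # → 1 job(s) failed
```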

I pushed a change last night based on your $! suggestion; did that fix
it? (Your patch is using ${%%} to filter, which is interesting. I
couldn't make ${//} work right, but maybe that could replace my sed
invocation? I'm trying to get the number of execs in the
dispatch/monitoring cycle as low as possible. Then again, once it can
build under a toybox shell it's just a fork() and not an exec, which is
cheaper. Eh, worry about it later...)
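
One common pitfall with ${//} on a PID list is partial matches: removing "12" can also eat part of "123". A bash-only sketch, assuming a space-separated PENDING list and a hypothetical remove_pid helper, that pads the list with spaces so only whole PIDs match, with no sed exec involved:

```bash
#!/bin/bash
# Remove one PID from the space-separated PENDING list using
# ${var//pattern/replacement} (a bash extension) instead of sed.
# remove_pid is a hypothetical helper, not part of make.sh.
remove_pid() {
  local list=" $PENDING "    # pad so every PID is surrounded by spaces
  list=${list//" $1 "/ }     # quoted pattern: match the whole PID literally
  list=${list# }             # trim the padding added above
  PENDING=${list% }
}

PENDING="123 12 4567"
remove_pid 12
echo "$PENDING"   # → 123 4567
```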

> It's a little less aggressive at parallelizing, as it always waits for
> the first PID if PENDING is full (instead of refreshing the PENDING list
> every time)...

So's the one I did last night. I should poke around on my 8-way machine
and see how well it's keeping the CPUs busy...
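
The "wait for the first PID when PENDING is full" strategy can be sketched with POSIX parameter expansion alone: ${PENDING%% *} yields the oldest PID and ${PENDING#* } drops it, so no sed is needed. MAXJOBS and job() are assumptions standing in for make.sh's CPU-count limit and compile step:

```bash
#!/bin/bash
# Hypothetical bounded job pool: when MAXJOBS PIDs are pending, block on the
# oldest one before launching the next job.
MAXJOBS=2               # assumption: stand-in for make.sh's CPU-based limit
PENDING=""              # PIDs in launch order, each followed by one space
COUNT=0
FAILED=0
job() { return "$1"; }  # hypothetical job: exits with the status it is given

for status in 0 1 0 0 1; do
  if [ "$COUNT" -ge "$MAXJOBS" ]; then
    oldest=${PENDING%% *}   # everything before the first space: oldest PID
    PENDING=${PENDING#* }   # drop that PID and its trailing space
    COUNT=$((COUNT - 1))
    wait "$oldest" || FAILED=$((FAILED + 1))
  fi
  job "$status" &
  PENDING="$PENDING$! "
  COUNT=$((COUNT + 1))
done

# Drain the jobs still in flight.
for pid in $PENDING; do
  wait "$pid" || FAILED=$((FAILED + 1))
done
echo "$FAILED job(s) failed"   # → 2 job(s) failed
```

Blocking on the oldest PID is what makes this "less aggressive": if a later job finishes first, its slot sits idle until the oldest one exits. Bash 4.3 and later also offer `wait -n`, which returns when any one child exits, at the cost of needing a newer bash.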

> I guess that you prefer I send the patch to the list? Or is a github PR
> fine too?

What would be _really_ nice is if github gave me a button to get the
"git format-patch" version of the patch at the above URL. But of course
they don't do that, why would they do that?

When github emails me a pull request, I can wget and "git am" from
there, so it's usable. (It's then up to the submitter to _close_ said
request, but having a list of old irrelevant pull requests I've already
dealt with one way or another is github's problem, as far as I'm concerned.)

Posting them to the list gives other people the chance to chime in, but
I think we covered that here. :)

Thanks,

Rob


