[Aboriginal] Merry christmas, I have found two bugs

Rob Landley rob at landley.net
Thu Dec 27 01:28:43 PST 2012


On 12/26/2012 04:23:06 PM, Bjørn Forsman wrote:
> On 26 December 2012 22:36, Rob Landley <rob at landley.net> wrote:
> > On 12/26/2012 02:55:04 PM, Bjørn Forsman wrote:
> >> Not sure what exactly is segfaulting, but segfault == bug to me. And
> >> it only happens with ./run-emulator.sh, not ./dev-environment.sh.
> >
> > There are somewhat different code paths in the init.sh shell script so
> > it's probably something only running in the first case. I need to
> > rebuild the armv5l target to test this, that'll take a few minutes...

Huh. I did a set -x at the start of init.sh and the relevant chunk is:

   + mountpoint -q dev/pts
   + mount -t devpts dev/pts dev/pts
   + '[' -z '' ']'
   ++ wc -w
   ++ echo /sys/devices/system/cpu/cpu0
   Segmentation fault
   + export CPUS=1
   + CPUS=1

Which corresponds to:

   # If nobody said how many CPUS to use in builds, try to figure it out.
   if [ -z "$CPUS" ]
   then
     export CPUS=$(echo /sys/devices/system/cpu/cpu[0-9]* | wc -w)
     [ "$CPUS" -lt 1 ] && CPUS=1
   fi
   export PS1='($HOST:$CPUS) \w \$ '

But... the export command is getting CPUS=1. The next line isn't an
export, because it assumes the previous line already exported the
variable even if it got the wrong value. It can't actually get a wrong
value, though: the echo would still output a string even if the wildcard
didn't find any files, so that's an unnecessary test.
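
To illustrate why the -lt 1 test can never fire: in a POSIX shell with default globbing (no nullglob-style option set), a pattern that matches nothing is passed through literally, so echo still prints one word. A minimal sketch, assuming a scratch directory from mktemp and a hypothetical helper named count_cpu_entries (neither appears in init.sh):

```shell
#!/bin/sh
# count_cpu_entries is a hypothetical helper, not part of init.sh.
# It counts the words produced by expanding cpu[0-9]* inside directory $1.
count_cpu_entries() {
  # Subshell so the cd does not leak; $(( )) strips any padding that
  # some wc implementations put in front of the number.
  ( cd "$1" && echo $(( $(echo cpu[0-9]* | wc -w) )) )
}

empty=$(mktemp -d)            # scratch directory with no cpu* entries
count_cpu_entries "$empty"    # pattern stays literal: one word, prints 1
mkdir "$empty/cpu0" "$empty/cpu1"
count_cpu_entries "$empty"    # glob now matches two entries, prints 2
rm -r "$empty"
```

So the count reads as 1 both when exactly one CPU directory exists and when none match at all.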

So wc -w had to output "1", meaning the echo fed it a string, meaning  
both _worked_. Either they segfaulted on exit (why?) or something's  
funky in the shell...

I tried this on the command line:

   unset CPUS; if [ -z "$CPUS" ]; then export CPUS=$(echo /sys/devices/system/cpu/cpu[0-9]* | wc -w); [ "$CPUS" -lt 1 ] && CPUS=1; fi

And it didn't segfault.

Ok, replace the export CPUS line in the script with:

   export CPUS=$(printf 0 | wc -w)

Still segfaults.

   export CPUS=$(printf 0 | cat)

Still segfaults... And that's starting to imply that it's the _shell_.  
Is this a bug in ash? Ok, revert the changes and instead make the first  
line #!/bin/bash and... Still segfaults?

Weeeeeeird.
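
Swapping the interpreter can also be tried without editing the script's first line. A rough sketch, assuming a hypothetical helper named run_under and that the interpreters of interest are on the PATH; it uses -c rather than a shebang, so it only approximates the experiment above:

```shell
#!/bin/sh
# run_under is a hypothetical helper: it runs one fixed snippet under the
# interpreter named in $1 and prints the captured value and exit status.
run_under() {
  value=$("$1" -c 'CPUS=$(echo 1); echo "$CPUS"')
  # $? here is the status of the assignment, i.e. of the command substitution.
  echo "$1: CPUS=$value status=$?"
}

for shell in sh bash; do
  if command -v "$shell" >/dev/null 2>&1; then
    run_under "$shell"
  else
    echo "$shell: not installed, skipped"
  fi
done
```

If every interpreter shows the same fault, the shell itself becomes a less likely culprit than whatever it runs.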

Ok, export CPUS=1 with the rest of the $() commented out... no  
segfault. Try export CPUS=$(echo 1) and it segfaults. What is this,  
something in the toybox setup/exit code? It'd have to be the exit code
because the segfault is _after_ the command otherwise completes. Maybe
that new setjmp/longjmp stuff? No, that's not the common exit path,
only the error exit path. toy_exec() ends with:

   toys.which->toy_main();
   if (fflush(NULL) || ferror(stdout)) perror_exit("write");
   exit(toys.exitval);

Ok, what's a command that isn't toybox... ah, expr is still busybox.  
Try $(expr 1) and it does _not_ segfault. Ok, now we're getting  
somewhere. Unless the fact busybox isn't forking an external program is  
part of the reproduction sequence? Ummm...

   echo /sys/devices/system/cpu/cpu[0-9]* | wc -w
   export CPUS=1 #$(echo /sys/devices/system/cpu/cpu[0-9]* | wc -w)
   echo zero
   export CPUS=$(expr 1)
   echo one
   export CPUS=$(hello-dynamic)
   echo two
   export CPUS=$(echo 1)
   echo three

Gives us:

   Freeing init memory: 96K
   1
   zero
   one
   two
   Segmentation fault
   three
   8139cp 0000:00:0c.0 eth0: link up, 100Mbps, full-duplex, lpa 0x05E1
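
The marker approach above can be condensed into a helper that also records how each command ended. This is only a sketch: probe is a hypothetical name, the sh -c line merely simulates a crash, and it assumes a shell that reports death by signal as 128 plus the signal number. Whether the status actually reflects the fault seen here is not established by the output above.

```shell
#!/bin/sh
# probe is a hypothetical helper: run a command inside $(...) the way the
# export line does, then report what was captured and how the command ended.
probe() {
  out=$("$@" 2>/dev/null)
  status=$?
  if [ "$status" -gt 128 ]; then
    # Shells report death by signal as 128 + signal number.
    echo "$1: killed by signal $((status - 128)), output='$out'"
  else
    echo "$1: exit status $status, output='$out'"
  fi
}

probe expr 1                     # external command
probe echo 1                     # may be a shell builtin or an external binary
probe sh -c 'kill -s SEGV $$'    # simulated crash: child kills itself with SIGSEGV
```

A command that prints the right output but still ends with a signal status would point at its exit path rather than its main logic.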

Right, that's toybox. Will it do it on a second copy of toybox in the  
same filesystem? Yes. Ok, I can twiddle that copy without screwing up  
the base system relying on the rest. Now...

   printf("exitval=%d\n", toys.exitval);
   exit(toys.exitval);

Goes:
   exitval=0
   Segmentation fault

That's kind of unsporting. It looks like something I'm doing is screwing  
up the uClibc exit code. Let's try _exit(toys.exitval); and see if that  
avoids the segfault? Apparently not. So it's not the atexit() call  
chain or the stdio shutdown, it's something that toybox does that  
irritates ash even when we call the exit system call more or less  
directly.

Sigh. I haven't taken a serious look at the ash source in _years_. And  
of course the first file I look at is shell/ash_ptr_hack.c and I  
remember WHY I haven't looked at this since I stopped actually having a  
duty to.

Quick search for an alternative to sticking printf() calls into ash,  
how about running toybox under strace and see if that catches the  
segfault? Hang on, strace didn't run? Oh, duh, updated the wrong copy  
of the file. (Being a bit too clever not re-running build stages to
speed up test cycles on my sad little netbook: there are 5 copies of
init.sh in play right now...)

And strace says:

   write(1, "exitval=0\n", 10exitval=0
   )             = 10
   exit_group(0)                           = ?
   +++ exited with 0 +++
   Segmentation fault
   two

Which means it's happening after _strace_ exits.  Wheee...

Printf in ash time, but first I hit the grocery store.

Rob

