[Aboriginal] Merry christmas, I have found two bugs
rob at landley.net
Thu Dec 27 01:28:43 PST 2012
On 12/26/2012 04:23:06 PM, Bjørn Forsman wrote:
> On 26 December 2012 22:36, Rob Landley <rob at landley.net> wrote:
> > On 12/26/2012 02:55:04 PM, Bjørn Forsman wrote:
> >> Not sure what exactly is segfaulting, but segfault == bug to me.
> >> it only happens with ./run-emulator.sh, not ./dev-environment.sh.
> > There are somewhat different code paths in the init.sh shell script so it's
> > probably something only running in the first case. I need to rebuild the
> > armv5l target to test this, that'll take a few minutes...
Huh. I did a set -x at the start of init.sh and the relevant chunk is:
+ mountpoint -q dev/pts
+ mount -t devpts dev/pts dev/pts
+ '[' -z '' ']'
++ wc -w
++ echo /sys/devices/system/cpu/cpu0
+ export CPUS=1
Which corresponds to:
# If nobody said how many CPUS to use in builds, try to figure it out.
if [ -z "$CPUS" ]
then
  export CPUS=$(echo /sys/devices/system/cpu/cpu[0-9]* | wc -w)
  [ "$CPUS" -lt 1 ] && CPUS=1
fi
export PS1='($HOST:$CPUS) \w \$ '
But... the export command is getting CPUS=1. The next line isn't an
export (because it assumes the previous line exported the variable even
if it got the wrong value, which it can't because the echo would still
output a string even if the wildcard didn't find any files so that's an
So wc -w had to output "1", meaning the echo fed it a string, meaning
both _worked_. Either they segfaulted on exit (why?) or something's
funky in the shell...
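The reasoning about echo and wc -w can be checked in isolation. This is only a sketch: the temporary directory is a hypothetical stand-in for /sys/devices/system/cpu, and it assumes a POSIX shell with default globbing (no nullglob-style option), where an unmatched pattern is passed through as a literal word.

```shell
#!/bin/sh
# Hypothetical demo directory standing in for /sys/devices/system/cpu.
demo=$(mktemp -d)

# No cpu* entries exist yet, so the pattern reaches echo unexpanded
# and wc -w still counts one word.
none=$(echo "$demo"/cpu[0-9]* | wc -w)

# Create two fake CPU entries and count again.
mkdir "$demo/cpu0" "$demo/cpu1"
two=$(echo "$demo"/cpu[0-9]* | wc -w)

echo "no match: $none, two matches: $two"
rm -rf "$demo"
```

So a count below 1 should not come out of this pipeline as long as both commands run, which is why the -lt 1 check is only a safety net.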
I tried this on the command line:
unset CPUS; if [ -z "$CPUS" ]; then export CPUS=$(echo /sys/devices/system/cpu/cpu[0-9]* | wc -w); [ "$CPUS" -lt 1 ] && CPUS=1; fi
And it didn't segfault.
Ok, replace the export CPUS line in the script with:
export CPUS=$(printf 0 | wc -w)
export CPUS=$(printf 0 | cat)
Still segfaults... And that's starting to imply that it's the _shell_.
Is this a bug in ash? Ok, revert the changes and instead make the first
line #!/bin/bash and... Still segfaults?
Ok, export CPUS=1 with the rest of the $() commented out... no
segfault. Try export CPUS=$(echo 1) and it segfaults. What is this,
something in the toybox setup/exit code? It'd have to be the exit code
because the segfault is _after_ the command otherwise completes. Maybe
that new setjmp/longjmp stuff? No, that's not the common exit path,
only the error exit path, toy_exec() ends with:
if (fflush(NULL) || ferror(stdout)) perror_exit("write");
Ok, what's a command that isn't toybox... ah, expr is still busybox.
Try $(expr 1) and it does _not_ segfault. Ok, now we're getting
somewhere. Unless the fact busybox isn't forking an external program is
part of the reproduction sequence? Ummm...
echo /sys/devices/system/cpu/cpu[0-9]* | wc -w
export CPUS=1 #$(echo /sys/devices/system/cpu/cpu[0-9]* | wc -w)
export CPUS=$(expr 1)
export CPUS=$(echo 1)
Freeing init memory: 96K
8139cp 0000:00:0c.0 eth0: link up, 100Mbps, full-duplex, lpa 0x05E1
Right, that's toybox. Will it do it on a second copy of toybox in the
same filesystem? Yes. Ok, I can twiddle that copy without screwing up
the base system relying on the rest. Now...
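The narrowing done by hand above (swap one command into $() at a time and see which ones crash) could be scripted roughly like this. It is a sketch, not what was run here: try is a hypothetical helper, and it only catches the case where the process inside the command substitution itself dies from a signal, which the shell reports as a status of 128 plus the signal number (SIGSEGV is 11 on Linux). The last call crashes a child shell on purpose just to show what that looks like.

```shell
#!/bin/sh
# Hypothetical helper: run one candidate inside a command substitution
# and report whether it exited normally or was killed by a signal.
try() {
  out=$(sh -c "$1" 2>/dev/null)
  status=$?    # status of the command substitution
  if [ "$status" -gt 128 ]; then
    echo "signal $((status - 128)): $1"
  else
    echo "exit $status: $1"
  fi
}

try 'echo 1'
try 'expr 1'
try 'printf 0 | wc -w'
try 'kill -s SEGV $$'   # deliberate crash of the child shell
```

If the fault happens somewhere this status never reaches (for example in a process the shell is not waiting on), this loop will report a clean exit and something like strace or printf debugging is still needed.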
That's kind of unsporting. It looks like something I'm doing is screwing
up the uClibc exit code. Let's try _exit(toys.exitval); and see if that
avoids the segfault? Apparently not. So it's not the atexit() call
chain or the stdio shutdown, it's something that toybox does that
irritates ash even when we call the exit system call more or less
Sigh. I haven't taken a serious look at the ash source in _years_. And
of course the first file I look at is shell/ash_ptr_hack.c and I
remember WHY I haven't looked at this since I stopped actually having a
Quick search for an alternative to sticking printf() calls into ash,
how about running toybox under strace and see if that catches the
segfault? Hang on, strace didn't run? Oh, duh, updated the wrong copy
of the file. (Being a bit too clever not re-running build stages to
speed up test cycles on my sad little netbook: there are 5 copies of
init.sh in play right now...)
And strace says:
write(1, "exitval=0\n", 10exitval=0
) = 10
exit_group(0) = ?
+++ exited with 0 +++
Which means it's happening after _strace_ exits. Wheee...
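For a repeat of the strace run, a variant along these lines might be easier to read. It is a sketch under assumptions: trace_cmd is a hypothetical helper, strace has to be installed and allowed to ptrace, -f follows child processes, and -o writes the trace to a file so it does not interleave with the program's own stdout the way the write() line above did.

```shell
#!/bin/sh
# Hypothetical helper: trace a command and its children, then look for
# strace's "killed by SIGSEGV" marker in the log.
trace_cmd() {
  if ! command -v strace >/dev/null 2>&1; then
    echo "strace not installed"
    return 0
  fi
  log=$(mktemp)
  # -f: follow forks; -o: keep the trace separate from program output.
  strace -f -o "$log" sh -c "$1" >/dev/null 2>&1
  if grep -q 'killed by SIGSEGV' "$log"; then
    echo "segfault seen"
  else
    echo "no segfault seen"
  fi
  rm -f "$log"
}

trace_cmd 'export CPUS=$(echo 1)'
```

A "no segfault seen" result only says that nothing strace was attached to died of SIGSEGV, which matches the situation above where the traced command exits with 0 and the fault shows up afterwards.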
Printf in ash time, but first I hit the grocery store.