[Toybox] Confused by bash trap handler return value.

Wed Feb 12 12:36:32 PST 2025

On 2/12/25 09:27, Chet Ramey wrote:
>>> "The value of "$?" after the trap action completes shall be the value it
>>> had before the trap action was executed."
>>
>> Ok, I can do that. Thanks.
>>
>> The question is how to TEST it...
> 
> Testing that trap actions preserve $? is not hard. This had better not
> echo 1.
> 
> trap 'echo WINCH ; false' WINCH
> (exit 41)       # force $? to something neither 0 nor 1
> kill -WINCH $$

At which point $? is now the return code of the kill command, 
overwriting the 41.

> # add this if you're concerned that kill will finish before the SIGWINCH
> # arrives, since it will cause multiple system calls
> (exit 42)

If the trap handler ran right after kill but before (exit 42), then 
(exit 42) overwrites whatever $? was left behind, whether that's the 0 
from kill _or_ a status leaked by the signal handler.

> echo $?

So seeing "42" here doesn't prove the signal handler preserved the 0 
from kill.

> The trap won't get run until after the kill returns; trap actions are
> not run asynchronously.

But the kill returns before the (exit 42) runs.
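One way to take kill's own 0 (and any filler like (exit 42)) out of the picture is to make the command that triggers the trap leave a distinctive status itself. This is a minimal sketch that assumes an external `sh` on $PATH. The function `check_trap_preserves_status` and the variable `trap_status` are hypothetical names. A child process signals its parent shell and exits 41. POSIX defers a trap that arrives while the shell is waiting on a foreground command until that command completes, so $? is 41 when the trap starts.

```shell
# Hypothetical regression test: the command that triggers the trap also
# leaves a distinctive status, so no filler like (exit 42) is needed.
check_trap_preserves_status() {
  trap 'false' WINCH
  # The child signals its parent shell, then exits 41. The trap is deferred
  # until this foreground command completes, so $? is 41 when it starts.
  sh -c 'kill -WINCH "$PPID"; exit 41'
  trap_status=$?   # 41 if the trap preserved $?, 1 if it leaked "false"
  trap - WINCH
}

check_trap_preserves_status
echo "status after trap: $trap_status"
```

A shell that preserves $? across the trap leaves `trap_status` at 41. A shell that leaks the trap's `false` leaves 1.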

I implemented a check for signals between each command, meaning my code 
checks for pending signals before launching the subshell (and inserts a 
synthetic "eval" on the sh_function call stack for each one). Are you 
saying it should NOT do that?
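The between-commands check described above can also be exercised with the builtin kill alone, with no filler command. The idea is to record inside the trap that it ran, then read $? on the very next command. This is only a sketch, and `check_builtin_kill`, `order`, and `after` are hypothetical names.

```shell
# Hypothetical test: does the trap run between "kill" and the next command,
# and does $? survive it? Uses the builtin kill to signal this shell ($$).
check_builtin_kill() {
  order=
  trap 'order="${order}trap,"; false' WINCH
  (exit 41)          # force $? to something neither 0 nor 1
  kill -WINCH "$$"   # kill itself succeeds, so $? becomes 0
  after=$?           # 0 if the trap preserved $?, 1 if it leaked "false"
  order="${order}next"
  trap - WINCH
}

check_builtin_kill
echo "order=$order after=$after"
```

If the trap runs between kill and the next command and preserves $?, `order` ends up as `trap,next` and `after` is 0 (kill's status). A leaked `false` would make `after` 1.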

(Also, I leave the signal blocked while its trap handler runs, and have 
the return from the "trap" unblock it. That defers a second instance of 
the same signal until the trap handler for the first one returns. 
Different signals can interrupt each other, but each signal can only 
appear once in the nesting. A screaming signal could starve the others: 
the most recently returned one is unblocked, so it can get queued up 
again and interrupt back down, while the others can't re-fire until 
their handlers return, meaning we have to unwind back down to them. But 
I dunno a better way, and being flooded with signals isn't really a use 
case to optimize for anyway.)
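The blocking scheme above can be probed from a script by having a trap send its own signal again, plus a different one. This is a sketch with hypothetical names, and it assumes SIGUSR1 and SIGUSR2 aren't ignored on entry to the shell. Under the scheme as described, the repeated USR1 waits until the first USR1 trap returns, while USR2 runs inside it, so `log` should read `usr1-start,usr2,usr1-end,usr1-start,usr1-end,`. Other shells may run the repeated USR1 nested, or defer USR2 as well. Either way, `n` should still end up at 2.

```shell
# Hypothetical probe: the USR1 trap re-sends USR1 (same signal) and USR2
# (a different signal) to this shell before finishing. The traps are left
# in place afterward, because resetting USR1 to its default action while a
# USR1 is still pending could terminate the shell.
probe_trap_nesting() {
  log= n=0
  trap 'log="${log}usr2,"' USR2
  trap 'n=$((n + 1)); log="${log}usr1-start,"
        if [ "$n" -eq 1 ]; then kill -USR1 "$$"; kill -USR2 "$$"; fi
        log="${log}usr1-end,"' USR1
  kill -USR1 "$$"
  : # command boundaries where a deferred trap gets a chance to run
  :
}

probe_trap_nesting
echo "$log"
```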

> There's a potential race condition, but I don't
> think it will ever get hit.

I think it will reliably get hit, because kill makes a system call to 
send the signal, and the kernel checks for signal delivery on return 
from system calls. kill() is ITSELF a system call whose exit path checks 
for pending signals on this process, and signal delivery is synchronous 
within the kernel (the data structure update is not delegated to a 
tasklet, it's done synchronously within the syscall's function). So if 
you kill(getpid(), 9), the syscall reliably never returns (and the 
original program should never execute one more instruction), because 
the signal is dequeued and delivered on the return from the kill() 
syscall itself. Similarly, other signals-to-self should return straight 
into the handler, deterministically. For a builtin kill command running 
within the parent PID, it seems deterministic.
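If signal-to-self delivery really is deterministic, repeating it many times should never catch the trap running late. (POSIX's kill() specification also requires that, for a single-threaded process, an unblocked signal the process sends to itself, or some other pending unblocked signal, be delivered before kill() returns.) Below is a hypothetical stress check, with function and variable names made up for illustration. It counts the iterations where the trap had not yet run by the command after kill.

```shell
# Hypothetical stress check for the builtin-kill case: if signal-to-self
# delivery is deterministic, the trap has always run by the time the
# command after kill executes, so "late" stays 0.
count_late_self_traps() {
  late=0 i=0
  trap 'fired=1' WINCH
  while [ "$i" -lt "${1:-1000}" ]; do
    fired=0
    kill -WINCH "$$"
    [ "$fired" = 1 ] || late=$((late + 1))
    i=$((i + 1))
  done
}

count_late_self_traps 1000
echo "late traps: $late"
```

A nonzero `late` would point at the race Chet described. A zero across many runs is consistent with the synchronous delivery argued above, though it doesn't prove it.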

In the case where there's a parent and child process, cross-CPU signal 
delivery on an SMP system could introduce some asynchronous jitter, but 
again this case should be deterministic, because queueing up the signal 
is synchronous. The sender grabs and releases the appropriate locks to 
directly update the other PID's data structure within the system call 
(unless they COMPLETELY rewrote it into some sort of RCU nonsense since I 
last checked). It's shared memory: the child process takes a lock and 
updates the parent process's data structures regardless of which CPU 
each is running on. The child's kill() syscall has to finish and return 
before the child can make its subsequent exit() syscall, all while the 
parent process was either stuck in wait(), stuck in fork(), or scheduled 
away in the code between the fork() and the wait(). Any of those three 
will check for pending signals sent to the parent process before handing 
control back to the parent, and said signal MUST have been queued up 
(and all the related in-kernel locks released) before the child kill 
process can exit.

I.e., SIGCHLD has to get queued up AFTER the SIGWINCH or whatever. So 
either the signal handler gets called or it's waiting in the signalfd.
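The parent/child ordering argument can be stress-tested the same way, this time with a foreground child doing the kill. This hypothetical sketch assumes an external `sh` on $PATH. The signal is queued before the child can exit, and the shell doesn't move on until it has reaped the child, so the trap should always have run before the next command.

```shell
# Hypothetical stress check for the parent/child case: a foreground child
# signals its parent shell and exits. The signal is queued before the
# child can exit, so the trap should have run before the next command.
count_late_child_traps() {
  late=0 i=0
  trap 'fired=1' WINCH
  while [ "$i" -lt "${1:-200}" ]; do
    fired=0
    sh -c 'kill -WINCH "$PPID"'
    [ "$fired" = 1 ] || late=$((late + 1))
    i=$((i + 1))
  done
}

count_late_child_traps 200
echo "late traps: $late"
```

Backgrounding the child (`sh -c '...' &`) drops that guarantee, because the parent no longer waits for it. That is the `&` case where a real race appears.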

Where is the race condition? I think you'd need an & in there somewhere 
to race.

> The POSIX language is ambiguous, the difference is between "fail if any
> kill fails" vs. "succeed if any kill succeeds."

Now in issue 8.

>> I suppose I can implement a 0 return code for the kill builtin and a 1 
>> return code if it's called via the $PATH, but... ew?
> 
> I'm going to change it for POSIX mode, since that's compatible with what
> other POSIX shells have implemented. Bash default mode will stay the same.

I plead the 5th on moving targets. (It doesn't help me here: I'd have to 
do a bash version check before adding -p. I think I'll stick with running 
$(which kill).)
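The "fail if any kill fails" vs. "succeed if any kill succeeds" difference only shows up when kill gets a mix of good and bad targets. The following is a hypothetical comparison, and it assumes `env` and an external kill are on $PATH. `env kill` bypasses the builtin and runs the kill found on $PATH, much like `$(which kill)`. The bad target is the PID of a child that has already been reaped.

```shell
# Hypothetical comparison of kill's exit status with one valid and one
# invalid target, where the two readings of POSIX disagree.
compare_kill_status() {
  sh -c 'exit 0' &
  dead=$!
  wait "$dead"   # reaped, so (barring PID reuse) no process has this PID
  # SIGCONT is harmless to a running shell, so signalling $$ has no effect.
  kill -CONT "$$" "$dead" 2>/dev/null
  builtin_status=$?
  env kill -CONT "$$" "$dead" 2>/dev/null
  external_status=$?   # 127 if env finds no kill on $PATH
}

compare_kill_status
echo "builtin: $builtin_status external: $external_status"
```

Which value each one reports depends on the shell and the kill implementation. The sketch deliberately doesn't assume either answer.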

At some point I'm going to have to face adding posix mode, but not yet...

Thanks,

Rob