[Toybox] [musl] Re: Not sure how to debug this one.

Sun Feb 18 12:33:06 PST 2024

On Sun, Feb 18, 2024 at 06:06:46PM +0300, Valery Ushakov wrote:
> On Sun, Feb 18, 2024 at 09:33:13 -0500, Rich Felker wrote:
> 
> > On Sun, Feb 18, 2024 at 03:55:36PM +0300, Valery Ushakov wrote:
> > > On Sat, Feb 17, 2024 at 20:40:50 -0500, Rich Felker wrote:
> > > 
> > > > due to incorrect base address register when attempting to reload the
> > > > saved value of r8, the caller's value of r8 was not preserved.
> > > > ---
> > > >  src/signal/sh/sigsetjmp.s | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/src/signal/sh/sigsetjmp.s b/src/signal/sh/sigsetjmp.s
> > > > index 1e2270be..f0f604e2 100644
> > > > --- a/src/signal/sh/sigsetjmp.s
> > > > +++ b/src/signal/sh/sigsetjmp.s
> > > > @@ -27,7 +27,7 @@ __sigsetjmp:
> > > >  
> > > >  	mov.l 3f, r0
> > > >  4:	braf r0
> > > > -	 mov.l @(4+8,r4), r8
> > > > +	 mov.l @(4+8,r6), r8
> > > >  
> > > >  9:	mov.l 5f, r0
> > > >  6:	braf r0
> > > 
> > > That takes care of restoring caller's r8 for the first return from
> > > sigsetjmp, but isn't there still the problem that the jump buffer
> > > contains the wrong one, so on the second return from sigsetjmp the
> > > caller will have clobbered r8?
> > > 
> > > Sorry for a drive-by reply.  I'll try to take a closer look in the
> > > evening.
> > 
> > No, that's the return path for both returns.
> >
> > The whole reason a call-saved register like r8 is used here is so
> > that we can return twice into the body of sigsetjmp, in order to
> > tailcall __sigsetjmp_tail at both the first return and subsequent
> > return.
> 
> Doh, right!  Sorry.  A comment to that effect to alert the reader
> would certainly have helped :) Neat trick that I missed on the quick
> reading.

Yes. Perhaps a single comment in each asm file pointing to a common
place where this is documented (the dummy sigsetjmp.c file would be a
good candidate) would be the way to go. That document could also
describe what needs to be done when writing a new port.

> > This is what makes it possible to restore the signal mask from the
> > returned-to frame rather than the returning-from frame (which is why
> > the attached doesn't crash with stack overflow on musl like it does
> > on glibc).
> 
> Restoring the context in siglongjmp should not be a problem per se.
> NetBSD libc does that and the example code doesn't crash there (quick
> unscientific test on a ppc that I happen to have a terminal open on).
> But then NetBSD libc doesn't bother to carefully factor that code to
> minimize the need for MD asm.
> 
> Thanks, and sorry for the noise.

If you restore the signal mask from the returning context rather than
from the returned-to context, stack overflow is always possible:
restoring the mask unblocks any pending signal, which is then
delivered before the stack has been unwound, so its handler runs on
top of the frame you were trying to leave. In the worst case, this
happens on the sigaltstack, where you're specifically taking measures
to keep stack overflow from being a fatal error. The test program is
artificial, but the real-world way this would happen is a flood of
signals like SIGINT or SIGTSTP or something coming in faster than you
can respond to them, so that every time you try to return via
siglongjmp, you actually consume another stack frame on the signal
stack.

If NetBSD didn't crash, maybe it just has a much larger default stack
size limit? Or maybe they reload sp before calling sigprocmask? That
would work too, but the reason musl doesn't do it that way is that our
setjmp/longjmp are compatible with an old ABI where there is no extra
space in the jmp_buf for sigjmp_buf stuff.

Rich