[Toybox] [musl] Not sure how to debug this one.

Sat Feb 17 07:01:15 PST 2024

Rob Landley dixit:

>One of the private email replies that didn't go to the list (so I can't politely
>publicly reply to it and maybe get more people who know stuff chiming in)

Sorry, I should have mentioned I only replied privately because
I threw a ton of vague ideas which I wasn’t even sure were at all
relevant at you and wanted to avoid spamming the list. Feel free
to quote.

>On 2/16/24 21:23, Mouse wrote:

>> This smells to me like depending on uninitialized stack trash.
>
>A write-only function that didn't change its behavior when I memset the
>structure before calling it?

More shots in the dark:

• run under valgrind
• check whether the compiler optimised the memset away, or use
  explicit_bzero instead

>> The simple way to figure that out is to compile something that uses
>> printf and look at the assembly,
>
>$ ccc/sh4-linux-musl-cross/bin/sh4-linux-musl-objdump -d
>generated/unstripped/toybox | grep -A 60 '<printf>:'
>0045341c <printf>:
[…]

I think Mouse meant the call site, not the callee.

>But again, the point is to SAVE those registers, in a defined order, and there's
>no WAY to insert something that big into delicate assembly non-intrusively. This

Yeah, tricky if one does not know the architecture…

>relocated itself yet so I assembled a message into a char buffer[] on the stack
>and did a syscall(_nr_write).

That would be another idea for here, but again, complexity…

>Oh, I forgot to mention that qemu-system-blah also has a -s option to launch a

This is useful, but often not so much to debug userspace.
I tend to try and figure out where userspace is running at,
then set a breakpoint and let the kernel run until there,
but you’d most likely also get interrupts and stuff, hence
the qemu-user suggestion.

>I do not always have the relevant domain expertise, which is why I try to ask
>people who _do_:
>
>https://lkml.org/lkml/2011/12/14/324

(Yes, the reloc should totally be split; it’s also possible
to have only a HI load, for example if the lower ten bits are
known to be zero. I’ve seen that on BSD/sparc.)

>suggested trying it under qemu-user (which reproduced the issue! MUCH easier),

Oh, good!

>And that ALSO says it's a trap 0x180 which in qemu:
>
>sh7750_regs.h:#define SH7750_EVT_ILLEGAL_INSTR       0x180 /* General
>Illegal Instruction */

O̲U̲C̲H̲.̲

>The kernel panics immediately upon PID 1 exiting and even if the panic is
>deferred until after it's written the core dump instead of a check at the START
>of exiting, the writeable filesystem is initramfs which is transient.

Can you put the stuff from the initramfs onto a normal filesystem
in a loopback file on your host instead, then provide that as nbd
and boot the kernel with root=/dev/nbd0… or something? (nbd needs
a tool to set this up first, though…) Or NFS: is there a kernel-side
NFS that doesn’t rely on initramfs setup?

Or, hey, for on-qemu-system debugging, just plug that as -hda or so.

>A) I believe you can still pass rw on the kernel command line

probably even rdsetroot…

>B) you can run a
>dumb little statically linked shim.c as rdinit= to do stuff and then have it
>exec() the next PID 1 process, that's fairly standard procedure in this context.

Indeed, that would be the next avenue.

>But I only pull out gdb when I'm REALLY annoyed. (Cure worse than the disease.
>Can't STAND the user interface...)

Having used Borland Turbo Debugger before, I concur, but I’ve had to
pull out gdb often enough to at least find my way around well enough
for the kinds of debugging that are necessary.

Good luck,
//mirabilos
-- 
<ch> you introduced a merge commit        │<mika> % g rebase -i HEAD^^
<mika> sorry, no idea and rebasing just fscked │<mika> Segmentation
<ch> should have cloned into a clean repo      │  fault (core dumped)
<ch> if I rebase that now, it's really ugh     │<mika:#grml> wuahhhhhh