[Toybox] Implementing ldd.

Rob Landley rob at landley.net
Sun Jan 30 01:16:54 PST 2022


On 1/28/22 8:09 PM, enh wrote:
>> Sigh. You did ask nicely. And I'm not trying to make more work for you...
>>
>> (Personally, I don't write code on the assumption everybody else is dumber than
>> me and that I know what people I've never met want/need better than they do.)
> 
> whereas i have whole weeks where i'm basically just very expensive
> tech support for random folks off the internet :-)

Oh me too, but the result is generally me writing documentation, clarifying help
text, new code for an easier tool... :)

Heck, I'm still fielding emails about busybox, most recently on the 11th:

> I have seen you are a Busybox developer and I hope you can help me.
> I am using µClinux on a SOM and I am trying to add a ptpd applet on Busybox.

And 5 emails later the thread ended on the 19th. (Yes I _did_ hand that project
over to a new maintainer 15 years ago, and yet...)

The most recent time I got cc'd on a random linux-kernel thread I had nothing to
do with (probably as fallout from Documentation directory maintainership I
handed back to Randy Dunlap in 2014, which has since been handed off to Jonathan
Corbet) was [checks] thursday.

And of course:

  https://landley.net/notes-2022.html#22-01-2022

And yes that's a real thing:


https://threatpost.com/mirai-variant-cross-compiles-attack-code-with-aboriginal-linux/136906/

*shrug* My baseline for this is, once again, abnormal. Way back when I was one
of 4 writers for a stock market investment column I'm told was read by ~15
million people (resulting in things like
https://firstmonday.org/ojs/index.php/fm/article/view/839/748 ) so I got used to
my inbox exploding fairly early on.

>> Alas Bernhard only lasted a year or two before burning out himself. Back during
>> the "squashed by buildroot" phase, three different NPTL implementations had been
>> submitted (for three different architectures), and the devs kept trying to unify
>> instead of just PICKING one and throwing the rest away. This led to a two year
>> gap with no releases (see https://uclibc.org/oldnews.html between 0.9.28 and
>> 0.9.28.1) where code kept piling up in the dev branch diverging farther and
>> farther from the release branch. I had my hands full with busybox so couldn't
>> take it on, and then the way I left busybox didn't leave me inclined to go back
>> for seconds. (Plus, Bernhard was nominally doing the thing, except for the part
>> where he wasn't. I note that when I burned out on busybox because of Bruce's
>> GPLv3 rants, Erik advised me to take a few months off instead of handing off the
>> project to somebody who still had enthusiasm for it. I didn't listen.)
>>
>> The decline of uClibc
> 
> (i actually asked the internet after hitting send, and was amused by
> the wikipedia page's "final release: 2012", specifically how far past
> your usual 7 year rule that was :-) )

Oh sure. It's long dead and I argued as such:

  https://lists.uclibc.org/pipermail/uclibc/2017-March/049250.html

But I consider it important history both because of how it shaped the Linux libc
landscape (it's the reason glibc wasn't "the libc" the way gcc was "the
compiler" requiring a decade of work to unstick when they went nuts), and
because I was personally involved with it (if not a major code contributor) for
over a decade. :)

>> > i think
>> > as long as this is called something that makes it clear what it
>> > *actually* is, it's not completely useless. though the few people who
>> > genuinely *only* need "recursive DT_NEEDEDs" and know that are also
>> > fully capable of writing the script themselves if/when they need it, i
>> > think. as always, i'm not too worried about them. i'm worried about
>> > the folks who have no idea what they're doing, and how we make sure
>> > they don't get even more misled.)
>>
>> "Unix gives you enough rope to shoot yourself in the foot." - Allen Holub
>>
>> What people who have no idea what they're doing need is a tutorial.
> 
> tutorials don't scale, and bitrot really fast.

Documentation does require maintenance, yes.

> (i literally spent this
> afternoon trying *and failing* to follow our "how to use asan for an
> app" docs.) what people really need is automation, and the great thing
> about automation is that it's good for the experts as well as the
> don't-even-know-they-don't-know.

I take things apart to understand how they work. Automation bit-rots too, and
when your bus number falls to ZERO trouble is only a matter of time.

Automation also tends to increase complexity. Reducing complexity is like
reducing clutter, it's constant janitorial work. Abstraction layers that hide
complexity give it somewhere to accumulate.

We agree on the problem but not the solution. I'm aware the author of
http://steve-yegge.blogspot.com/2007/12/codes-worst-enemy.html works for Google
(last I heard) and has distanced himself from it a bit, but I still think he
was right. (Although that article needs to be edited down to less than half its
length, which is outright IRONIC given the subject matter.)

>> >> Not really seeing that as an improvement.
>> >
>> > that's because you disagree with the three main (only?)
>>
>> surviving
>>
>> > ldds about what ldd does :-)
>>
>> Yes, so did the man page, that's why I quoted it. The BSD ldd predates the linux
>> ones and is a bit more explicit:
>>
>> https://www.freebsd.org/cgi/man.cgi?ldd
>>
>> And the a.out ldd definitely worked like I'm thinking, before Linux switched to
>> ELF in 1995. :)
> 
> (TIL a.out supported shared libraries!

Sadly, still does! (Binflt is the nommu version of a.out. The ELF nommu variant
is fdpic, which ARM _finally_ started supporting in 2017 with kernel commit
50b2b2e691cd. Which means yes, most cortex-m Linux was using binflt which meant
a.out before then. Sigh.)

The a.out format was more or less the original unix linking format from Bell
Labs, and from there picked up by BSD, so it was ~25 years old when Linux
switched off of it.

The problem with a.out (on Linux) is all the addresses were assigned at compile
time, which wasn't a problem for processes (each of which still had its own
virtual address range), but every shared library in the system had to be given a
globally unique address range to load at, which you could do when building a
distro, but if you installed third party libraries you couldn't avoid conflict.
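A minimal sketch of why that breaks down, in Python. The library names, base addresses and sizes are invented for illustration and not taken from any real distro. Each library claims a fixed [base, base+size) range, and any two ranges that overlap can't both be mapped into one process:

```python
# Hypothetical illustration of fixed load addresses for a.out-style shared
# libraries. All names, bases and sizes below are made up for the example.

def find_conflicts(libs):
    """Return pairs of library names whose [base, base+size) ranges overlap."""
    # Sort by base address so only later entries can overlap an earlier one.
    ordered = sorted(libs.items(), key=lambda kv: kv[1][0])
    conflicts = []
    for i, (name_a, (base_a, size_a)) in enumerate(ordered):
        end_a = base_a + size_a
        for name_b, (base_b, _) in ordered[i + 1:]:
            if base_b >= end_a:
                break  # sorted by base: nothing further can overlap name_a
            conflicts.append((name_a, name_b))
    return conflicts

# A distro builder can hand out non-overlapping ranges for its own libraries...
distro_libs = {
    "libc": (0x60000000, 0x00200000),
    "libm": (0x60200000, 0x00040000),
}
# ...but a third party picking an address on its own has no way to know them.
third_party = {"libfoo": (0x60100000, 0x00080000)}
```

Calling find_conflicts() on the distro set alone finds nothing, while merging in the third party library reports a clash with libc. Relocation support is what removes the need for this global bookkeeping.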

The BSD guys handled this by extending their a.out to have relocation support,
but System V introduced a whole new executable format, and AT&T went around
browbeating all the unix companies to switch from BSD to System V during the
1980s (hence SunOS->Solaris and AOS->AIX and so on), which meant switching to
ELF. Minix used a.out so Linus copied that initially, but his system call list
came from the Solaris manuals in his university library so...

https://gondwanaland.com/meta/history/interview.html

By the way, fdpic is nice for security-minded with-mmu systems too because it's
just ELF with the 4 major segments (data, code, bss, rodata) independently
relocatable rather than contiguous. The downside is this eats 3 more base
registers (or forces access through a base register array), but the upside is
you can ASLR harder:

https://j-core.org/downloads/fdpic-sh.txt

https://static.linaro.org/connect/sfo15/Presentations/09-24-Thursday/SFO15-406-%20ARM%20FDPIC%20Toolchains.pdf

https://github.com/mickael-guene/fdpic_manifest

> that must have been while i was
> busy statically linking everything/only writing java anyway...)

Linux predates the public release of Java by about 4 years (and had a 1.0
release about a year earlier).

Here's the Linux Journal articles introducing the Linux community to the ELF
file format in 1995:

https://www.linuxjournal.com/article/1059
https://www.linuxjournal.com/article/1060

And the first major ELF distro release (Red Hat 3) came out May 1, 1996. All the
Debian 0.x releases are a.out but 1.1 was ELF.

>> *shrug* I still think the older versions had it right, or at least were doing
>> something useful.
>>
>> >> I may not be a good baseline for "they'll never need that". I've stuck printfs
>> >> into more than one board's stage 1 bootloader (spinning on "output byte to
>> >> register" and "wait for bit to clear so I can send next byte"). I stuck printfs
>> >> into the uClibc dynamic loader to debug its relocation of _itself_. I've worked
>> >> out the constant to subtract from "string" to compensate for the "copy from
>> >> flash to sram" it hasn't done yet but the symbol tables think it has.
>> >
>> > (see my earlier comment about the value to people who know what
>> > they're doing < the added confusion for those who don't.)
>>
>> I didn't expect you to enable it in the android build any time soon. :)
> 
> yeah, in that sense i don't care: i won't build for android or for the
> host. but that doesn't mean i won't get asked for it by people who
> have no idea what they're even saying (or at least can't word it in a
> way i can interpret). http://b/215403271 "add busybox for android".

I could add a toybox FAQ entry about that you could point people at? :)

>> >> You seem to be thinking of something that almost needs to solve the halting
>> >> problem to figure out what new way du jour android package authors manage to
>> >> screw up their builds.
>> >
>> > no, i'm saying "80% of people who ask me for 'an ldd i can run on the
>> > host' are *actually* looking for that verification tool instead".
>>
>> I think "people asking you" is at _least_ as skewed from the norm as my "I may
>> not be a good baseline for" above. :)
>>
>> > just you talking about it is giving me nightmares[1]. i already have a
>> > hard enough time trying to explain to these people that (a) that's
>>
>> I didn't know "this is the list of libraries this binary is trying to load" was
>> a nightmarish question for you.
> 
> see the bug i just linked to for an example of the quality of bug
> report i deal with :-)

I get those people emailing me directly. (Remember
https://git.busybox.net/busybox/commit/?id=eb84a42fdd1d remained in the busybox
source until Bradley Kuhn removed it in commit 0e941d542736 in 2012, shortly
after https://lwn.net/Articles/478308/ .)

That particular guy does not appear to have english as a first language, so I
dunno how much of the "impermeable" is lack of reading comprehension. I'd try to
reply to him on the issuetracker.google.com but my google account is... Sigh, I
should email you about that. (I break everything.)

>> > basically a shell one-liner but (b) it wouldn't even find the specific
>> > problem with their apk that made them claim to need it, let alone any
>> > of the other common ones. (the worst part of all this being that i
>>
>> Obviously a host tool is not going to notice selinux rule conflicts, but I have
>> a personal history with "or we'd have written it already" being annoyingly wrong.
> 
> my point is more "it _has_ been written already, and came with your libc".

I was replying to you saying:

> the worst part of all this being that i
> genuinely don't think there's a decent *host* implementation of this,
> or we'd have written it already.

Which kind of implies you hadn't?

> i think your point is "i want something for the host when
> cross-compiling for embedded systems (and am happy to ignore even
> stuff like rpath or $LD_LIBRARY_PATH)", to which my only response is
> "i don't get why you don't write the short script like everyone else,
> but if you prefer C, that's fine, but please don't increase confusion
> by calling it ldd".

Back when I started using ldd it hadn't grown the bells and whistles of modern
ones, the man page still says it just shows the shared libraries required by the
program and that invoking the dynamic linker is only "the usual case", and I
used to use a version that did what I was suggesting this one should do.
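For concreteness, the "recursive DT_NEEDEDs" walk can be sketched as below. This is only the traversal, in Python, and it assumes some read_needed() helper (hypothetical, not an existing tool or toybox function) that returns the DT_NEEDED names for one file. The table of fake data stands in for real ELF parsing, and like the behavior discussed above it ignores rpath and $LD_LIBRARY_PATH entirely:

```python
from collections import deque

def needed_closure(binary, read_needed):
    """Breadth-first walk of DT_NEEDED entries, listing each library once."""
    seen = []
    queue = deque(read_needed(binary))
    while queue:
        lib = queue.popleft()
        if lib in seen:
            continue  # already reported via another dependency chain
        seen.append(lib)
        queue.extend(read_needed(lib))
    return seen

# Hypothetical DT_NEEDED data standing in for reading real ELF files.
FAKE_NEEDED = {
    "app": ["libfoo.so", "libc.so"],
    "libfoo.so": ["libm.so", "libc.so"],
    "libm.so": ["libc.so"],
    "libc.so": [],
}

def fake_read_needed(name):
    # Unknown names have no recorded dependencies in this toy table.
    return FAKE_NEEDED.get(name, [])
```

With the fake table, needed_closure("app", fake_read_needed) lists libfoo.so, libc.so and libm.so in that order. It says nothing about whether the dynamic linker on the target would actually find or accept any of them.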

But you clearly feel more strongly about it than I do...

>> Running on current android doesn't prove it runs on older versions, therefore
>> the host should not have ldd?
> 
> more like "for my users, the recursive DT_NEEDEDs tells them *nothing*
> ... but they think it does, and i have enough headaches already" :-)

"but they think it does" implies (to me) that you need a proper writeup you can
point people at about what properly checking version compatibility means/entails.
(What's the old saying, "Give a man a fire he's warm for a day, set him on fire
and he's warm for the rest of his life"...)

Sadly, when I notice missing writeups I'm usually the one who has to WRITE the
documentation I want to read and/or point somebody at. Oh well, I've got a FAQ
for a reason...

>> > 1. although one possibility is that they stop bothering us and start
>> > bothering you instead, sadly i doubt that and suspect that i spend my
>> > time explaining why we won't be adding "toybox ldd" to the NDK...
>>
>> The current ldd says "not a dynamic executable" when confronted with something
>> like system/chre/build/app_support/google_slpi/libchre_slpi_skel.so but COULD
>> instead say "arch mismatch (use ldd --force)", or "/lib/ld-musl-powerpc.so.1 not
>> found (use --force)" and then output more info in --force mode. If both
>> objections are to stderr, it's technically compliant. :)
> 
> bionic's linker does say "is for EM_??? (20) instead of EM_AARCH64" :-)
> 
> (i couldn't bring myself to include and maintain a set of strings for
> the architectures we *don't* support!)

https://github.com/landley/toybox/blob/master/lib/lib.c#L1451
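As a rough sketch of the "arch mismatch" check being discussed (Python, not the bionic or toybox code): the ELF header stores a 16-bit e_machine field at offset 18, and a small table turns it into a name. The function names and the message format here are assumptions for illustration. The EM_ numbers are the standard ones from elf.h, and only a handful are listed:

```python
import struct

# A few e_machine values from the ELF specification / <elf.h>.
EM_NAMES = {
    3: "EM_386", 8: "EM_MIPS", 20: "EM_PPC", 40: "EM_ARM",
    42: "EM_SH", 62: "EM_X86_64", 183: "EM_AARCH64", 243: "EM_RISCV",
}

def elf_machine(header):
    """Return e_machine from the first bytes of a file, or None if not ELF."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # e_ident[5] (EI_DATA): 1 = little endian, 2 = big endian.
    endian = {1: "<", 2: ">"}.get(header[5])
    if endian is None:
        return None
    # e_machine is a 16-bit field at offset 18, right after e_type.
    return struct.unpack_from(endian + "H", header, 18)[0]

def describe_mismatch(header, want=183):
    """Return a complaint string, or None when the machine type matches."""
    got = elf_machine(header)
    if got is None:
        return "not an ELF file"
    if got == want:
        return None
    return "is for %s (%d) instead of %s" % (
        EM_NAMES.get(got, "EM_???"), got, EM_NAMES.get(want, "EM_???"))
```

To try it on a real file, pass the first 20 bytes, for example open(path, "rb").read(20). Machine types missing from the table fall back to "EM_???", which is the shape of the bionic message quoted above.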

Rob

