[Toybox] [PATCH] Distinguish 32- and 64-bit variants in file(1) for x32.

Rob Landley rob at landley.net
Mon Feb 29 13:01:14 PST 2016


On 02/29/2016 10:51 AM, enh wrote:
> On Sun, Feb 28, 2016 at 9:44 AM, Rob Landley <rob at landley.net> wrote:
>>> as far as i know, there's only one. i've never come across another,
>>> even on Mac OS.
>>
>> Wikipedia[citation needed] says OpenBSD has its own,
> 
> a quick glance at the source suggests that's true. it does look like
> they use the same names for the subset they support though. (OpenBSD's
> been a zombie for longer than Aarch64 has existed.)

Indeed, although people trusted it to run OpenSSL and/or OpenSSH for the
longest time. (I've never been clear on the relationship between those
projects, other than "not dropbear".)

>> 2) x86-64 isn't saying "AMD", who clearly invented it. There is no
>> obvious pattern to this.
>>
>> (Especially fun since "pentium" happened after a judge ruled the number
>> "586" couldn't be trademarked, so 386 is clearly NOT exclusive to Intel.)
>>
>> 3) It's always version 1 (SYSV) so this is useless information. The ELF
>> spec documents from the 1990s say this field must always be 1 and don't
>> speculate about other uses, and that's not even linux-specific.
> 
> yeah, i considered not outputting that but at the time was aiming to
> have the same output as the desktop.

Indeed, but I was looking at something that already didn't match because
it only produced half the desktop output, and yet it was considered useful.
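For reference, the "version 1 (SYSV)" text comes from two bytes of e_ident. EI_VERSION has exactly one defined value, EV_CURRENT (1). EI_OSABI uses 0 for System V. Below is a minimal C sketch of decoding them. It assumes a Linux libc that ships <elf.h>. The helper name ident_osabi() and its label strings are made up for illustration; they are not what toybox or file(1) actually use.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode the "version N (OSABI)" part of a file(1)
 * line from e_ident. EI_VERSION only has one defined value (EV_CURRENT,
 * which is 1), so a terser tool could print nothing unless it differs.
 * Returns NULL if the magic is wrong; only a few OSABI values are named. */
static const char *ident_osabi(const unsigned char ident[EI_NIDENT], int *version)
{
  if (memcmp(ident, ELFMAG, SELFMAG)) return NULL;
  *version = ident[EI_VERSION];

  switch (ident[EI_OSABI]) {
  case ELFOSABI_SYSV:    return "SYSV";       /* 0, same value as ELFOSABI_NONE */
  case ELFOSABI_GNU:     return "GNU/Linux";  /* 3 */
  case ELFOSABI_FREEBSD: return "FreeBSD";    /* 9 */
  default:               return "unknown";
  }
}
```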

The annoying thing is I'm not sure what standards body I should grab and
shake to try to _get_ a standard for this.

The loudest voice on the Austin Group (posix) mailing list is Jorg
"Linux Sux Solaris Forever" Schilling (the guy who maintained cdrecord
in a way that made Debian fork it, and kept an outright _shrine_ to
decade-old Linux kernel bugs in the cdrecord readme as reasons you
should be using solaris instead).

The group behind the Linux Standard Base disappeared into the Linux
Foundation's accretion disk years ago, and again Debian washed its hands
of them https://lwn.net/Articles/658809/ (not that pissing off Debian is
a high bar, but as with canaries in coal mines, they're useful as a
diagnostic because they're so fragile and easily spooked). My opinion of
the Linux Foundation itself is a matter of public record:

  http://landley.net/notes-2010.html#18-07-2010

Recently... "refreshed", shall we say?

  http://lwn.net/Articles/673473/

The ELF 4.1 ABI spec is copyright 1996 SCO, which did the "dying
business models explode into a cloud of IP litigation" thing a decade
and change ago.

C99 is from ISO, and
https://en.wikipedia.org/wiki/Standardization_of_Office_Open_XML pretty
much rules them out as a functional standards organization rather than a
fully owned subsidiary of Microsoft...

I'm pretty happy with SPDX (toybox license got approved,
https://spdx.org/licenses/0BSD.html), but their mandate is pretty
narrow. In theory OSI has a slightly less narrow mandate, but after
https://lists.spdx.org/pipermail/spdx-legal/2015-December/001574.html
(warning: long thread, many replies, much shouting) I don't want to get
any of OSI on me ever again.

Sigh. The Linux Foundation is currently collecting _some_ vendor
supplements, but by no means all. Where's superh? (Answer: neither
Hitachi nor Renesas gave the Linux Foundation money, therefore they
don't exist to them.) Posix is the next logical place, and at least has
the virtue of being active...

I dunno, maybe a textfile version of this belongs in the Linux
Documentation directory...?

>> 4) The original submission wasn't detecting dynamically linked/uses
>> shared libs, and when I added it I made it print the dynamic linker
>> because that's most useful to me. (glibc, uclibc, musl, bionic.) I can
>> easily get the shared library count too, but getting the shared library
>> names involves traversing an extra layer of tables and is an unbounded
>> amount of verbosity, so I decided against it.
> 
> that's called readelf(1) :-)

Actually it's called ldd. :)

That said, "is this a musl binary, a uClibc binary, a bionic binary, or
a glibc binary" is a question I tend to want answered because I have
them lying around on my systems. (There's a chunk of venn diagram where
Android and Embedded Linux overlap, but it's not the majority of either
one.)
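The interpreter path is what answers the "which libc" question. A rough guess can be made from the conventional default dynamic linker names. This sketch assumes nobody renamed their linker. guess_libc() is a hypothetical helper. The patterns it checks are common defaults, not guarantees:

- ld-linux* or ld64.so* for glibc
- ld-musl-* for musl
- ld-uClibc* for uClibc
- /system/bin/linker or /system/bin/linker64 for bionic

```c
#include <string.h>

/* Hypothetical helper: guess the C library from a PT_INTERP path.
 * Matches on conventional default linker names only; a renamed or
 * relocated linker will come back as "unknown" or be misidentified. */
static const char *guess_libc(const char *interp)
{
  const char *base = strrchr(interp, '/');

  base = base ? base+1 : interp;
  if (!strcmp(interp, "/system/bin/linker")
      || !strcmp(interp, "/system/bin/linker64")) return "bionic";
  if (!strncmp(base, "ld-musl-", 8)) return "musl";
  if (!strncmp(base, "ld-uClibc", 9)) return "uClibc";
  if (!strncmp(base, "ld-linux", 8) || !strncmp(base, "ld64.so", 7))
    return "glibc";
  return "unknown";
}
```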

Also, when I get around to the relevant shell extensions I want "bad
/lib/ld-uClinux.so.0" as the error message for launching an elf binary
when it can't find its dynamic linker. (The kernel returns -ENOENT,
and the shell used to just say command-you-tried-to-run not found
when it was right there, which was confusing... So I may need to factor
the "find dynamic linker of an elf binary" function out into
lib/lib.c...)
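A lib/lib.c helper along those lines might look something like the following sketch. It walks the program header table of an already-loaded file looking for PT_INTERP. The name elf_find_interp() is hypothetical. It only handles 64-bit ELF in host byte order; a real version would also need the 32-bit and byte-swapped cases.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the PT_INTERP path inside an ELF image of
 * len bytes at buf, or NULL if there isn't one (or the tables are bogus).
 * memcpy() is used so buf needn't be aligned. 64-bit, host order only. */
static const char *elf_find_interp(const unsigned char *buf, size_t len)
{
  Elf64_Ehdr eh;
  Elf64_Phdr ph;
  size_t i;

  if (len < sizeof(eh)) return NULL;
  memcpy(&eh, buf, sizeof(eh));
  if (memcmp(eh.e_ident, ELFMAG, SELFMAG) || eh.e_ident[EI_CLASS] != ELFCLASS64
      || eh.e_phentsize != sizeof(ph) || eh.e_phoff > len
      || eh.e_phnum > (len - eh.e_phoff) / sizeof(ph)) return NULL;

  for (i = 0; i < eh.e_phnum; i++) {
    memcpy(&ph, buf + eh.e_phoff + i*sizeof(ph), sizeof(ph));
    if (ph.p_type != PT_INTERP) continue;
    /* The path must lie inside the file and end with a NUL byte. */
    if (!ph.p_filesz || ph.p_offset > len || ph.p_filesz > len - ph.p_offset
        || buf[ph.p_offset + ph.p_filesz - 1]) return NULL;
    return (const char *)buf + ph.p_offset;
  }
  return NULL;
}
```

The shell side would then only need to call this after execve() fails with ENOENT on a file that does exist, and print "bad" plus the returned path.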

>> While we're at it, somebody want to explain when "dynamically linked,
>> does not use shared libs" would come up? (Static PIE? How would we
>> detect that?)
> 
> yeah, i wondered why they bothered with that.

I note that Rich Felker is doing Static PIE stuff for nommu systems, and
being able to distinguish fdpic from static pie from conventional ELF
would be really useful for me. (Hint hint. Or I could read through the
kernel's fs/binfmt_elf_fdpic.c. It's on my todo list...)
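As for detecting static PIE, one plausible heuristic works from e_type plus the program headers. Anything with PT_INTERP needs a dynamic linker. An ET_EXEC without one is classic static. An ET_DYN with PT_DYNAMIC but no PT_INTERP is either a static PIE or a shared library, and those fields alone can't tell them apart. Newer linkers can also set DF_1_PIE in the DT_FLAGS_1 dynamic entry, which would break the tie. Checking that means walking the dynamic section, which this sketch skips. The enum and classify_linking() are hypothetical names. FDPIC isn't handled at all.

```c
#include <elf.h>
#include <stddef.h>

enum link_style { LINK_OTHER, LINK_STATIC, LINK_DYNAMIC, LINK_DYN_NO_INTERP };

/* Hypothetical classifier using only e_type and (64-bit, host-order)
 * program headers:
 *   any PT_INTERP                        -> dynamic, needs a dynamic linker
 *   ET_EXEC without PT_INTERP            -> classic static binary
 *   ET_DYN with PT_DYNAMIC, no PT_INTERP -> static PIE or shared library
 * FDPIC binaries are not recognized. */
static enum link_style classify_linking(unsigned e_type, const Elf64_Phdr *ph,
  size_t phnum)
{
  int interp = 0, dynamic = 0;
  size_t i;

  for (i = 0; i < phnum; i++) {
    if (ph[i].p_type == PT_INTERP) interp = 1;
    if (ph[i].p_type == PT_DYNAMIC) dynamic = 1;
  }
  if (interp) return LINK_DYNAMIC;
  if (e_type == ET_EXEC) return LINK_STATIC;
  if (e_type == ET_DYN && dynamic) return LINK_DYN_NO_INTERP;
  return LINK_OTHER;
}
```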


>> 6) BuildID is new (Red Hat 9 didn't have it) and yet got inserted
>> _before_ "stripped" so anybody parsing the fields in order as csv would
>> already be broken. It's sha1, which is now semi-obsolete and according
>> to http://valerieaurora.org/hash.html anything new should really be
>> using sha3 these days. That said yay, I'll happily use it but I wanna
>> know what the upgrade path is and who would be making it (especially
>> since a stronger hash would probably be longer).
> 
> it's not necessarily sha1. all mine appear to be md5. (and it's not
> meant for anything more than a better "do i have mismatched .so
> files?" check in debugging tools.)

I broke down and looked at the other file implementation's source (it's
bsd licensed and strace isn't immediately helpful here) and it's
traversing elf tables rather than looking at the header data.
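The BuildID lives in a note of type NT_GNU_BUILD_ID with owner name "GNU". The hash type isn't recorded anywhere; it has to be inferred from the descriptor length: 20 bytes for sha1, 16 for md5 or a uuid. The sketch below walks the contents of one note segment or section to find it. It assumes host byte order and 4-byte padding, and build_id_kind() is a hypothetical name.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: scan len bytes of note data at p for a GNU build-id
 * note. On success, point *id at the raw hash bytes, set *idlen, and return
 * a label guessed from the length; return NULL if not found or malformed. */
static const char *build_id_kind(const unsigned char *p, size_t len,
  const unsigned char **id, size_t *idlen)
{
  while (len >= sizeof(Elf64_Nhdr)) {
    Elf64_Nhdr n;
    size_t namesz, descsz, total;

    memcpy(&n, p, sizeof(n));
    if (n.n_namesz > len || n.n_descsz > len) return NULL;
    /* Name and descriptor are each padded to a 4-byte boundary. */
    namesz = ((size_t)n.n_namesz + 3) & ~(size_t)3;
    descsz = ((size_t)n.n_descsz + 3) & ~(size_t)3;
    total = sizeof(n) + namesz + descsz;
    if (total > len) return NULL;
    if (n.n_type == NT_GNU_BUILD_ID && n.n_namesz == 4
        && !memcmp(p + sizeof(n), "GNU", 4)) {
      *id = p + sizeof(n) + namesz;
      *idlen = n.n_descsz;
      if (n.n_descsz == 20) return "sha1";
      if (n.n_descsz == 16) return "md5/uuid";
      return "unknown";
    }
    p += total;
    len -= total;
  }
  return NULL;
}
```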

I've got an mmap for the dynamic linker, but should probably just mmap
the whole file. (32-bit address space exhaustion is a process-local
problem: if your executable is >2 gigabytes the mmap fails and we handle
it gracefully...)
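Mapping the whole file only takes a few lines. In this sketch, map_whole_file() is a hypothetical name. It returns NULL on any failure, including an mmap that can't find enough address space, so the caller can fall back to printing less. The caller is expected to munmap() the result when done.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map an entire file read-only and report its length.
 * Returns NULL on any failure so callers can degrade gracefully. */
static void *map_whole_file(const char *path, size_t *len)
{
  struct stat st;
  void *p;
  int fd = open(path, O_RDONLY);

  if (fd < 0) return NULL;
  if (fstat(fd, &st) || !st.st_size
      || (unsigned long long)st.st_size > (size_t)-1) {
    close(fd);
    return NULL;
  }
  p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  /* the mapping stays valid after the fd is closed */
  if (p == MAP_FAILED) return NULL;
  *len = st.st_size;
  return p;
}
```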

> random example:
> 
> out/target/product/flounder/system/lib/libkeystore-engine.so: ELF
> 32-bit LSB  shared object, ARM, EABI5 version 1 (SYSV), dynamically
> linked (uses shared libs),
> BuildID[md5/uuid]=99cfd5b2b4c87d6ff76cb05743185633, stripped
> 
>> 7) Haven't implemented "stripped" yet.
> 
> stripped is far more useful in file than "uses shared libs". i'd
> happily lose the latter, but will add "stripped" if you don't.

I'm working on it, but it looks like I need to traverse the section
header table in order to figure this out. (And no, I can't just check
for zero entries, both because stripping won't necessarily remove all
of them and because zero could mean the funky overflow behavior
described in man 5 elf.) Sigh.

Seriously, _why_ does the elf header info keep recording sizes that have
to be constants?
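The section header walk might look something like this sketch. It uses the common rule "no SHT_SYMTAB section means stripped". It also handles the man 5 elf overflow case: e_shnum is 0 but e_shoff isn't, and the real count lives in sh_size of section header 0. elf_is_stripped() is a hypothetical name. It only handles 64-bit host-order ELF. Treating "no section headers at all" as stripped is an assumption of this sketch.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 = stripped, 0 = has a symbol table, -1 = the
 * section header table is missing pieces or points outside the file. */
static int elf_is_stripped(const unsigned char *buf, size_t len)
{
  Elf64_Ehdr eh;
  Elf64_Shdr sh;
  size_t i, shnum, max;

  if (len < sizeof(eh)) return -1;
  memcpy(&eh, buf, sizeof(eh));
  if (memcmp(eh.e_ident, ELFMAG, SELFMAG) || eh.e_ident[EI_CLASS] != ELFCLASS64)
    return -1;
  if (!eh.e_shoff) return 1;  /* no section header table at all */
  if (eh.e_shentsize != sizeof(sh) || eh.e_shoff > len
      || len - eh.e_shoff < sizeof(sh)) return -1;
  max = (len - eh.e_shoff) / sizeof(sh);

  shnum = eh.e_shnum;
  if (!shnum) {
    /* Overflow encoding: the real count is sh_size of section 0. */
    memcpy(&sh, buf + eh.e_shoff, sizeof(sh));
    if (sh.sh_size > max) return -1;
    shnum = sh.sh_size;
  }
  if (shnum > max) return -1;

  for (i = 0; i < shnum; i++) {
    memcpy(&sh, buf + eh.e_shoff + i*sizeof(sh), sizeof(sh));
    if (sh.sh_type == SHT_SYMTAB) return 0;
  }
  return 1;
}
```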

> on ELF files, file(1) is really only useful for "am i looking at the
> right .so file?". so endian, class, arch, and stripped are all useful,
> as is build id. the rest is basically noise. and if you really need to
> know what's going on inside an ELF file, you should be using
> readelf(1) anyway.
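Class, endianness, and architecture all come from the first 20 bytes of the header. x32 is exactly the case where class and machine disagree: EM_X86_64 with ELFCLASS32. In this sketch, elf_arch() and its labels are hypothetical stand-ins. Note that e_machine has to be read in the file's byte order, not the host's.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return an architecture label and fill in word size
 * and endianness, or return NULL if h isn't an ELF header. */
static const char *elf_arch(const unsigned char *h, size_t len, int *bits, int *big)
{
  unsigned machine;

  if (len < 20 || memcmp(h, ELFMAG, SELFMAG)) return NULL;
  *bits = h[EI_CLASS] == ELFCLASS64 ? 64 : 32;
  *big = h[EI_DATA] == ELFDATA2MSB;
  /* e_machine is the 16-bit field at offset 18 in both 32- and 64-bit
   * headers, stored in the file's byte order. */
  machine = *big ? (h[18] << 8) | h[19] : h[18] | (h[19] << 8);

  switch (machine) {
  case EM_386:     return "Intel 80386";
  case EM_ARM:     return "ARM";
  case EM_AARCH64: return "ARM aarch64";
  case EM_SH:      return "Renesas SH";
  /* x32: x86-64 instructions, 32-bit pointers, 32-bit ELF class */
  case EM_X86_64:  return *bits == 32 ? "x86-64 (x32)" : "x86-64";
  default:         return "unknown";
  }
}
```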

There used to be a tiny readelf in the uClibc source, but they removed
it long ago. (I salvaged it and was building it in aboriginal linux, but
it's presumably GPL so I'd have to start over again if I went there.)

But that's more a qcc thing than a toybox thing, which means look at it
again after the toybox 1.0 release...

Rob
