[Toybox] [PATCH] Distinguish 32- and 64-bit variants in file(1) for x32.

enh enh at google.com
Mon Feb 29 08:51:33 PST 2016


On Sun, Feb 28, 2016 at 9:44 AM, Rob Landley <rob at landley.net> wrote:
> On 02/28/2016 09:48 AM, enh wrote:
>> On Sat, Feb 27, 2016 at 10:51 PM, Rob Landley <rob at landley.net> wrote:
>>> Newly introduced platforms tend to have EM_MANUFACTURER_ARCH and then
>>> later switch it to EM_ARCH. Here's the commit that did that for
>>> Microblaze, for example:
>>>
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=69515f8b957a
>>>
>>> I was going with the arch directories because they don't change as much.
>>> I've now refined that with two additional rules:
>>>
>>> 1) If there were different 32/64 bit directories that were later merged,
>>> stick with the old names. This gives us the x86-64 and 386 you wanted,
>>> and ppc and ppc64.
>>
>> and arm and aarch64? as long as the ARM and Intel names are
>> distinguishable, i don't care about S/390 or any of the other junk.
>
> I currently have arm and arm64, but if you deeply care about that I'll
> accept a specific patch from you changing just that line after I commit
> what I've got (hopefully this afternoon), so it's clear who to blame. :)

ARM cares about this kind of thing, but i don't. (and the real name
for a 32-bit EM_AARCH64 binary is "Aarch32" anyway, so whatever we do,
we'll always be "wrong" to some extent. likewise "x32".) as long as i
can tell unambiguously what i'm looking at, i don't care.
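
To make the 32/64-bit distinction concrete: x32 binaries use the same e_machine value as x86-64 (EM_X86_64) but are ELFCLASS32, so a name lookup has to key on both fields. A minimal sketch, assuming a hypothetical name table (the strings are placeholders, not what toybox prints):

```python
import struct

# e_machine and EI_CLASS values as defined in <elf.h>.
EM_386, EM_ARM, EM_X86_64, EM_AARCH64 = 3, 40, 62, 183
ELFCLASS32, ELFCLASS64 = 1, 2

# Hypothetical name table: the strings are illustrative only.
NAMES = {
    (EM_386, ELFCLASS32): "386",
    (EM_X86_64, ELFCLASS64): "x86-64",
    (EM_X86_64, ELFCLASS32): "x32",
    (EM_ARM, ELFCLASS32): "arm",
    (EM_AARCH64, ELFCLASS64): "arm64",
}

def arch_name(header: bytes) -> str:
    """Name an ELF file from its (e_machine, EI_CLASS) pair."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    endian = "<" if ei_data == 1 else ">"  # EI_DATA: 1 = LSB, 2 = MSB
    # e_machine sits at offset 18 in both the 32- and 64-bit headers.
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    bits = {ELFCLASS32: 32, ELFCLASS64: 64}.get(ei_class, 0)
    # Unknown pairs fall back to a numeric description.
    return NAMES.get((e_machine, ei_class), f"machine {e_machine} ({bits}-bit)")
```

Pairs missing from the table fall back to the raw number, so the output stays unambiguous even where nobody has agreed on a name.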

> (P.S. ia64/ppc64 vs x86-64, to dash or not to dash, another inconsistency.
> Historical x86_64 having an underscore that could become a dash gave me
> an excuse for x86-64 to have a dash...)
>
>>> 2) Every numerical ID should have a unique name, which leads to some...
>>> "interesting" cases. ("thing" vs "thing-old" is easier, high 0xbeef
>>> style numbers vs the low numbers Linux accepts. But some are like tilegx
>>> and tilepro, which one is "tile"? I went with the numerically lower one
>>> for that. And sparc has three. And mips _still_ says that the 10 value
>>> isn't used, which seems to be the case, I checked a mips64 binary and
>>> it's also using 0x08...)
>>>
>>>> (they're at least fairly logical.) i am aware that they say 386 and
>>>> PPC, but if we're aiming for full compatibility with everyone else's
>>>> file(1) we don't want to go this route anyway!
>>>
>>> Is there more than one implementation here, or are we just saying
>>> "everybody else uses darwinsys.com/file"?
>>
>> as far as i know, there's only one. i've never come across another,
>> even on Mac OS.
>
> Wikipedia[citation needed] says OpenBSD has its own,

a quick glance at the source suggests that's true. it does look like
they use the same names for the subset they support though. (OpenBSD's
been a zombie for longer than Aarch64 has existed.)

> and that the
> original was from AT&T in the 70's.

oh, i don't doubt that. but that's even more irrelevant than OpenBSD.

> https://en.wikipedia.org/wiki/File_(command)
>
>>>> i don't think that any
>>>> _human_ sophisticated enough to be looking at file(1)'s output for an
>>>> ELF file is going to be confused by "386" vs "Intel 80386" or "PPC"
>>>> for "PowerPC" :-)
>>>>
>>>> on the other hand i definitely _don't_ think the world needs a _third_
>>>> "standard".
>>>
>>> Posix doesn't standardize this! (Neither does ELF!)
>>
>> i meant de facto standard.
>
> Indeed, but "there's only one implementation" is an especially
> problematic kind of de facto standard because you can't generalize from
> a single data point.
>
> Let's look at Ubuntu 14.04:
>
>   $ file /bin/ls
>   /bin/ls: ELF 64-bit LSB  executable, x86-64, version 1 (SYSV),
>   dynamically linked (uses shared libs), for GNU/Linux 2.6.24,
>   BuildID[sha1]=9d2a434c4ff55aad2ddd19348c0ac75971606483, stripped
>
> 1) Since Red Hat 9, an extra space cropped up between LSB/MSB and
> "executable". Presumably ignorable?

yeah, i did notice that, but thought being bug-compatible was going too far.

> 2) x86-64 isn't saying "AMD", who clearly invented it. There is no
> obvious pattern to this.
>
> (Especially fun since "pentium" happened after a judge ruled the number
> "586" couldn't be trademarked, so 386 is clearly NOT exclusive to intel.)
>
> 3) It's always version 1 (SYSV) so this is useless information. The ELF
> spec documents from the 1990's say this field must always be 1 and don't
> speculate about other uses, and that's not even linux-specific.

yeah, i considered not outputting that but at the time was aiming to
have the same output as the desktop.

> 4) The original submission wasn't detecting dynamically linked/uses
> shared libs, and when I added it I made it print the dynamic linker
> because that's most useful to me. (glibc, uclibc, musl, bionic.) I can
> easily get the shared library count too, but getting the shared library
> names involves traversing an extra layer of tables, is an unbounded
> amount of verbosity, so I decided against it.

that's called readelf(1) :-)
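
For reference, the dynamic linker path mentioned in item 4 is stored in the PT_INTERP program header, so it can be found without touching the dynamic section. A rough sketch, assuming the whole file is already in memory; the function name is made up for illustration and this is not toybox's code:

```python
import struct

PT_INTERP = 3  # program header type, from <elf.h>

def find_interp(data: bytes):
    """Return the PT_INTERP path, or None if the file has no interpreter."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                   # EI_CLASS: 2 = ELFCLASS64
    end = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little endian
    if is64:
        (e_phoff,) = struct.unpack_from(end + "Q", data, 32)
        e_phentsize, e_phnum = struct.unpack_from(end + "HH", data, 54)
    else:
        (e_phoff,) = struct.unpack_from(end + "I", data, 28)
        e_phentsize, e_phnum = struct.unpack_from(end + "HH", data, 42)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from(end + "I", data, off)
        if p_type != PT_INTERP:
            continue
        # p_offset and p_filesz sit at different offsets in Elf32_Phdr and Elf64_Phdr.
        if is64:
            (p_offset,) = struct.unpack_from(end + "Q", data, off + 8)
            (p_filesz,) = struct.unpack_from(end + "Q", data, off + 32)
        else:
            (p_offset,) = struct.unpack_from(end + "I", data, off + 4)
            (p_filesz,) = struct.unpack_from(end + "I", data, off + 16)
        raw = data[p_offset:p_offset + p_filesz]
        return raw.split(b"\0", 1)[0].decode("ascii", "replace")
    return None
```

A None result only means there is no PT_INTERP entry. Telling a static binary from a static PIE would need more than this.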

> While we're at it, somebody want to explain when "dynamically linked,
> does not use shared libs" would come up? (Static PIE? How would we
> detect that?)

yeah, i wondered why they bothered with that. even if it's true ---
when is that ever useful? even if it's true and useful, it sounds more
like a job for readelf(1) anyway.

> 5) I am NEVER saying "GNU/Linux" in toybox output, and we're not
> currently detecting this field anyway, and in any case I'm pretty sure
> it's glibc-specific. (That said it's potentially useful and I'd happily
> say "for Linux 2.6.24", if I can dig up how to detect it and get some
> _older_ ones to compare against. Then again, 2.6.24 was released January
> 24, 2008, meaning "slightly predates posix-2008" and toybox won't build
> on a system older than that because we use things like openat() all over
> the place. I'd love either a libc version or a "posix-2008" or a "linux
> 2.0" vs "linux 3.0" vs "linux 4.0" tick here but there isn't anything
> like that, I have no idea when this might get updated or by who, I dunno
> if musl or bionic exports it (or cares)...)

no, dumping notes is definitely a job for readelf(1).

> 6) BuildID is new (Red Hat 9 didn't have it) and yet got inserted
> _before_ "stripped" so anybody parsing the fields in order as csv would
> already be broken. It's sha1, which is now semi-obsolete and according
> to http://valerieaurora.org/hash.html anything new should really be
> using sha3 these days. That said yay, I'll happily use it but I wanna
> know what the upgrade path is and who would be making it (especially
> since a stronger hash would probably be longer).

it's not necessarily sha1. all mine appear to be md5. (and it's not
meant for anything more than a better "do i have mismatched .so
files?" check in debugging tools.)

random example:

out/target/product/flounder/system/lib/libkeystore-engine.so: ELF
32-bit LSB  shared object, ARM, EABI5 version 1 (SYSV), dynamically
linked (uses shared libs),
BuildID[md5/uuid]=99cfd5b2b4c87d6ff76cb05743185633, stripped
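
Both samples in this thread are consistent with the label being picked by length alone: the 16-byte ID is tagged md5/uuid and the 20-byte one sha1. A sketch of pulling the ID out of raw note data under that assumption (4-byte note padding is also assumed; the function name is hypothetical):

```python
import struct

NT_GNU_BUILD_ID = 3  # note type, from <elf.h>

def build_id(notes: bytes, endian: str = "<"):
    """Scan raw note data (e.g. a PT_NOTE segment); return (label, hex) or None."""
    off = 0
    while off + 12 <= len(notes):
        namesz, descsz, ntype = struct.unpack_from(endian + "III", notes, off)
        off += 12
        name = notes[off:off + namesz]
        off += (namesz + 3) & ~3   # name is padded to a 4-byte boundary
        desc = notes[off:off + descsz]
        off += (descsz + 3) & ~3   # descriptor is padded the same way
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\0":
            # Assumption: the label depends only on the descriptor length.
            label = {16: "md5/uuid", 20: "sha1"}.get(len(desc), "unknown")
            return label, desc.hex()
    return None
```

Fed the note from the libkeystore-engine.so example above, this would return the md5/uuid label together with the same 32 hex digits.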

> 7) Haven't implemented "stripped" yet.

stripped is far more useful in file than "uses shared libs". i'd
happily lose the latter, but will add "stripped" if you don't.

on ELF files, file(1) is really only useful for "am i looking at the
right .so file?". so endian, class, arch, and stripped are all useful,
as is build id. the rest is basically noise. and if you really need to
know what's going on inside an ELF file, you should be using
readelf(1) anyway.
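
A "stripped" check can be sketched the same way. The assumption here is that "stripped" simply means no SHT_SYMTAB section is present; the real file(1) may apply additional checks, and the function name is invented for illustration:

```python
import struct

SHT_SYMTAB = 2  # section type, from <elf.h>

def is_stripped(data: bytes) -> bool:
    """True if no section header has type SHT_SYMTAB (assumed definition)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                   # EI_CLASS: 2 = ELFCLASS64
    end = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little endian
    if is64:
        (e_shoff,) = struct.unpack_from(end + "Q", data, 40)
        e_shentsize, e_shnum = struct.unpack_from(end + "HH", data, 58)
    else:
        (e_shoff,) = struct.unpack_from(end + "I", data, 32)
        e_shentsize, e_shnum = struct.unpack_from(end + "HH", data, 46)
    for i in range(e_shnum):
        # sh_type is the second 32-bit word in both Elf32_Shdr and Elf64_Shdr.
        (sh_type,) = struct.unpack_from(end + "I", data, e_shoff + i * e_shentsize + 4)
        if sh_type == SHT_SYMTAB:
            return False
    return True
```

A file with no section headers at all is reported as stripped by this sketch.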

> Let's look at another, same file command different binary:
>
> $ file root-filesystem-armv5l/bin/toybox
> root-filesystem-armv5l/bin/toybox: ELF 32-bit LSB  executable, ARM,
> EABI4 version 1 (SYSV), statically linked, stripped
>
> 1) ARM with no manufacturer. Capitalization continues to be all over the
> map.
>
> 2) EABI4 isn't even its own CSV, they printed "ARM, EABI4" and anybody
> chopping out ",version 1 (SYSV)" would have been disappointed.
>
> 3) No BuildID=none placeholder field.
>
> Meanwhile, the thing that _is_ somewhat standardized (by w3c) just says:
>
> $ file --mime-type /bin/ls
> /bin/ls: application/x-executable
>
> Which is not enough information. :(
>
> Rob



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.
