[Toybox] [PATCH] Distinguish 32- and 64-bit variants in file(1) for x32.

Rob Landley rob at landley.net
Sun Feb 28 09:44:44 PST 2016


On 02/28/2016 09:48 AM, enh wrote:
> On Sat, Feb 27, 2016 at 10:51 PM, Rob Landley <rob at landley.net> wrote:
>> Newly introduced platforms tend to have EM_MANUFACTURER_ARCH and then
>> later switch it to EM_ARCH. Here's the commit that did that for
>> Microblaze, for example:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=69515f8b957a
>>
>> I was going with the arch directories because they don't change as much.
>> I've now refined that with two additional rules:
>>
>> 1) If there were different 32/64 bit directories that were later merged,
>> stick with the old names. This gives us the x86-64 and 386 you wanted,
>> and ppc and ppc64.
> 
> and arm and aarch64? as long as the ARM and Intel names are
> distinguishable, i don't care about S/390 or any of the other junk.

I currently have arm and arm64, but if you deeply care about that I'll
accept a specific patch from you changing just that line after I commit
what I've got (hopefully this afternoon), so it's clear who to blame. :)

(P.S. ia64/ppc64 vs x86-64, to dash or not to dash, another inconsistency.
Historical x86_64 having an underscore that could become a dash gave me
an excuse for x86-64 to have a dash...)
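
For reference, the names being argued over are just labels for the 16-bit
e_machine field in the ELF header, and the "32-bit"/"64-bit" and "LSB"/"MSB"
parts come from two bytes of e_ident. A minimal sketch of that lookup, not
toybox's actual code: the function name and the short name table are
illustrative assumptions, and only the numeric values come from the ELF
headers.

```python
import struct

# Illustrative subset of e_machine values; the names follow the
# arch-directory style discussed above and are a choice, not a standard.
EM_NAMES = {3: "386", 8: "mips", 20: "ppc", 21: "ppc64",
            40: "arm", 62: "x86-64", 183: "arm64"}

def describe_elf_header(data):
    """Return (bits, byte_order, machine_name, e_version) for an ELF image."""
    if len(data) < 24 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("bad EI_CLASS or EI_DATA")
    bits = 32 if ei_class == 1 else 64           # ELFCLASS32 / ELFCLASS64
    endian = "<" if ei_data == 1 else ">"        # ELFDATA2LSB / ELFDATA2MSB
    # e_type, e_machine, e_version sit right after the 16-byte e_ident
    # in both the 32-bit and 64-bit header layouts.
    e_type, e_machine, e_version = struct.unpack_from(endian + "HHI", data, 16)
    name = EM_NAMES.get(e_machine, "unknown(%d)" % e_machine)
    return bits, "LSB" if ei_data == 1 else "MSB", name, e_version
```

Giving every numeric ID its own entry in a table like this is what forces
the "which one is tile" style decisions.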

>> 2) Every numerical ID should have a unique name, which leads to some...
>> "interesting" cases. ("thing" vs "thing-old" is easier, high 0xbeef
>> style numbers vs the low numbers Linux accepts. But some are like tilegx
>> and tilepro, which one is "tile"? I went with the numerically lower one
>> for that. And sparc has three. And mips _still_ says that the 10 value
>> isn't used, which seems to be the case, I checked a mips64 binary and
>> it's also using 0x08...)
>>
>>> (they're at least fairly logical.) i am aware that they say 386 and
>>> PPC, but if we're aiming for full compatibility with everyone else's
>>> file(1) we don't want to go this route anyway!
>>
>> Is there more than one implementation here, or are we just saying
>> "everybody else uses darwinsys.com/file"?
> 
> as far as i know, there's only one. i've never come across another,
> even on Mac OS.

Wikipedia[citation needed] says OpenBSD has its own, and that the
original was from AT&T in the 70's.

https://en.wikipedia.org/wiki/File_(command)

>>> i don't think that any
>>> _human_ sophisticated enough to be looking at file(1)'s output for an
>>> ELF file is going to be confused by "386" vs "Intel 80386" or "PPC"
>>> for "PowerPC" :-)
>>>
>>> on the other hand i definitely _don't_ think the world needs a _third_
>>> "standard".
>>
>> Posix doesn't standardize this! (Neither does ELF!)
> 
> i meant de facto standard.

Indeed, but "there's only one implementation" is an especially
problematic kind of de facto standard because you can't generalize from
a single data point.

Let's look at Ubuntu 14.04:

  $ file /bin/ls
  /bin/ls: ELF 64-bit LSB  executable, x86-64, version 1 (SYSV),
  dynamically linked (uses shared libs), for GNU/Linux 2.6.24,
  BuildID[sha1]=9d2a434c4ff55aad2ddd19348c0ac75971606483, stripped

1) Since Red Hat 9, an extra space cropped up between LSB/MSB and
"executable". Presumably ignorable?

2) x86-64 isn't saying "AMD", who clearly invented it. There is no
obvious pattern to this.

(Especially fun since "pentium" happened after a judge ruled the number
"586" couldn't be trademarked, so 386 is clearly NOT exclusive to intel.)

3) It's always version 1 (SYSV) so this is useless information. The ELF
spec documents from the 1990's say this field must always be 1 and don't
speculate about other uses, and that's not even linux-specific.

4) The original submission wasn't detecting dynamically linked/uses
shared libs, and when I added it I made it print the dynamic linker
because that's most useful to me. (glibc, uclibc, musl, bionic.) I can
easily get the shared library count too, but getting the shared library
names involves traversing an extra layer of tables and produces an
unbounded amount of output, so I decided against it.

While we're at it, somebody want to explain when "dynamically linked,
does not use shared libs" would come up? (Static PIE? How would we
detect that?)

5) I am NEVER saying "GNU/Linux" in toybox output, and we're not
currently detecting this field anyway, and in any case I'm pretty sure
it's glibc-specific. (That said it's potentially useful and I'd happily
say "for Linux 2.6.24", if I can dig up how to detect it and get some
_older_ ones to compare against. Then again, 2.6.24 was released January
24, 2008, meaning "slightly predates posix-2008" and toybox won't build
on a system older than that because we use things like openat() all over
the place. I'd love either a libc version or a "posix-2008" or a "linux
2.0" vs "linux 3.0" vs "linux 4.0" tick here but there isn't anything
like that, I have no idea when this might get updated or by who, I dunno
if musl or bionic exports it (or cares)...)

6) BuildID is new (Red Hat 9 didn't have it) and yet got inserted
_before_ "stripped" so anybody parsing the fields in order as csv would
already be broken. It's sha1, which is now semi-obsolete and according
to http://valerieaurora.org/hash.html anything new should really be
using sha3 these days. That said yay, I'll happily use it but I wanna
know what the upgrade path is and who would be making it (especially
since a stronger hash would probably be longer).

7) Haven't implemented "stripped" yet.
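
For reference, the "for GNU/Linux 2.6.24" string and the BuildID in the
list above both live in ELF notes whose name is "GNU": type 1
(NT_GNU_ABI_TAG) carries an OS word plus three version words, and type 3
(NT_GNU_BUILD_ID) carries the raw hash bytes. A minimal sketch of decoding
a raw note payload, assuming the caller has already located the PT_NOTE
segment or note section; the function names and the returned dict keys are
hypothetical.

```python
import struct

NT_GNU_ABI_TAG = 1    # desc: OS id, then major, minor, subminor version
NT_GNU_BUILD_ID = 3   # desc: raw build ID bytes

def parse_notes(buf, endian="<"):
    """Yield (name, type, desc) for each note in a raw note payload."""
    pos = 0
    while pos + 12 <= len(buf):
        namesz, descsz, ntype = struct.unpack_from(endian + "III", buf, pos)
        pos += 12
        name = buf[pos:pos + namesz].rstrip(b"\0")
        pos += (namesz + 3) & ~3      # name is padded to a 4-byte boundary
        desc = buf[pos:pos + descsz]
        pos += (descsz + 3) & ~3      # so is the descriptor
        yield name, ntype, desc

def describe_gnu_notes(buf, endian="<"):
    """Collect the minimum kernel version and build ID, where present."""
    out = {}
    for name, ntype, desc in parse_notes(buf, endian):
        if name != b"GNU":
            continue
        if ntype == NT_GNU_ABI_TAG and len(desc) >= 16:
            os_id, major, minor, sub = struct.unpack_from(endian + "IIII", desc)
            if os_id == 0:            # 0 is the Linux OS id
                out["kernel"] = "%d.%d.%d" % (major, minor, sub)
        elif ntype == NT_GNU_BUILD_ID:
            out["build_id"] = desc.hex()
    return out
```

The descriptor length is stored in the note itself, so a longer hash would
fit in the same container; what a consumer should call it is a separate
question.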

Let's look at another, same file command, different binary:

$ file root-filesystem-armv5l/bin/toybox
root-filesystem-armv5l/bin/toybox: ELF 32-bit LSB  executable, ARM,
EABI4 version 1 (SYSV), statically linked, stripped

1) ARM with no manufacturer. Capitalization continues to be all over the
map.

2) EABI4 isn't even its own CSV field, they printed "ARM, EABI4" and
anybody chopping out ",version 1 (SYSV)" would have been disappointed.

3) No BuildID=none placeholder field.
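
For reference, the difference between the dynamically linked /bin/ls and
the statically linked toybox above shows up in the program headers: a
binary that wants a dynamic linker names it in a PT_INTERP segment. A
minimal sketch of pulling that path out, assuming the whole file is already
in memory; find_interp is a hypothetical name, not toybox's function.

```python
import struct

PT_INTERP = 3  # program header type whose contents are the linker path

def find_interp(data):
    """Return the PT_INTERP path from an ELF image, or None if absent."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS
    endian = "<" if data[5] == 1 else ">"    # EI_DATA
    if is64:
        e_phoff, = struct.unpack_from(endian + "Q", data, 32)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
    else:
        e_phoff, = struct.unpack_from(endian + "I", data, 28)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 42)
    for i in range(e_phnum):
        ph = e_phoff + i * e_phentsize
        p_type, = struct.unpack_from(endian + "I", data, ph)
        if p_type != PT_INTERP:
            continue
        # Field order differs between the 32-bit and 64-bit phdr layouts.
        if is64:
            p_offset, = struct.unpack_from(endian + "Q", data, ph + 8)
            p_filesz, = struct.unpack_from(endian + "Q", data, ph + 32)
        else:
            p_offset, = struct.unpack_from(endian + "I", data, ph + 4)
            p_filesz, = struct.unpack_from(endian + "I", data, ph + 16)
        raw = data[p_offset:p_offset + p_filesz]
        return raw.rstrip(b"\0").decode("ascii", "replace")
    return None
```

A binary with no PT_INTERP at all, which includes fully static ones,
returns None here.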

Meanwhile, the thing that _is_ somewhat standardized (by w3c) just says:

$ file --mime-type /bin/ls
/bin/ls: application/x-executable

Which is not enough information. :(

Rob
