[Toybox] [PATCH] Distinguish 32- and 64-bit variants in file(1) for x32.

Rob Landley rob at landley.net
Sat Feb 20 12:28:22 PST 2016


On Wed, Feb 17, 2016 at 7:02 PM, enh <enh at google.com> wrote:
> On Wed, Feb 17, 2016 at 3:32 PM, Rob Landley <rob at landley.net> wrote:
>> On Wed, Feb 17, 2016 at 10:22 AM, enh <enh at google.com> wrote:
>>> It's necessary to distinguish x86 and x86-64 to be able to recognize the
>>> way x32 is encoded in ELF.
>>
>> Hmmm. That's not fun.
>>
>> I note that I spent the morning teaching the code to read/display the
>> dynamic linker name, so this patch won't "git am" directly.
>>
>> Reading the patch, we're pretending that arrch64 has nothing to do
>> with arm? No mention of arm in this architecture? Ok... (I guess
>> Cortex-M isn't arm either, but don't currently have an example binary
>> of that to test.)
>
> well, you're the one who removed my original "ARM aarch64" which is
> what the regular desktop file(1) says :-P

But not what the linux-kernel developers ever seem to say in their
patch submissions.

http://lkml.iu.edu/hypermail/linux/kernel/1602.2/00068.html
http://lkml.iu.edu/hypermail/linux/kernel/1602.2/00070.html
http://lkml.iu.edu/hypermail/linux/kernel/1602.2/01227.html
http://lkml.iu.edu/hypermail/linux/kernel/1602.2/00079.html
http://lkml.iu.edu/hypermail/linux/kernel/1602.2/00138.html
http://lkml.iu.edu/hypermail/linux/kernel/1602.2/00156.html

And so on and so forth...

> i assumed you were going for short names. (and aarch64 seems to be
> ARM's preferred name. it's the kernel that thinks it should be called
> arm64.)

The name "aarch64" is designed to be unpronounceable. Why they cared
so much that it come before "aardvark" in alphabetical listings, I
couldn't tell you. (Manufacturers are often insistent that normal
people can somehow be expected to say "Kellogg's Corn Flakes" in
regular conversation. I personally try not to encourage this, but
users come first...)

>>> (Plus it's customary to distinguish the variants,
>>> so people are probably eyeballing the architecture rather than paying
>>> attention to the ELF class.)
>>
>> I moved "32-bit x86" and "64-bit x86" right next to each other. (It
>> was duplicative, saying 32-bit or 64-bit and then saying x86-64 or
>> aarrcchh64.)
>>
>> I pondered changing it to add a -64 extension for 64 bit platforms
>> instead of saying "xx-bit", but that would give us alpha-64 and
>> s390-64.
>
> i'd prefer to just use the names commonly seen in the wild. no one
> says "64-bit x86".

You have a point. Before arch/x86 got combined there were "386" and
"x86-64" directories. But today there's arch/arm and arch/arm64...

>>> If we're not going to use the same strings as traditional file(1) -- which
>>> is what I'd tried to do
>>
>> I can move it back to more traditional "file" output, but we're not
>> matching exactly (missing a number of fields) and there's no spec. And
>> that was _before_ I added dynamic linker output so you can distinguish
>> glibc/bionic/musl/uClibc binaries...
>>
>> I really don't know what users expect from this. The --mime-type
>> output seemed far more easily machine readable (which we haven't
>> started on yet).
>
> me neither, but i do think we should try to make the fields
> stand-alone. i don't trust myself to look at the ELF class in addition
> to the arch, let alone anyone else :-)

That's why I put them right next to each other, but if Intel wants it
to be called ia32e or em64t, that's the clearly expressed
manufacturers' naming wishes which clearly everybody obeyed...
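
To illustrate why the class and the machine have to be read together,
here is a minimal sketch (in Python rather than toybox's C) that tells
x86, x86-64, and x32 apart from the first 20 bytes of an ELF header.
The names describe_x86() and fake_header() are made up for this
example; the constants are the standard ELF values (EI_CLASS at byte 4,
EI_DATA at byte 5, e_machine at offset 18). x32 shows up as a 32-bit
class combined with the x86-64 machine type.

```python
import struct

ELFCLASS32, ELFCLASS64 = 1, 2   # e_ident[EI_CLASS] values
ELFDATA2LSB = 1                 # e_ident[EI_DATA] value for little-endian
EM_386, EM_X86_64 = 3, 62       # e_machine values


def describe_x86(header: bytes) -> str:
    """Classify an ELF header as x86, x86-64, x32, or other (hypothetical helper)."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    elf_class = header[4]
    # Anything that is not explicitly little-endian is read as big-endian here.
    endian = "<" if header[5] == ELFDATA2LSB else ">"
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    if machine == EM_386 and elf_class == ELFCLASS32:
        return "x86"
    if machine == EM_X86_64 and elf_class == ELFCLASS64:
        return "x86-64"
    if machine == EM_X86_64 and elf_class == ELFCLASS32:
        return "x32"
    return "other"


def fake_header(elf_class: int, machine: int) -> bytes:
    """Build a 20-byte little-endian ELF header prefix for demonstration only."""
    ident = b"\x7fELF" + bytes([elf_class, ELFDATA2LSB, 1]) + bytes(9)
    return ident + struct.pack("<HH", 2, machine)  # e_type=2 (ET_EXEC), e_machine


for cls, mach in ((ELFCLASS32, EM_386), (ELFCLASS64, EM_X86_64), (ELFCLASS32, EM_X86_64)):
    print(describe_x86(fake_header(cls, mach)))
```

Looking at the machine type alone would lump x32 in with x86-64, and
looking at the class alone would lump it in with x86.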

>>> -- we should probably use the strings from the ELF
>>> spec rather than the Linux kernel,
>>
>> Maybe it's changed since 2010, but when I was digging into this then
>> (for hexagon), the last time the ELF spec had been updated was
>> something like 1995, and the documents were stale snapshots hosted on
>> a sco.com website.
>>
>> I believe that the Linux Foundation has since taken over the hosting,
>> but I'm unaware of them actually acting as a standards body for this.
>> Nor were there actually "standards" even for some old platforms like
>> Alpha.
>>
>> By "standard" do you mean the values in the glibc elf.h file? (Trust
>> glibc over the kernel? I was using the kernel to avoid coming up with
>> my own policy decision out of the blue.)
>
> no, i mean the n separate docs that the arch owners create. so aarch64
> would be http://infocenter.arm.com/help/topic/com.arm.doc.ihi0056b/IHI0056B_aaelf64.pdf
> for example. but i only have to worry about six, but it seems like you
> want to have all the architectures that Linux supports?

I do. The eventual goal of my aboriginal linux project is to run under
qemu on all the available architectures. Currently that's an AND
mask of what musl, qemu, and linux support, but I'm making puppy eyes
at musl to be less blocking. :)

(Making puppy eyes at uClibc never worked in the history of uClibc.)

> but actually, the names of the constants (minus EM_ and s/_/-/) from
> uapi/linux/elf-em.h look fine to me. and gets us out of the business
> of inventing *yet another* set of names.

I'm all for it, except that file doesn't mention arm...?  Ah, I've got
2.6.36 checked out. Let's see...

$ grep -r '#define[ \t]*ELF_ARCH' . | awk '{print $3}' | sort -u | xargs
ELF_ARCH EM_386 EM_68K EM_AARCH64 EM_ALPHA EM_ALTERA_NIOS2
EM_ARCOMPACT EM_ARM EM_AVR32 EM_BLACKFIN EM_CRIS EM_FRV EM_H8_300
EM_HEXAGON EM_IA_64 EM_M32R EM_METAG EM_MICROBLAZE EM_MIPS EM_MN10300
EM_OR32 EM_PARISC EM_S390 EM_SCORE7 EM_SH EM_SPARC EM_SPARCV9
EM_TI_C6000 EM_UNICORE EM_X86_64 EM_XTENSA

ELF_ARCH is #defined as ELF_ARCH? Yes, in arch/tile/include/asm/elf.h. Lovely.

So x86 would be 386, h8300 is h8-300, ia-64 has a dash in it, sparc
also has sparcv9... Hang on, my first grep was against 2.6.36 and said
"EM_XILINX_MICROBLAZE" but now it's just "EM_MICROBLAZE". But now
nios2 has a manufacturer's name prepended to it...
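
The "minus EM_ and s/_/-/" rule is mechanical enough to sketch. The
Python below is illustrative only: em_to_name() is a hypothetical
helper, and lowercasing the result is an assumption based on the names
spelled out above. It reproduces the names just listed.

```python
def em_to_name(constant: str) -> str:
    """Turn an EM_* constant name into a display name (hypothetical helper)."""
    prefix = "EM_"
    if not constant.startswith(prefix):
        raise ValueError(f"not an EM_ constant: {constant!r}")
    # Drop the prefix, swap underscores for dashes, and lowercase (assumption).
    return constant[len(prefix):].replace("_", "-").lower()


for name in ("EM_386", "EM_H8_300", "EM_IA_64", "EM_SPARCV9", "EM_ALTERA_NIOS2"):
    print(name, "->", em_to_name(name))
```

The rule is only as consistent as the constants themselves, which is
why a renamed constant changes the displayed name.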

Does sparcv9 actually get recognized? (There's an EM_SPARC32PLUS in
elf-em.h, why is it not here?) Hmmm. The mips3000/mips4000 constants
(both are 10, #defined twice) say they _don't_ (the comment says such
binaries are "rejected by Linux").

Where these constants are actually _used_, it looks like it gets
filtered through elf_check_arch() so "grep -rA1 elf_check_arch linux"
and... Looks like it accepts m32r_old and s390_old binaries, so I
should add those... OR32 and OPENRISC, MICROBLAZE and MICROBLAZE_OLD,
M32R and CYGNUS_M32R...
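
One way to handle the old/new pairs is a table that maps several
e_machine values onto one display name. A hedged sketch in Python: the
numeric values are believed to match the kernel's elf-em.h and per-arch
elf.h headers but should be checked against the tree in use;
MACHINE_NAMES and machine_name() are hypothetical names, and the
display strings are placeholders rather than a decided policy.

```python
# Several e_machine values can share one display name (old and new encodings).
MACHINE_NAMES = {
    88: "m32r",            # EM_M32R
    0x9041: "m32r",        # EM_CYGNUS_M32R (old value)
    22: "s390",            # EM_S390
    0xA390: "s390",        # EM_S390_OLD
    189: "microblaze",     # EM_MICROBLAZE
    0xBAAB: "microblaze",  # EM_MICROBLAZE_OLD
    92: "openrisc",        # EM_OPENRISC
    0x8472: "openrisc",    # EM_OR32 (old value)
}


def machine_name(e_machine: int) -> str:
    """Look up a display name, falling back to the raw number (hypothetical helper)."""
    return MACHINE_NAMES.get(e_machine, f"unknown ({e_machine})")


print(machine_name(0x9041), machine_name(88))
```

Keeping the aliases in one table means an old binary and a new one for
the same architecture print the same name.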

>>> which often deliberately goes against
>>> the manufacturer's wishes: "arm64" versus "aarch64", for example. But that's
>>> an issue for another day.
>>
>> Ah, manufacturer's naming wishes:
>>
>>   http://www.xbitlabs.com/news/cpu/display/20040310223922.html
>>
>> It's a "bud" lite, clearly.
>>
>> (Can of worms, this command...)
>
> i did warn you!

I'm looking at what ubuntu's "file" does for various binaries and it's
saying "Powerpc or Cisco 4500" (what?) and "Renesas SH" which is
inappropriate for jcore. I wondered if it said x86 or 386 for an i686
binary, and it said "Intel 80386". (This netbook is AMD, because
contemporary atoms maxed at 2 gigs ram. I've also historically run
winchip, via, and once got to test a transmeta x86 chip.)

Grumble grumble unavoidable policy grumble...

Right, I'll do an update.

Rob
