[Toybox] [PATCH] Distinguish 32- and 64-bit variants in file(1) for x32.

enh enh at google.com
Sat Feb 27 13:02:22 PST 2016


On Sun, Feb 21, 2016 at 8:30 PM, Rich Felker <dalias at libc.org> wrote:
> On Sun, Feb 21, 2016 at 08:42:06PM -0600, Rob Landley wrote:
>> On 02/21/2016 03:39 PM, Rich Felker wrote:
>> > On Sat, Feb 20, 2016 at 02:28:22PM -0600, Rob Landley wrote:
>> >> On Wed, Feb 17, 2016 at 7:02 PM, enh <enh at google.com> wrote:
>> >>> On Wed, Feb 17, 2016 at 3:32 PM, Rob Landley <rob at landley.net> wrote:
>> >>>> On Wed, Feb 17, 2016 at 10:22 AM, enh <enh at google.com> wrote:
>> >>>>> It's necessary to distinguish x86 and x86-64 to be able to recognize the
>> >>>>> way x32 is encoded in ELF.
>> >>>>
>> >>>> Hmmm. That's not fun.
>> >>>>
>> >>>> I note that I spent the morning teaching the code to read/display the
>> >>>> dynamic linker name, so this patch won't "git am" directly.
>> >>>>
>> >>>> Reading the patch, we're pretending that aarch64 has nothing to do
>> >>>> with arm? No mention of arm in this architecture? Ok... (I guess
>> >>>> Cortex-M isn't arm either, but don't currently have an example binary
>> >>>> of that to test.)
>> >>>
>> >>> well, you're the one who removed my original "ARM aarch64" which is
>> >>> what the regular desktop file(1) says :-P
>> >>
>> >> But not what the linux-kernel developers ever seem to say in their
>> >> patch submissions.
>> >
>> > Regardless of what you think about these naming choices, IMO there's
>> > little value in a file(1) that does not print the names that scripts
>> > using it expect to see.
>>
>> _What_ scripts? I don't know what would be using this. (All for
>> real-world tests, but I have yet to find a build script using the file
>> command. Looking at /proc, sure, but not calling file...)
>
> I suspect it's stuff like: case "$(file "$f")" in ...
>
> I'm not thinking of build scripts for packages (this would not be
> remotely portable usage) but things like private admin scripts,
> perhaps printer filter scripts, file preview/thumbnail generation
> scripts, etc. I don't have any good examples at hand but this was the
> historical justification I always saw for the rather arcane/antiquated
> forms for many of the names.
>
>> If the script wants to match "Intel 80386" explicitly, then do I have to
>> say that for i686?
>
> I would think it makes sense to preserve the "Intel 80386" convention
> here. There's not even a reliable way to detect that a binary is for
> "i686" anyway.

i also think it makes sense to use the same names as the GNU file(1),
because for lack of a real standard, "what the file(1) that everybody
is running does" is about as close as we'll get to a standard. yes,
their names are an awful, inconsistent historical mess, but that's how
real standards tend to turn out anyway :-)

if we think those names are too big a mess to stomach (and i can
certainly understand that POV, at least until using a cleaner set is
proven to cause trouble for actual scripts), then i think using the
constant names from the kernel's uapi/linux/elf-em.h is fine too.
(they're at least fairly logical.) i am aware that they say 386 and
PPC, but if we're aiming for full compatibility with everyone else's
file(1) we don't want to go this route anyway! i don't think that any
_human_ sophisticated enough to be looking at file(1)'s output for an
ELF file is going to be confused by "386" vs "Intel 80386" or "PPC"
for "PowerPC" :-)

on the other hand i definitely _don't_ think the world needs a _third_
"standard".

i'm happy to provide a patch for either of "file(1) names" or "kernel
elf-em.h names" if we can agree on which...
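
for concreteness, a rough sketch of how the two candidate name tables
differ. the numeric values are the EM_* constants from elf-em.h; the
"file(1)" strings are what i'd expect the desktop file(1) to print and
should be checked against real output before anyone relies on them. the
struct and function names (machine_names, machine_name) are made up for
illustration and are not from the patch.

```c
#include <stddef.h>

/* Hypothetical lookup table for illustration, not the toybox code. */
struct machine_names {
  int e_machine;           /* EM_* value from uapi/linux/elf-em.h */
  const char *file_name;   /* assumed desktop file(1) spelling */
  const char *kernel_name; /* EM_ constant with the prefix dropped */
};

static const struct machine_names names[] = {
  {3,   "Intel 80386",           "386"},     /* EM_386 */
  {20,  "PowerPC or cisco 4500", "PPC"},     /* EM_PPC */
  {40,  "ARM",                   "ARM"},     /* EM_ARM */
  {62,  "x86-64",                "X86_64"},  /* EM_X86_64 */
  {183, "ARM aarch64",           "AARCH64"}, /* EM_AARCH64 */
};

/* Return the name for e_machine from whichever column was chosen. */
const char *machine_name(int e_machine, int use_file_names)
{
  size_t i;

  for (i = 0; i < sizeof(names)/sizeof(*names); i++)
    if (names[i].e_machine == e_machine)
      return use_file_names ? names[i].file_name : names[i].kernel_name;

  return "unknown";
}
```

either column is a one-line change per architecture, so the cost of
switching later is small; the real question is only which set of strings
existing scripts might already be matching.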

(i can also supply hello world ELF binaries for all six architectures
Android supports, which -- even if you do set up your qemu instances
-- might still be mildly interesting because they have some slightly
different ELF notes than one sees in desktop linux ELF binaries.)

>> > The choice to use aarch64 instead of arm64 is
>> > in some ways also a consequence of this, or rather an intentional
>> > _mismatch_ with patterns that should not match. The fact that mips64
>> > and powerpc64 match mips* and powerpc* was historically very
>> > problematic.
>>
>> grep -w? Test for 64 bit first?
>
> Indeed, testing for 64 first is the right approach (see musl's
> configure script for an example), but the problem arises when the
> existing tests were written before the 64-bit version of the platform
> existed. I suspect there are lots of scripts that match arm*-*-* in
> the machine tuple or arm* in `uname -m` which would have wrongly
> detected "arm64" as arm.
>
>> Elliott was suggesting that the elf-em.h constants might be good enough,
>> but that says 386 not 80386, and PPC instead of Powerpc...
>>
>> Seriously, standards would be nice!
>
> Yes, wouldn't they? :)
>
>> (The mime types are designed to be programmatically interpreted. The
>> post here about a mime type database is very interesting and I have a
>> tab open. But file type output seems to be for humans...?)
>
> Yes, mime types are a lot more consistent but less informative than
> the freeform file(1) strings.
>
> Rich
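
the ordering point above, sketched in C on the assumption that the
script-style match is a plain prefix test. the helper names are invented
and this is not musl's configure logic, just the same idea: a check
written before the 64-bit port existed treats "arm64" as 32-bit arm,
whereas "aarch64" does not match at all, and testing the 64-bit
spellings first handles both.

```c
#include <string.h>

/* Prefix match, standing in for a shell pattern such as arm*. */
static int has_prefix(const char *s, const char *prefix)
{
  return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Old-style check written before any 64-bit ARM existed. */
const char *classify_naive(const char *machine)
{
  if (has_prefix(machine, "arm")) return "arm";

  return "unknown";
}

/* Test the 64-bit names first, then fall back to the prefix. */
const char *classify_64_first(const char *machine)
{
  if (!strcmp(machine, "aarch64") || !strcmp(machine, "arm64"))
    return "arm64";
  if (has_prefix(machine, "arm")) return "arm";

  return "unknown";
}
```

with the naive version, "arm64" silently comes back as "arm" while
"aarch64" comes back as "unknown", which is the intentional mismatch
being described: an unrecognized name fails visibly rather than being
mistaken for the 32-bit platform.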



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.
