[Toybox] [PATCH] Distinguish 32- and 64-bit variants in file(1) for x32.

Rich Felker dalias at libc.org
Sun Feb 21 20:30:09 PST 2016


On Sun, Feb 21, 2016 at 08:42:06PM -0600, Rob Landley wrote:
> On 02/21/2016 03:39 PM, Rich Felker wrote:
> > On Sat, Feb 20, 2016 at 02:28:22PM -0600, Rob Landley wrote:
> >> On Wed, Feb 17, 2016 at 7:02 PM, enh <enh at google.com> wrote:
> >>> On Wed, Feb 17, 2016 at 3:32 PM, Rob Landley <rob at landley.net> wrote:
> >>>> On Wed, Feb 17, 2016 at 10:22 AM, enh <enh at google.com> wrote:
> >>>>> It's necessary to distinguish x86 and x86-64 to be able to recognize the
> >>>>> way x32 is encoded in ELF.
> >>>>
> >>>> Hmmm. That's not fun.
> >>>>
> >>>> I note that I spent the morning teaching the code to read/display the
> >>>> dynamic linker name, so this patch won't "git am" directly.
> >>>>
> >>>> Reading the patch, we're pretending that aarch64 has nothing to do
> >>>> with arm? No mention of arm in this architecture? Ok... (I guess
> >>>> Cortex-M isn't arm either, but don't currently have an example binary
> >>>> of that to test.)
> >>>
> >>> well, you're the one who removed my original "ARM aarch64" which is
> >>> what the regular desktop file(1) says :-P
> >>
> >> But not what the linux-kernel developers ever seem to say in their
> >> patch submissions.
> > 
> > Regardless of what you think about these naming choices, IMO there's
> > little value in a file(1) that does not print the names that scripts
> > using it expect to see.
> 
> _What_ scripts? I don't know what would be using this. (All for
> real-world tests, but I have yet to find a build script using the file
> command. Looking at /proc, sure, but not calling file...)

I suspect it's stuff like: case "$(file "$f")" in ...

I'm not thinking of build scripts for packages (this would not be
remotely portable usage) but things like private admin scripts,
perhaps printer filter scripts, file preview/thumbnail generation
scripts, etc. I don't have any good examples at hand but this was the
historical justification I always saw for the rather arcane/antiquated
forms for many of the names.
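
A minimal sketch of the kind of script meant here, assuming the description strings look like those printed by the common desktop file(1) ("Intel 80386", "x86-64"). The function name classify_desc and the labels it prints are hypothetical, chosen only for illustration. It also shows why the 32/64-bit class has to be checked together with the machine name: x32 is a 32-bit ELF whose machine is x86-64.

```sh
#!/bin/sh
# Hypothetical helper: map a file(1) description string to a short label.
# The matched strings are assumptions about file(1) output, not a spec.
classify_desc() {
  case "$1" in
    # x32: 32-bit ELF class, but the machine field says x86-64.
    *"ELF 32-bit"*"x86-64"*)      echo x32 ;;
    *"ELF 64-bit"*"x86-64"*)      echo x86_64 ;;
    # The historical name is printed for every 32-bit x86 binary.
    *"ELF 32-bit"*"Intel 80386"*) echo i386 ;;
    *ELF*)                        echo other-elf ;;
    *)                            echo unknown ;;
  esac
}

# Typical use in an admin script (not executed here):
#   classify_desc "$(file "$f")"
```

A script like this silently falls through to "other-elf" or "unknown" as soon as file(1) prints a different name for the same architecture, which is the compatibility concern.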

> If the script wants to match "Intel 80386" explicitly, then do I have to
> say that for i686?

I would think it makes sense to preserve the "Intel 80386" convention
here. There's not even a reliable way to detect that a binary is for
"i686" anyway.

> > The choice to use aarch64 instead of arm64 is
> > in some ways also a consequence of this, or rather an intentional
> > _mismatch_ with patterns that should not match. The fact that mips64
> > and powerpc64 match mips* and powerpc* was historically very
> > problematic.
> 
> grep -w? Test for 64 bit first?

Indeed, testing for 64 first is the right approach (see musl's
configure script for an example), but the problem arises when the
existing tests were written before the 64-bit version of the platform
existed. I suspect there are lots of scripts that match arm*-*-* in
the machine tuple or arm* in `uname -m` which would have wrongly
detected "arm64" as arm.
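
To illustrate, here is a rough sketch of both situations. The function names old_detect and new_detect are hypothetical, and the input is assumed to be something like the output of `uname -m`; this is not taken from any particular script.

```sh
#!/bin/sh
# Hypothetical test written before 64-bit ARM existed: any arm* is "arm".
old_detect() {
  case "$1" in
    arm*) echo arm ;;
    *)    echo unknown ;;
  esac
}

# Same test with the 64-bit names checked first, so they can never
# fall through to the 32-bit pattern.
new_detect() {
  case "$1" in
    aarch64*|arm64*) echo aarch64 ;;
    arm*)            echo arm ;;
    *)               echo unknown ;;
  esac
}

# Typical use (not executed here):
#   new_detect "$(uname -m)"
```

With the old test, "arm64" is misdetected as arm, while "aarch64" fails to match at all, which is the safer outcome for a script that predates the platform.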

> Elliott was suggesting that the elf-em.h constants might be good enough,
> but that says 386 not 80386, and PPC instead of Powerpc...
> 
> Seriously, standards would be nice!

Yes, wouldn't they? :)

> (The mime types are designed to be programmatically interpreted. The
> post here about a mime type database is very interesting and I have a
> tab open. But file type output seems to be for humans...?)

Yes, mime types are a lot more consistent but less informative than
the freeform file(1) strings.

Rich
