[Toybox] [PATCH] Distinguish 32- and 64-bit variants in file(1) for x32.

Isaac Dunham ibid.ag at gmail.com
Sun Feb 21 21:34:59 PST 2016


On Sun, Feb 21, 2016 at 11:30:09PM -0500, Rich Felker wrote:
> On Sun, Feb 21, 2016 at 08:42:06PM -0600, Rob Landley wrote:
> > On 02/21/2016 03:39 PM, Rich Felker wrote:
> > > On Sat, Feb 20, 2016 at 02:28:22PM -0600, Rob Landley wrote:
> > >> On Wed, Feb 17, 2016 at 7:02 PM, enh <enh at google.com> wrote:
> > >>> On Wed, Feb 17, 2016 at 3:32 PM, Rob Landley <rob at landley.net> wrote:
> > >>>> On Wed, Feb 17, 2016 at 10:22 AM, enh <enh at google.com> wrote:
> > >>>>> It's necessary to distinguish x86 and x86-64 to be able to recognize the
> > >>>>> way x32 is encoded in ELF.
> > >>>>
> > >>>> Hmmm. That's not fun.
> > >>>>
> > >>>> I note that I spent the morning teaching the code to read/display the
> > >>>> dynamic linker name, so this patch won't "git am" directly.
> > >>>>
> > >>>> Reading the patch, we're pretending that aarch64 has nothing to do
> > >>>> with arm? No mention of arm in this architecture? Ok... (I guess
> > >>>> Cortex-M isn't arm either, but don't currently have an example binary
> > >>>> of that to test.)
> > >>>
> > >>> well, you're the one who removed my original "ARM aarch64" which is
> > >>> what the regular desktop file(1) says :-P
> > >>
> > >> But not what the linux-kernel developers ever seem to say in their
> > >> patch submissions.
> > > 
> > > Regardless of what you think about these naming choices, IMO there's
> > > little value in a file(1) that does not print the names that scripts
> > > using it expect to see.
> > 
> > _What_ scripts? I don't know what would be using this. (All for
> > real-world tests, but I have yet to find a build script using the file
> > command. Looking at /proc, sure, but not calling file...)
> 
> I suspect it's stuff like: case "$(file "$f")" in ...
> 
> I'm not thinking of build scripts for packages (this would not be
> remotely portable usage) but things like private admin scripts,
> perhaps printer filter scripts, file preview/thumbnail generation
> scripts, etc. I don't have any good examples at hand but this was the
> historical justification I always saw for the rather arcane/antiquated
> forms for many of the names.

IIRC, I've used file in an HTML index-generating script, in a very
similar way. (Said script would create a preview/thumbnail or get the
first few lines of text, then embed that in a table.)
But that didn't use binary architecture...
BSD-ish printer filter scripts certainly use 'file'...but again, that
doesn't deal with ELF binary types.
(CUPS largely eliminates the need for using file, because it converts
between known types.)

A cgi script to pick the 'correct' download to offer someone is the
only use I can think of.
Someone's probably found a use for it, but I'll be surprised if someone
who parses the ELF details enumerated by file is using toybox soon.
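
To make "the ELF details" concrete: x32 is recorded as a 32-bit class ELF
file whose machine field is x86-64, so telling the three x86 flavors apart
means looking at both e_ident[EI_CLASS] and e_machine. A minimal sketch in C,
assuming a Linux-style <elf.h>; x86_flavor() and its return strings are made
up for illustration and are not the toybox patch.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: classify the x86 flavor of an ELF header.
 * hdr points at the first len bytes of the file. */
static const char *x86_flavor(const unsigned char *hdr, size_t len)
{
    unsigned machine;

    /* Need the ident bytes plus e_type and e_machine (offsets 16..19). */
    if (len < 20 || memcmp(hdr, ELFMAG, SELFMAG)) return "not ELF";

    /* e_machine sits at offset 18 in both the 32- and 64-bit layouts;
     * its byte order is given by e_ident[EI_DATA]. */
    if (hdr[EI_DATA] == ELFDATA2LSB) machine = hdr[18] | (hdr[19] << 8);
    else machine = (hdr[18] << 8) | hdr[19];

    if (machine == EM_386) return "Intel 80386";
    if (machine == EM_X86_64)
        return hdr[EI_CLASS] == ELFCLASS32 ? "x32" : "x86-64";
    return "other";
}
```

Feeding it the first 20 bytes of a binary is enough; the machine field alone
cannot separate x32 from x86-64.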

> > If the script wants to match "Intel 80386" explicitly, then do I have to
> > say that for i686?
> 
> I would think it makes sense to preserve the "Intel 80386" convention
> here. There's not even a reliable way to detect that a binary is for
> "i686" anyway.
> 
> > > The choice to use aarch64 instead of arm64 is
> > > in some ways also a consequence of this, or rather an intentional
> > > _mismatch_ with patterns that should not match. The fact that mips64
> > > and powerpc64 match mips* and powerpc* was historically very
> > > problematic.
> > 
> > grep -w? Test for 64 bit first?
> 
> Indeed, testing for 64 first is the right approach (see musl's
> configure script for an example), but the problem arises when the
> existing tests were written before the 64-bit version of the platform
> existed. I suspect there are lots of scripts that match arm*-*-* in
> the machine tuple or arm* in `uname -m` which would have wrongly
> detected "arm64" as arm.

Oh yes...just about every autoconf script, for example.
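
To illustrate the ordering problem: a glob like arm* also matches "arm64", so
the 64-bit names have to be tested before the 32-bit ones. A minimal sketch in
C using POSIX fnmatch(); classify_machine() and its table are made up for
illustration and are not taken from musl's configure or any autoconf macro.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: map a `uname -m`-style string to a family name. */
static const char *classify_machine(const char *m)
{
    /* Order matters: each 64-bit glob comes before the 32-bit glob
     * that would otherwise swallow it. */
    static const struct { const char *glob, *name; } table[] = {
        { "aarch64*",   "arm64" },
        { "arm64*",     "arm64" },
        { "arm*",       "arm" },
        { "mips64*",    "mips64" },
        { "mips*",      "mips" },
        { "powerpc64*", "powerpc64" },
        { "powerpc*",   "powerpc" },
    };
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(*table); i++)
        if (!fnmatch(table[i].glob, m, 0)) return table[i].name;

    return "unknown";
}
```

A script written before the 64-bit port existed has only the "arm*" row, which
is how "arm64" ends up detected as arm.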

> > Elliott was suggesting that the elf-em.h constants might be good enough,
> > but that says 386 not 80386, and PPC instead of Powerpc...
> > 
> > Seriously, standards would be nice!
> 
> Yes, wouldn't they? :)

Speaking of those, I've spotted a number of finer details that need
some polish...
- per POSIX, 'cannot open' must be in the 'type' string if open() fails
(e.g. with EACCES or ENOENT); we only do that if open() succeeds and
fstat(fd) fails.
- symlink detection (as per POSIX) won't work: opening them O_RDONLY
results in following the link, then we fstat() the fd.
- running file on a FIFO hangs; a blocking open() won't return until
there's a writer.

As far as I can tell, fixing these means we need to call loopfiles_rw
with failok=1 and O_NONBLOCK in flags, sometimes fall back to lstat(),
and possibly add O_NOFOLLOW to flags.
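
Roughly, as a standalone C sketch of that open/lstat ordering. It uses plain
POSIX calls rather than loopfiles_rw; probe_type() and the strings it returns
are hypothetical and only show the control flow, not what toybox does or
should print.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: report a coarse file type without hanging on
 * FIFOs and without following a symlink in the final path component. */
static const char *probe_type(const char *path)
{
    struct stat st;
    int fd, rc;

    /* O_NONBLOCK: a read-only open of a FIFO returns without a writer.
     * O_NOFOLLOW: open() fails if the last component is a symlink. */
    fd = open(path, O_RDONLY | O_NONBLOCK | O_NOFOLLOW);
    if (fd == -1) {
        /* Fall back to lstat() to tell a symlink from a real failure. */
        if (!lstat(path, &st) && S_ISLNK(st.st_mode)) return "symbolic link";
        return "cannot open";
    }

    rc = fstat(fd, &st);
    close(fd);
    if (rc) return "cannot open";

    if (S_ISFIFO(st.st_mode)) return "fifo";
    if (S_ISDIR(st.st_mode)) return "directory";
    if (S_ISREG(st.st_mode)) return "regular file";
    return "other";
}
```

An unreadable or missing file lands in the "cannot open" branch because
open() itself failed, which is the case we currently miss.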

Thanks,
Isaac
