[Toybox] [PATCH] Distinguish 32- and 64-bit variants in file(1) for x32.

Rob Landley rob at landley.net
Mon Feb 22 15:28:41 PST 2016


On 02/21/2016 11:34 PM, Isaac Dunham wrote:
> On Sun, Feb 21, 2016 at 11:30:09PM -0500, Rich Felker wrote:
>> On Sun, Feb 21, 2016 at 08:42:06PM -0600, Rob Landley wrote:
>>> On 02/21/2016 03:39 PM, Rich Felker wrote:
>>>> On Sat, Feb 20, 2016 at 02:28:22PM -0600, Rob Landley wrote:
>>>>> On Wed, Feb 17, 2016 at 7:02 PM, enh <enh at google.com> wrote:
>>>>>> On Wed, Feb 17, 2016 at 3:32 PM, Rob Landley <rob at landley.net> wrote:
>>>>>>> On Wed, Feb 17, 2016 at 10:22 AM, enh <enh at google.com> wrote:
>>>>>>>> It's necessary to distinguish x86 and x86-64 to be able to recognize the
>>>>>>>> way x32 is encoded in ELF.
>>>>>>>
>>>>>>> Hmmm. That's not fun.
>>>>>>>
>>>>>>> I note that I spent the morning teaching the code to read/display the
>>>>>>> dynamic linker name, so this patch won't "git am" directly.
>>>>>>>
>>>>>>> Reading the patch, we're pretending that aarch64 has nothing to do
>>>>>>> with arm? No mention of arm in this architecture? Ok... (I guess
>>>>>>> Cortex-M isn't arm either, but don't currently have an example binary
>>>>>>> of that to test.)
>>>>>>
>>>>>> well, you're the one who removed my original "ARM aarch64" which is
>>>>>> what the regular desktop file(1) says :-P
>>>>>
>>>>> But not what the linux-kernel developers ever seem to say in their
>>>>> patch submissions.
>>>>
>>>> Regardless of what you think about these naming choices, IMO there's
>>>> little value in a file(1) that does not print the names that scripts
>>>> using it expect to see.
>>>
>>> _What_ scripts? I don't know what would be using this. (All for
>>> real-world tests, but I have yet to find a build script using the file
>>> command. Looking at /proc, sure, but not calling file...)
>>
>> I suspect it's stuff like: case "$(file "$f")" in ...

$ file /bin/ls
/bin/ls: ELF 64-bit LSB  executable, x86-64, version 1 (SYSV),
dynamically linked (uses shared libs), for GNU/Linux 2.6.24,
BuildID[sha1]=9d2a434c4ff55aad2ddd19348c0ac75971606483, stripped

Heck of a switch statement.

>> I'm not thinking of build scripts for packages (this would not be
>> remotely portable usage) but things like private admin scripts,
>> perhaps printer filter scripts, file preview/thumbnail generation
>> scripts, etc. I don't have any good examples at hand but this was the
>> historical justification I always saw for the rather arcane/antiquated
>> forms for many of the names.
> 
> IIRC, I've used file in an HTML index-generating script, in a very
> similar way.

HTML is the home turf of mime types. I agree that if/when we do mime
output, it should match exactly with other implementations. There's even
something vaguely standards-shaped I've bookmarked but not read yet:

http://www.iana.org/assignments/media-types/media-types.xhtml

> (Said script would create a preview/thumbnail or get the
> first few lines of text, then embed that in a table.)
> But that didn't use binary architecture...
> BSD-ish printer filter scripts certainly use 'file'...but again, that
> doesn't deal with ELF binary types.
> (CUPS largely eliminates the need for using file, because it converts
> between known types.)
> 
> A cgi script to pick the 'correct' download to offer someone is the
> only use I can think of.
> Someone's probably found a use for it, but I'll be surprised if someone
> who parses the ELF details enumerated by file is using toybox soon.

Or would rewrite their script to do so...

>>> If the script wants to match "Intel 80386" explicitly, then do I have to
>>> say that for i686?
>>
>> I would think it makes sense to preserve the "Intel 80386" convention
>> here. There's not even a reliable way to detect that a binary is for
>> "i686" anyway.

Manufacturer's name in some chip types but not others? It's _already_
inconsistent...
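
For reference, a minimal sketch of why the x32 case needs both header
fields, and why i686 can't be told apart from i386. x32 binaries use the
32-bit ELF class with the x86-64 machine number, and every 32-bit x86
binary uses the same 386 machine number. The function name, the local
constant names, and the returned labels are hypothetical and are not
what the patch or toybox's file.c uses; the numeric values mirror
ELFCLASS32/ELFCLASS64 and EM_386/EM_X86_64 from elf.h.

```c
#include <stddef.h>
#include <string.h>

/* Local copies of the elf.h values, so the sketch builds without elf.h. */
enum { CLASS32 = 1, CLASS64 = 2, DATA_MSB = 2, MACH_386 = 3, MACH_X86_64 = 62 };

/* Hypothetical helper: name the x86 variant from the start of an ELF header. */
const char *x86_variant(const unsigned char *hdr, size_t len)
{
  unsigned machine;

  /* Need the 16 byte e_ident, 2 byte e_type, and 2 byte e_machine. */
  if (len < 20 || memcmp(hdr, "\177ELF", 4)) return "not ELF";

  /* e_machine sits at offset 18; byte order comes from e_ident[EI_DATA]. */
  if (hdr[5] == DATA_MSB) machine = (hdr[18] << 8) | hdr[19];
  else machine = hdr[18] | (hdr[19] << 8);

  /* Every 32-bit x86 binary says EM_386, whether built for i386 or i686. */
  if (machine == MACH_386) return "Intel 80386";

  /* Same machine number for x86-64 and x32: only the class differs. */
  if (machine == MACH_X86_64) return hdr[4] == CLASS32 ? "x32" : "x86-64";

  return "other";
}
```

Checking the machine number alone would report x32 binaries as plain
x86-64, and checking the class alone would lump them in with i386.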

>>>> The choice to use aarch64 instead of arm64 is
>>>> in some ways also a consequence of this, or rather an intentional
>>>> _mismatch_ with patterns that should not match. The fact that mips64
>>>> and powerpc64 match mips* and powerpc* was historically very
>>>> problematic.
>>>
>>> grep -w? Test for 64 bit first?
>>
>> Indeed, testing for 64 first is the right approach (see musl's
>> configure script for an example), but the problem arises when the
>> existing tests were written before the 64-bit version of the platform
>> existed. I suspect there are lots of scripts that match arm*-*-* in
>> the machine tuple or arm* in `uname -m` which would have wrongly
>> detected "arm64" as arm.
> 
> Oh yes...just about every autoconf script, for example.
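
The ordering problem is easy to reproduce with fnmatch(), which uses the
same glob rules as a shell case statement. This is only a sketch: the
function name and the pattern table are made up for illustration and are
not taken from musl's configure or any real script.

```c
#include <fnmatch.h>

/* Hypothetical helper: map a `uname -m` style string to a family name. */
const char *machine_family(const char *machine)
{
  static const struct { const char *pattern, *family; } table[] = {
    /* 64-bit patterns go first so the broad globs below can't steal them. */
    {"aarch64*", "arm64"},
    {"arm64*", "arm64"},
    {"mips64*", "mips64"},
    {"powerpc64*", "powerpc64"},
    /* Older, broader patterns: "arm*" alone would also match "arm64". */
    {"arm*", "arm"},
    {"mips*", "mips"},
    {"powerpc*", "powerpc"},
  };
  unsigned i;

  /* First match wins, exactly like the arms of a case statement. */
  for (i = 0; i < sizeof(table)/sizeof(*table); i++)
    if (!fnmatch(table[i].pattern, machine, 0)) return table[i].family;

  return "unknown";
}
```

A table written before the 64-bit entries existed has only the last
three rows, so "arm64" lands in "arm". The name "aarch64" avoids that
because "arm*" does not match it.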

You'll notice I implemented uname a long time ago, and for uname -m I
have a gross hack to make it work:

https://github.com/landley/toybox/blob/master/toys/posix/uname.c#L28

Similarly, sed has:

https://github.com/landley/toybox/blob/master/toys/posix/sed.c#L1019

I've put in gross hacks to maintain compatibility with empirical test
cases. But I won't do it speculatively, just because somewhere there
might be a user we don't know about.

>>> Elliott was suggesting that the elf-em.h constants might be good enough,
>>> but that says 386 not 80386, and PPC instead of Powerpc...
>>>
>>> Seriously, standards would be nice!
>>
>> Yes, wouldn't they? :)
> 
> Speaking of those, I've spotted a number of finer details that need
> some polish...
> - per POSIX, 'cannot open' must be in the 'type' string if open() fails
> (both EACCES and ENOENT); we only do that if open() succeeds and fstat(fd)
> fails.
> - symlink detection (as per POSIX) won't work: opening them O_RDONLY
> results in following the link, then we fstat() the fd.
> - file 'FIFO' causes a hang; open() won't return till there's a writer.
> 
> As far as I can tell, fixing these means we need to call loopfiles_rw
> with failok=1 and O_NONBLOCK in flags, sometimes fall back to lstat(),
> and possibly add O_NOFOLLOW to flags.

I'll take a look.
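
A rough sketch of the lstat()-first ordering, using plain POSIX calls
rather than loopfiles_rw(). The function name and the returned strings
are hypothetical, and this is not what toybox's file.c currently does.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return a short type string for path. */
const char *describe_path(const char *path)
{
  struct stat st;
  int fd;

  /* lstat() doesn't follow symlinks and never blocks on a FIFO. */
  if (lstat(path, &st)) return "cannot open";
  if (S_ISLNK(st.st_mode)) return "symbolic link";
  if (S_ISFIFO(st.st_mode)) return "fifo";
  if (S_ISDIR(st.st_mode)) return "directory";
  if (!S_ISREG(st.st_mode)) return "special file";

  /* Only regular files get opened. O_NONBLOCK is a safety net in case the
     path was swapped for a FIFO between the lstat() and the open(). */
  fd = open(path, O_RDONLY|O_NONBLOCK);
  if (fd == -1) return "cannot open";
  close(fd);

  return "regular file";
}
```

With this ordering a missing or unreadable file reports "cannot open", a
symlink is reported as a symlink instead of as its target, and a FIFO
with no writer returns immediately instead of hanging.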

Thanks,

Rob
