[Toybox] [PATCH] optional fatter cat(1)

Rob Landley rob at landley.net
Mon Jan 5 05:34:40 PST 2015


On 01/04/2015 06:31 PM, Rich Felker wrote:
> On Sun, Jan 04, 2015 at 06:20:05PM -0600, Rob Landley wrote:
>> Except that on Linux, "non-bash scripts" were almost nonexistent before
>> 2006 (ported from other OSes and you know where to find ksh or zsh if it
>> says #!/bin/zsh), and even since then it's still a minority and not very
>> _interesting_.
> 
> Every single practical configure script (autoconf generated or written
> by hand) is a non-bash script. That's quite a huge portion of shell
> scripts. And they've been around since the beginning on Linux.

You're using autoconf as a positive example to _support_ your argument?

>> I view posix sh -> bash much the way I view System V -> Linux.
>> (System V didn't have procfs. I'm writing stuff that uses procfs.
>> There are compatibility modes that implement that in BSD, and
>> assuming /etc/mtab isn't a symlink to /proc/mounts means you're not
>> dealing with reality on a modern system. You _can't_ get that right
>> entirely in userspace these days.)
> 
> These are all implementation details that matter only to a few
> boot-time utilities and such. They're all irrelevant to applications.
> It's stupid and inappropriate for applications to assume procfs or
> mtab or even the concept of mounts, which won't exist on future
> systems when Linux is considered a dinosaur...

A "best practices" gui application from 20 years ago would either be
using raw xlib calls or a widget toolkit like motif. It would be IPv4
only, couldn't have supported UTF-8, any encryption or hash algorithm
would be _hilarious_ before you even got into export controls, it would
be written in c89 for single processor 32 bit systems. If you're lucky
it might have understood http 1.0 instead of gopher, and implement the
html "blink" or "marquee" tags (but would predate css or javascript). It
would still predate mp3 by a few months, could expect 800x600 screen
resolution at 256 colors (but should scale to 1024x768 with 65k colors
at the high end), and let's just ignore fonts entirely for the sake of
sanity... As or compression algorithms, pkzip 2.0 introduced deflate in
1993 so it's at least be possible instead of one of the dozen other now
largely forgotten variants, but the deflate rfc wasn't for another year.
This means no png (gif was 256 colors)...

You're implying there's a way to write code that's so good nobody would
ever need to maintain or port it again. Your example would be what,
xmms? Or are you saying nobody's ever managed this before but we should
start now?

Back then I personally was doing OS/2 code, because I didn't see the
point of Sun workstations. (Which were themselves switching from SunOS
to Solaris while I was poking at them.) I'm not saying Linux is eternal,
I'm saying portability to the unknown is another variant of
"infrastructure in search of a user". It's not hard to write new code in
future, and if an existing codebase of nontrivial size isn't already
actively maintained nobody's going to inherit OpenWatcom and go "yes,
that, I'll learn it and make it mine" rather than writing a new one,
because reading code isn't going to _stop_ being harder than writing
code any time soon. (Yes, Erik Andersen did repurpose busybox and
uClibc instead of starting over, but A) he was unusual, B) each of
those codebases had only _existed_ for about 5 years when he started.)

> This is why the standards omit them.

Clearly IBM paying to have OS/390 be posix compliant and Microsoft
paying to have NT be certified posix compliant had nothing to do with
this subset selection.

> Standards are about what an application can
> expect to see, not how it gets done under the hood.

Posix removed cpio after 2001, and never standardized the 8 digit
(rather than 6 digit) variant of it used by rpm and initramfs. Posix
standardized pax (which almost nobody uses outside Sun), but not tar
(which is almost ubiquitous outside of windows). Windows uses zip,
which posix doesn't mention either; there's an IETF RFC on zip, but
there are over 7000 other RFCs, most of which are useless.
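
For reference, you can see the two cpio variants side by side with GNU
cpio (a quick sketch assuming GNU's -H flag spellings; only the magic
number is shown here, but the odc header fields after the magic are
6-digit octal while newc's are 8-digit hex):

  $ echo hello > file
  $ echo file | cpio -o -H odc | head -c 6    # posix "portable ASCII"
  070707
  $ echo file | cpio -o -H newc | head -c 6   # the one initramfs/rpm use
  070701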

Sure, you can mix and match various standards bodies to describe what
you're doing. Or, in the case of large corporations, sponsor a standards
body to certify your description of what you're already doing. Nobody
ever games that:

http://en.wikipedia.org/wiki/Standardization_of_Office_Open_XML

I remember the _year_ a company I once worked at spent trying to
implement a RealPlayer-compatible server using the RTSP "standard"
they'd put forth:

http://www.ietf.org/rfc/rfc2326.txt

(Spoiler: it couldn't be done. There was an encoding layer, somewhere
between encryption and pure obfuscation, that wasn't documented but
was required to actually talk to their servers or clients using the
protocol they'd published as a "standard".)

> They're about
> moving us forward into compatibility and interoperability, not
> backwards into lock-in to outdated implementation details.

You're trying to predict the future in a way that will prevent you from
having to respond to it. I'm expecting to have to constantly change to
adapt to the future as it arrives.

I throw out waaay more code than I ever ship. The stuff I ship is the
best I can do _currently_, but I just ripped out and redid a chunk of
the flag parsing logic yet again because I needed it to do something
else, and what took me so long was that I didn't want to add another
layer on top but to figure out how to _remove_ old code that was no
longer relevant (i.e. generated/oldtoys.h).

If circumstances change in future to render current code obsolete,
then I (or somebody else) will write new code. This is how it works.
I'm not that tied to the existing implementation of _anything_, and
that includes the current version of the standards documents. (It
wasn't _that_ long ago I was looking into chucking the existing
codebase and rewriting it in Lua. That turned out not to be viable,
but "written in C" is a pretty fundamental assumption for a project
like toybox, and I gave not doing that a serious go _after_ making
patch.c actually work with real-world data, despite posix still to
this day not having the concept of a unified diff.)
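
For reference, the format in question is what every patch flying
around the internet actually looks like (quick sketch; the timestamps
after the filenames are trimmed):

  $ printf 'a\nb\n' > old
  $ printf 'a\nc\n' > new
  $ diff -u old new
  --- old
  +++ new
  @@ -1,2 +1,2 @@
   a
  -b
  +c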

>>>> I repeat: ubuntu made a bad technical decision, gratuitously breaking
>>>> compatibility for its existing userbase for a stated reason that was
>>>> inadequate to justify the fallout, one which could easily have been
>>>> accomplished a different way without end user impact, and fairly quickly
>>>> _was_ because it didn't accomplish its stated goal but the change was
>>>> retained anyway.
>>>
>>> I think the security and runtime-cost benefits were more than
>>> sufficient to justify the "fallout".
>>
>> I see no benefits.
> 
> 2x VSZRW, 5x actual ram usage (dirty pages), per instance.
> 
> And anecdotally (I don't have figures) performance is considerably
> better.
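
For what it's worth, that's easy enough to eyeball on any Linux box
with both shells installed (numbers vary by system; the trailing ":"
stops the shell from just exec'ing ps as its last command, so ps -p $$
actually measures the shell):

  $ bash -c 'ps -o vsz,rss,comm -p $$; :'
  $ dash -c 'ps -o vsz,rss,comm -p $$; :'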

There's a variant of Gates' Law that applies specifically to the FSF:
anything they maintain rapidly turns into a pile of crap. They have a
hugely corrosive effect on any software they're involved with, and
this is nothing new.

But: when you inevitably write a new one (the way busybox and android
and klibc did, or the various "yet another blah" or
"somethingorother-ng" projects), you target compatibility with what
people are using, not an ideal nobody currently actually conforms to.

We didn't get chromium and firefox from people trying to implement
standards documents. We got them from people throwing ever more
real-world data at their programs and iteratively fixing what didn't
work (and then going back and submitting changes to the standards
documents to make them less sad).

That's not what dash did: they didn't run real-world scripts and fix
their shell. They shipped a shell based on some idealistic vision in a
standards document and broke everybody's projects, because they were
big enough to throw their weight around. Just like Microsoft used to
be able to do.
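
The sort of thing that broke wasn't exotic, either. A few
representative bashisms that #!/bin/sh scripts had accumulated (a
sketch, with a hypothetical helpers.sh; exact dash error messages vary
by version):

  [[ -n "$answer" ]] && echo yes   # dash: [[: not found
  echo {a,b,c}                     # bash: "a b c", dash prints it literally
  source ./helpers.sh              # dash only knows ". ./helpers.sh"
  echo $RANDOM                     # bash prints a number, dash nothing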

Trying to shove dash down people's throats was the same impulse as
making "vi" and "vim" behave differently, or unity as a common desktop
between phones and laptops. It was Mark Shuttleworth imposing policy on
his userbase.

> And of course, most importantly, complete lack of function exports and
> other dangerous code paths processing potentially untrusted data that
> should never have existed in the first place.

So you're saying they predicted shellshock 8 years in advance, and
that's why they shipped something you could trivially segfault, that
didn't get basic signal handling right, and which assumed that if you
didn't have a controlling terminal you were never in interactive mode
and thus shouldn't get a prompt? (On _top_ of "hey, you broke the
kernel build" being news to the kernel development community, as if
compiling linux weren't an activity common enough on linux systems
that people would have noticed it breaking over the previous 15
years...)
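
For context, the function export mechanism in question: bash
serializes shell functions into environment variables and reparses
them in every child bash, which is the code path shellshock lived in.
A minimal demonstration (the environment encoding changed after the
fix, but the feature's still there):

  $ greet() { echo hello from an inherited function; }
  $ export -f greet
  $ bash -c greet     # child bash reparses it out of the environment
  hello from an inherited function
  $ dash -c greet     # dash never implemented any of this
  dash: 1: greet: not found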

>>> The broken scripts are easily
>>> remedied just by fixing the #! line, or they can be made portable if
>>> that was the intent to begin with.
>>
>> Every script I know changed the #! line, but that was a bug workaround
>> for ubuntu breaking the /bin/sh symlink.
> 
> Every script you know? No, some small set of bash scripts that were
> wrongly labeled as sh scripts. The vast majority of the scripts you
> know

Thanks for telling me what I know. It's good that you're more of an
authority on it than I am. I had previously been unaware of that.

> are configure scripts and most of them even run on ancient
> pre-POSIX shells.

The output of autoconf is only coincidentally a shell script. It could
just as easily have been C.

The largest non-autoconf configure script I have lying around at the
moment is the one in qemu. I just checked out the last version from 2006
(git 388c45084d54), back before ubuntu broke #!/bin/sh, and tried to run
it on current ubuntu:

  $ ./configure
  WARNING: "gcc" looks like gcc 4.x
  Looking for gcc 3.x
  ./configure: 358: ./configure: Syntax error: Bad fd number
  landley@driftwood:~/qemu/qemu$

So:

A) hardwired gcc 3.x dependency that they made go away by rewriting
their entire code generator (replacing dyngen with tcg) because you can
do that: writing new code in future is always an option.

B) the "bad fd number" was because despite saying #!/bin/sh at the top
this script had only ever been tested with bash because THAT IS MY POINT.

C) these guys care about portability to _windows_ enough to use ptrtype
macros for the insane win64 LLP64 stuff. (Not running windows clients,
running on windows _hosts_. Why? Because IBM, git cc3ac9c4a6fd for example.)
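
For the record, the "bad fd number" reproduces in one line: bash
accepts the old csh-style ">&file" redirect meaning "stdout and stderr
to file", while dash parses ">&" as "duplicate to this file
descriptor" and demands a digit after it. Whatever line 358 actually
contained, this is the classic trigger:

  $ bash -c 'ls >& /dev/null'   # fine in bash
  $ dash -c 'ls >& /dev/null'
  dash: 1: Syntax error: Bad fd number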

> Rich

Rob

