[Toybox] [PATCH] Add the gzip/gunzip/zcat I wrote for toolbox.

enh enh at google.com
Tue May 9 15:08:04 PDT 2017


On Tue, May 9, 2017 at 1:25 PM, Rob Landley <rob at landley.net> wrote:
> On 05/08/2017 08:27 PM, enh wrote:
>> On Mon, May 8, 2017 at 4:37 PM, Rob Landley <rob at landley.net> wrote:
>>> On 04/26/2017 05:03 PM, enh wrote:
>>>> if you're actually going to start to look, i'll attach my port of the
>>>> current toolbox implementation.
>>>
>>> Ok, carving out a half hour to look at this... optargs should really
>>> have [-123456789] at the end so "gzip -3 -9" is -9 but "gzip -9 -3" is
>>> -3. While we're at it, OLDTOY() was there so these things could share
>>> option strings, except why does zcat have -cd?
>>
>> i just copied what the "real thing" does --- all three commands accept
>> the same set of options, and just ignore the ones they can't use. i
>> assumed _someone_ is relying on that. (i can't remember if i actually
>> hit such an instance --- i don't remember how it was that i realized
>> that they all accept all the options, but i do remember in the
>> beginning i had a different getopt string for each one, but changed
>> that even in the toolbox version.)
>
> It's times like this where busybox is useful, since that's 10+ years of
> an alternate implementation lots of people have had plenty of
> opportunity to complain about. :)
>
> (I vaguely recall that the IETF used to have a policy that standards
> submissions had to have two different interoperable implementations.
> This was a good policy.)
>
> Checking busybox defconfig from february, zcat --help documents no
> options...
>
>   ./busybox zcat -c README.gz
>   ./busybox zcat --fruitbasket README.gz
>
> And is ignoring all options presented to it. (Blah. Proving nothing in
> this instance.)

well, i took that as proof that we probably should ignore all the
other arguments. (since both gnu and busybox did.)

> Hey, ubuntu's zcat will take non-gz filenames and look for a
> corresponding gz file. (I.E. "zcat README" will display README.gz).
> Should we do that? (Probably.) the man page says zcat is identical to
> "gunzip -c", is this true here?
>
>   gzip README
>   gunzip -c README
>
> Yup, that also finds the .gz version when not given a .gz. (The busybox
> version doesn't have this trick though...)

yeah, i noticed that but thought it was gross and used the fact that
busybox doesn't support it to say "YAGNI".
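
for reference, a minimal sketch of the fallback being discussed, assuming the only rule is "if NAME can't be opened and doesn't already end in .gz, retry NAME.gz". this is not the toybox, busybox, or GNU code, and the helper names (gz_fallback_name, open_gz_input) are made up for illustration; the real tools may try other suffixes or apply other checks.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: build the "NAME.gz" candidate for a name that
// lacks the suffix. Returns 1 and fills `out` on success, 0 when `name`
// already ends in ".gz" (nothing to fall back to), -1 when the
// candidate would not fit in `out`.
int gz_fallback_name(const char *name, char *out, size_t outlen)
{
  size_t len = strlen(name);

  if (len >= 3 && !strcmp(name + len - 3, ".gz")) return 0;
  if (len + 4 > outlen) return -1;  // name + ".gz" + NUL
  snprintf(out, outlen, "%s.gz", name);

  return 1;
}

// Hypothetical helper: open `name` for reading; if that fails, retry
// once with ".gz" appended. Returns NULL when neither can be opened.
FILE *open_gz_input(const char *name)
{
  char alt[4096];
  FILE *fp = fopen(name, "rb");

  if (!fp && gz_fallback_name(name, alt, sizeof(alt)) == 1)
    fp = fopen(alt, "rb");

  return fp;
}
```

leaving the fallback out means dropping open_gz_input and reporting the first fopen failure directly, which is what the busybox behavior described above amounts to.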

i should probably have added a note at the top of the file, but while
documenting deviations from POSIX makes a lot of sense, listing every
GNU wart isn't quite as obvious. (though i guess this would have been
a special and more interesting case: not just a GNU wart but one that
busybox doesn't share.)

> Busybox gzip documents -dtcf but not the numbers. But it accepts the
> numbers. And it's being specific about that:
>
>   $ ./busybox gzip -7 --potato README
>   gzip: unrecognized option '--potato'
>
> Sigh. Ok, this time checking busybox was less useful than I'd hoped. :)

(which is maybe another argument for always documenting deliberate
differences: for the benefit of the next project to tread this path
:-) )

>>> (And the three main
>>> functions... if oldtoy isn't handling this right it needs fixing. Do you
>>> care about the standalone builds, by the way? that was always the
>>> complicating part, chording together shared infrastructure...)
>>
>> i don't personally understand why someone would want one of these but
>> not all three, no.
>
> Running "make test_zcat" does a standalone build of zcat, which should
> act like zcat and not like gzip or gunzip. So there's one use case. :)
>
>>> I need to spend a 3 day weekend fixing the help infrastructure so it can
>>> do reasonable includes, the duplication of -c and -f help text here
>>> pains me.
>>
>> the fact that the ps/top/pgrep help is just wrong pains me more :-)
>>
>> note that -f isn't the same between all three, and the -c is slightly
>> different for all three. (and they'd differ more if we behaved more
>> like the "real thing" and documented accurately.)
>
> What does zcat -c _do_? (Make it _not_ act like zcat? As far as I can
> tell ubuntu's is ignoring it.) How would -f behavior differ? (Is this in
> the tests you sent?)

the thing i remember not implementing is that GNU at least checks
isatty. i don't remember whether busybox does. iirc i just have a TODO
in the tests saying "is there any way to write a test for this?".
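
a hedged sketch of what such a check could look like, assuming the policy is "don't write compressed data to a terminal, don't read compressed data from one, and let -f override both". the function names are hypothetical and this is not code from toybox or GNU gzip. keeping the decision separate from the isatty() calls means the decision itself can be tested without a terminal, though that still doesn't test the real wiring.

```c
#include <unistd.h>

// Hypothetical policy function: returns nonzero when the operation
// should be refused.
// Compressing: refuse when the output is a terminal.
// Decompressing: refuse when the input is a terminal.
// `force` (-f) overrides both.
int refuse_terminal(int decompress, int force, int in_is_tty, int out_is_tty)
{
  if (force) return 0;

  return decompress ? in_is_tty : out_is_tty;
}

// The same check applied to the process's real stdin/stdout.
int refuse_stdio(int decompress, int force)
{
  return refuse_terminal(decompress, force,
    isatty(STDIN_FILENO), isatty(STDOUT_FILENO));
}
```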

> I see "acts slightly different" and hear "special case" and wonder "is
> there any way we could not?" Which is where I have to read the code and
> the man page and come up with test cases and wrap my head around _why_
> and then see if there are users out there depending on this weirdness
> (although mostly I check package configure and builds because I can find
> and automate a lot of those) and...
>
> I miss free time. It was nice.
>
>>> Should gzip do that? You have it exiting immediately... Huh, it looks
>>> like the gnu/gnu/gnu/dammit version _does_ exit at the first error,
>>> which seems awkward and wrong. So this matches the existing version, the
>>> question remains what the behavior _should_ be.
>>
>> in the absence of a strong reason to do anything different, i just did
>> what i observed the "real thing" doing.
>
> Oh sure. It's a great first pass, I'm just doing my normal gift horse
> dentistry.
>
> I try not to think in terms of "real thing", but instead the old IETF
> model. The toybox implementation has the potential to become the "real
> thing" for a lot of people in future. I'm trying to work out if there
> _was_ a spec (I.E. if posix was functional), what it would look like?

this was exactly why i _deleted_ the code for handling a missing
extension. i did implement it and wrote tests, but looked at what i'd
done and just felt like i wasn't doing the future any favors. (a quick
pop quiz showed that no one i work with knew about this wart.)

> Warn-but-continue is the common behavior elsewhere in toybox, because
> that was common behavior in the existing utilities toybox was
> implementing new versions of. I should survey the existing toybox
> commands and see if there are _any_ that stop on error when handling
> [FILE...] arguments, and if so (I don't remember any) compare the ubuntu
> implementation to make sure it wasn't our mistake.
>
> If gzip is the only oddball, I'd prefer to correct it in toybox. If
> there are others, then it's not a special case...
>
>> afaik the only differences are omissions.
>
> Yeah, but _should_ there be? Newbies learning the unixoid command line
> are helped by consistency. (That's why in toybox everything accepts --
> even "echo", regardless of what ubuntu did.) If future generations
> learning this stuff _don't_ outnumber the current installed base, we're
> doing it wrong.
>
> And if we are going to clean up this sort of thing, the initial
> introduction is the place to do it. If people need it, they'll complain.
> (Later people would complain just because it changed.)
>
> I can see you not wanting to field the bug reports, though. :)

"why is so much stuff missing from top/ps/pgrep?" is my most
frequent bug report atm, because people trust that --help is telling
them the whole truth :-)

> Rob
>
> P.S. Remember how I disabled --help output for "true" and "false"
> because people complained? Bash has built-in "true" and "false"
> implementations that behave like toybox does now, but ubuntu _also_ has
> /bin/false and /bin/true behave like toybox _used_ to, and yes:
>
>   $ /bin/true --help > /dev/full
>   /bin/true: write error: No space left on device
>   $ echo $?
>   1
>
> Moral of the story: the gnu/gnu/gnu/dammit stuff is not compatible with
> _itself_. Second moral of the story:
>
>   $ /bin/true --help | wc -l
>   15
>   $ man true | wc -l
>   50
>
> Yet that man page ends with:
>
>   SEE ALSO
>     The  full documentation for true is maintained as a Texinfo
>     manual.  If the info and true programs are properly installed
>     at your site, the command
>
>        info coreutils 'true invocation'
>
>     should give you access to the complete manual.
>
> Because 15 lines of built-in help text plus 50 lines of man page is not
> enough to fully describe the "true" command, for the _full_
> documentation they need their bespoke proprietary documentation format
> nobody else uses but they refuse to give up.
>
> This sort of thing is why I don't see gnu versions of anything as "the
> real thing", more what people used in the bad old days before we knew
> better, uphill, both ways, in the snow.

the reason i tend to say "real thing" rather than "gnu" is just
because i'm too lazy to check. i happen to know bzip2 isn't gnu, for
example, and i'm pretty sure (because i maintain Android's zlib) that
gzip is gnu (because it's not part of the zlib distribution, though
they do have a cut-down version), and unzip/zip probably aren't gnu
(even though they're not in the zlib distribution either) because they
just don't "feel" like gnu tools.

i've also been known to say "desktop" or "traditional", but both of
those alternatives have their problems too.

> Still Rob



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.


