[Toybox] wc -l *

enh enh at google.com
Fri Jul 6 19:44:19 PDT 2018


(note that the test changes also fix HOST=1, which was previously
failing those test cases.)

On Fri, Jul 6, 2018, 13:03 enh <enh at google.com> wrote:

> TL;DR: patch attached
>
> (background: i've been trying to use toybox on my desktop too.)
>
> i was surprised to see that toybox `wc -l` doesn't columnate:
>
> $ ./toybox wc -l [Mm]*
> 256 main.c
> 69 Makefile
> 325 total
>
> here's what i was expecting to see.
>
> $ wc -l [Mm]*
>  256 main.c
>   69 Makefile
>  325 total
>
> i thought i'd send a patch, but:
>
> (a) "don't columnate unless more than one flag is set" seems
> deliberate, but i don't understand why:
>
>    for (i = 0; i<4; i++) if (toys.optflags == (1<<i)) space = 0;
>
> (b) POSIX does say _nothing_ should be columnated:
>
>   By default, the standard output shall contain an entry for each
> input file of the form:
>
>   "%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>
>
>   ...
>
>   The output file format pseudo-printf() string differs from the
> System V version of wc:
>
>   "%7d%7d%7d %s\n"
>
>   which produces possibly ambiguous and unparsable results for very
> large files, as it assumes no number shall exceed six digits.
>
> ah, i think i see what you were trying to say... you wanted this:
>
> $ cat /proc/version | wc -l -
> 1 -
> $ cat /proc/version | wc -l
> 1
>
> and `info wc` says
>
>   However, as a GNU extension, if only one count is printed, it is
>   guaranteed to be printed without leading spaces.
>
> hmm. except i can't explain this:
>
> $ wc -l /etc/csh.*
>  18 /etc/csh.cshrc
>  11 /etc/csh.login
>   1 /etc/csh.logout
>  30 total
> $ wc -l /proc/[c]*
> 12 /proc/cgroups
> 1 /proc/cmdline
> 1 /proc/consoles
> 1296 /proc/cpuinfo
> 458 /proc/crypto
> 1768 total
>
> i can't explain (a) why the first example uses a column width of 3,
> nor (b) why the second example doesn't columnate. presumably it's
> something to do with those files claiming size 0, though i've no idea
> how/why it's deciding how big to make the *lines* column from the file
> size. oh, yeah, it can assume that every character in the file is a
> newline, and thus get an upper bound on the number of lines.
>
> okay, so i'm guessing the GNU heuristic is something like a two-pass
> "stat all the files first, and use the number of digits in the max
> byte count as the column width", and /proc actually isn't a special
> case in their code:
> it's a bug because their heuristic is broken for files that read
> larger than they claim to be.
>
> so, anyway... it looks like you've implemented the documented GNU
> extension, but in practice they don't actually do what they claim to
> do. it seems like the true GNU extension is actually "there are no
> leading spaces if only one count is printed *and* there's only one
> file".
>
> ah, i think we've just misinterpreted what "only one count" means in
> the GNU doc: they mean one *file*, not one *column*. that certainly
> seems to match the actual behavior.
>
> fix attached.
>
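
The width heuristic guessed at above can be sketched in C. This is only an illustration of that guess under stated assumptions, not GNU's or toybox's actual code: the names digits(), column_width() and print_row() are made up for the example, and the sizes passed in stand for whatever stat() claims for each file.

```c
#include <stdio.h>

/* Hypothetical helper: decimal digits needed to print n. */
int digits(unsigned long long n)
{
  int width = 1;

  while (n >= 10) {
    n /= 10;
    width++;
  }

  return width;
}

/*
 * Hypothetical helper: choose the count column width before reading anything.
 * sizes[] holds the byte size stat() claimed for each file. A file of N bytes
 * has at most N newlines, so the largest claimed size bounds every count.
 * One file with one count is printed unpadded (width 1).
 */
int column_width(const unsigned long long *sizes, int nfiles, int ncounts)
{
  unsigned long long max = 0;
  int i;

  if (nfiles == 1 && ncounts == 1) return 1;
  for (i = 0; i < nfiles; i++) if (sizes[i] > max) max = sizes[i];

  return digits(max);
}

/* Print one right-aligned count followed by the file name. */
void print_row(int width, unsigned long long count, const char *name)
{
  printf("%*llu %s\n", width, count, name);
}
```

With claimed sizes of 5120 and 900 bytes, column_width() returns 4. For files that claim size 0, as the /proc entries do, it returns 1, so a count like 1296 is printed with no padding and the rows stop lining up, which matches the /proc output quoted above.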
