[Toybox] wc -l *

enh enh at google.com
Fri Jul 6 13:03:35 PDT 2018


TL;DR: patch attached

(background: i've been trying to use toybox on my desktop too.)

i was surprised to see that toybox `wc -l` doesn't columnate:

$ ./toybox wc -l [Mm]*
256 main.c
69 Makefile
325 total

here's what i was expecting to see.

$ wc -l [Mm]*
 256 main.c
  69 Makefile
 325 total

i thought i'd send a patch, but:

(a) "don't columnate unless more than one flag is set" seems
deliberate, but i don't understand why:

   for (i = 0; i<4; i++) if (toys.optflags == (1<<i)) space = 0;
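
for reference, here's a minimal standalone sketch of what that check tests. `single_count_flag` is a hypothetical name, and treating the four lowest option bits as the four count flags is my assumption about the flag layout, not something taken from the toybox source:

```c
/* Hypothetical stand-in for the quoted check: returns 1 when `flags` is
 * exactly one of the four lowest bits, i.e. exactly one count was asked
 * for and no other option bit is set. Assumes the count flags occupy
 * bits 0..3. */
static int single_count_flag(unsigned flags)
{
  int i;

  for (i = 0; i < 4; i++) if (flags == (1u << i)) return 1;
  return 0;
}
```

so a lone `-l` would turn padding off, while `-l -w` (two bits set) or no flags at all (zero bits set) would leave it on.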

(b) POSIX does say _nothing_ should be columnated:

  By default, the standard output shall contain an entry for each
  input file of the form:

  "%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>

  ...

  The output file format pseudo-printf() string differs from the
  System V version of wc:

  "%7d%7d%7d %s\n"

  which produces possibly ambiguous and unparsable results for very
  large files, as it assumes no number shall exceed six digits.
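
to make the ambiguity concrete, here's a small sketch that feeds the same counts through both format strings. the helper names `format_posix` and `format_sysv` are hypothetical; only the two format strings come from the POSIX text:

```c
#include <stdio.h>

/* Format one wc-style line using the POSIX pseudo-format. */
static int format_posix(char *buf, size_t len, int lines, int words,
                        int bytes, const char *name)
{
  return snprintf(buf, len, "%d %d %d %s\n", lines, words, bytes, name);
}

/* Format the same line using the System V fixed-width format. */
static int format_sysv(char *buf, size_t len, int lines, int words,
                       int bytes, const char *name)
{
  return snprintf(buf, len, "%7d%7d%7d %s\n", lines, words, bytes, name);
}
```

once every count has seven or more digits, the System V form leaves no separator between the numbers, so the three fields run together into one digit string. the POSIX form always keeps a single space between them.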

ah, i think i see what you were trying to say... you wanted this:

$ cat /proc/version | wc -l -
1 -
$ cat /proc/version | wc -l
1

and `info wc` says

  However, as a GNU extension, if only one count is printed, it is
  guaranteed to be printed without leading spaces.

hmm. except i can't explain this:

$ wc -l /etc/csh.*
 18 /etc/csh.cshrc
 11 /etc/csh.login
  1 /etc/csh.logout
 30 total
$ wc -l /proc/[c]*
12 /proc/cgroups
1 /proc/cmdline
1 /proc/consoles
1296 /proc/cpuinfo
458 /proc/crypto
1768 total

i can't explain (a) why the first example uses a column width of 3,
nor (b) why the second example doesn't columnate. presumably it's
something to do with those files claiming size 0, though i've no idea
how/why it's deciding how big to make the *lines* column from the file
size. oh, yeah, it can assume that every character in the file is a
newline, and thus get an upper bound on the number of lines.

okay, so i'm guessing the GNU heuristic is something like a two-pass
"stat all the files first, and use the number of digits in the max
byte count as the column width", and /proc actually isn't a special
case in their code: it's a bug because their heuristic is broken for
files that read larger than they claim to be.
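
here's a minimal sketch of that guess, just to check it against the two examples. this is my guess at the heuristic, not GNU's code; `decimal_digits` and `guess_column_width` are hypothetical names, and `sizes` stands in for the st_size values a first stat() pass would collect:

```c
/* Number of decimal digits needed to print n (at least 1). */
static int decimal_digits(unsigned long long n)
{
  int width = 1;

  while (n >= 10) {
    n /= 10;
    width++;
  }
  return width;
}

/* Guessed heuristic: column width is the digit count of the largest
 * claimed file size. A file can't have more newlines than bytes, so
 * this is an upper bound for the line count too, as long as the
 * claimed size is honest. */
static int guess_column_width(const unsigned long long *sizes, int count)
{
  unsigned long long max = 0;
  int i;

  for (i = 0; i < count; i++) if (sizes[i] > max) max = sizes[i];
  return decimal_digits(max);
}
```

under this guess, a set of files that all claim size 0 gets width 1, which would explain the uncolumnated /proc output even though 1296 needs four digits. a set of small files whose largest size is in the hundreds of bytes gets width 3.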

so, anyway... it looks like you've implemented the documented GNU
extension, but in practice they don't actually do what they claim to
do. it seems like the true GNU extension is actually "there are no
leading spaces if only one count is printed *and* there's only one
file".

ah, i think we've just misinterpreted what "only one count" means in
the GNU doc: they mean one *file*, not one *column*. that certainly
seems to match the actual behavior.

fix attached.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Fix-wc-column-widths.patch
Type: text/x-patch
Size: 2220 bytes
Desc: not available
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20180706/7c278736/attachment.bin>

