[Toybox] [PATCH] count.c: Human readable -h option and MAYFORK

Mon Oct 16 03:18:20 PDT 2023

On 10/15/23 23:23, Oliver Webb via Toybox wrote:
> Hey, I was looking through the command list and discovered a "count" command.

I wrote my first one of those in 1998 and couldn't figure out why it wasn't part
of the standard unix command set. I posted it all over the place (linux-kernel,
etc) and submitted it to busybox back in 2003 when I first got involved with
that project:

  http://lists.busybox.net/pipermail/busybox/2003-October/043590.html

Which they bikeshedded into "pipe_progress" a month later:

  https://git.busybox.net/busybox/commit/?id=e9080c9f4170

*shrug*

> After learning what it did it reminded me of a tool called pv (Pipe viewer)

Also created a few years after mine (but before the busybox version), but yes
I'm aware of it. (Quite possibly independently developed; as I said it seems
like it SHOULD have been in Unix v7...)

> which does the exact same thing with more info that I was planning to make
> a rudimentary implementation of in toybox.

I've thought about it too, but when I looked into it pv seemed kinda
overengineered to me.

The thing is, stuff like wget knows how long a file is from the http headers, so
it knows how long to make the progress bar. Something like pv can't really know
how much input data to expect, so it can try to make the result pretty with data
rate and such, but can't do a proper progress bar because it has no idea how
much input to expect. And telling it on the command line is a can of worms
(because now the caller is expected to preemptively marshall data from the
producer to the consumer, how does the CALLER know)...

And in the absence of being able to do the right thing automatically pv added
lots of knobs.

Where it starts getting interesting is rate _limiting_, ala rsync --bwlimit=20k
but again in a pipe you're only limiting in one direction and it starts to get
constipated and it's not QUITE a bufferbloat situation but I wound up trying to
put it in the pipeline more than once...

I've looked at adding display and ratelimiting plumbing to lib/ (so things like
wget and rsync could share it) but infrastructure in search of a user is bad
(waiting until I implement the users).

I generally don't have pv installed on my laptop and when I look into it, find
it more trouble than it's worth. YMMV...

> I want to add a -v[erbose] option that displays info like average input/second
> (size/seconds since start) and time since start, Since we are already doing stuff
> with 'millitime()' to speed up the command, It wouldn't be that much work. But it seems
> a bit overkill and feature-creep-y for a command that is so simple so I didn't include 
> one in this patch. I did however add a '-h' Human readable count option.
> 
> The only memory that gets accumulated is 'buf' that can be freed easily at the end of the command.

It doesn't accumulate, it's allocated once and then freed automatically at exit.
I didn't have a call to free() because the OS did it for us so it was a waste of
bytes. (Other such calls I put under if (CFG_TOYBOX_FREE) which is more or less
a debug thing.)

> I did some memory leak checking on count (calling the main function 10,000 times and watching
> in htop if any memory in being accumulated). Then some more precise checking with valgrind.
> count now accumulates as much memory as any other (memory-leakless) command in toybox 
> (80 bytes lost, probably infrastructure stuff but I dunno). And there doesn't seem to be a reason not
> to MAYFORK it.

The advantage of MAYFORK is that it can run within the shell's context. It's a
performance thing in the case of "true" or "echo" that may be run in tight
loops, but for other commands the advantage is the ability to access the shell's
context, so kill can be aware of job control (and thus handle "kill %1") or help
can behave differently when you run "toybox help" vs "toybox sh -c help". This
isn't making use of shell context, and the command only exists in a pipeline and
is thus already a child process...

I'm not seeing the benefit? (Now there might be an argument for making sure the
shell can recursively call builtin commands from subshells and then exit the PID
afterwards, without an extra fork or call to exec. But I _think_ it's already
doing that in the non-nommu codepath just by virtue of calling xexec()? Modulo
Android turned that off with CONFIG_TOYBOX_NORECURSE for some reason...)

As for the -h option: once you've copied enough data, the display will stop
obviously updating. Half the point of count for me was to confirm the transfer
hadn't stalled. (And you don't need the typecast, C will do that for you here.)

I could see adding commas to the output so you can more easily distinguish units
when the number gets big, but human_readable() doesn't do that anymore because
of the saga in these three commits: ec5e6e12ee6d, 54965cadfe8f, 5b7cc6d6c2a5
(basically internationalization was sad).

That said, a -l long output version that has the raw number, and the human
readable, and time, and recent/total rate, might be useful? (I'd prefer -l
instead of -v, but same general concept.)

Rob