[Toybox] [PATCH] count.c: Human readable -h option and MAYFORK

Oliver Webb aquahobbyist at proton.me
Mon Oct 16 17:28:26 PDT 2023


------- Original Message -------
On Monday, October 16th, 2023 at 05:14, Rob Landley <rob at landley.net> wrote:


> On 10/15/23 23:23, Oliver Webb via Toybox wrote:
> 
> > Hey, I was looking through the command list and discovered a "count" command.
> 
> 
> I wrote my first one of those in 1998 and couldn't figure out why it wasn't part
> of the standard unix command set. I posted it all over the place (linux-kernel,
> etc) and submitted it to busybox back in 2003 when I first got involved with
> that project:
> 
> http://lists.busybox.net/pipermail/busybox/2003-October/043590.html
> 
> Which they bikeshedded into "pipe_progress" a month later:
> 
> https://git.busybox.net/busybox/commit/?id=e9080c9f4170
> 
> shrug
> 
> > After learning what it did it reminded me of a tool called pv (Pipe viewer)
> 
> 
> Also created a few years after mine (but before the busybox version), but yes
> I'm aware of it. (Quite possibly independently developed; as I said it seems
> like it SHOULD have been in Unix v7...)
>
> > which does the exact same thing with more info that I was planning to make
> > a rudimentary implementation of in toybox.
> 
> 
> I've thought about it too, but when I looked into it pv seemed kinda
> overengineered to me.
> 
> The thing is, stuff like wget knows how long a file is from the http headers, so
> it knows how long to make the progress bar. Something like pv can't really know
> how much input data to expect, so it can try to make the result pretty with data
> rate and such, but can't do a proper progress bar because it has no idea how
> much input to expect. And telling it on the command line is a can of worms
> (because now the caller is expected to preemptively marshall data from the
> producer to the consumer, how does the CALLER know)...
> 
> And in the absence of being able to do the right thing automatically pv added
> lots of knobs.
> 
> Where it starts getting interesting is rate limiting, ala rsync --bwlimit=20k
> but again in a pipe you're only limiting in one direction and it starts to get
> constipated and it's not QUITE a bufferbloat situation but I wound up trying to
> put it in the pipeline more than once...
> 
> I've looked at adding display and ratelimiting plumbing to lib/ (so things like
> wget and rsync could share it) but infrastructure in search of a user is bad
> (waiting until I implement the users).
> 
> I generally don't have pv installed on my laptop and when I look into it, find
> it more trouble than it's worth. YMMV...
> 
> > I want to add a -v[erbose] option that displays info like average input/second
> > (size/seconds since start) and time since start, Since we are already doing stuff
> > with 'millitime()' to speed up the command, It wouldn't be that much work. But it seems
> > a bit overkill and feature-creep-y for a command that is so simple so I didn't include
> > one in this patch. I did however add a '-h' Human readable count option.
> > 
> > The only memory that gets accumulated is 'buf' that can be freed easily at the end of the command.
> 
> 
> It doesn't accumulate, it's allocated once and then freed automatically at exit.
> I didn't have a call to free() because the OS did it for us so it was a waste of
> bytes. (Other such calls I put under if (CFG_TOYBOX_FREE) which is more or less
> a debug thing.)
> 
> > I did some memory leak checking on count (calling the main function 10,000 times and watching
> > in htop if any memory in being accumulated). Then some more precise checking with valgrind.
> > count now accumulates as much memory as any other (memory-leakless) command in toybox
> > (80 bytes lost, probably infrastructure stuff but I dunno). And there doesn't seem to be a reason not
> > to MAYFORK it.
> 
> 
> The advantage of MAYFORK is that it can run within the shell's context. It's a
> performance thing in the case of "true" or "echo" that may be run in tight
> loops, but for other commands the advantage is the ability to access the shell's
> context, so kill can be aware of job control (and thus handle "kill %1") or help
> can behave differently when you run "toybox help" vs "toybox sh -c help". This
> isn't making use of shell context, and the command only exists in a pipeline and
> is thus already a child process...
> 
> I'm not seeing the benefit? (Now there might be an argument for making sure the
> shell can recursively call builtin commands from subshells and then exit the PID
> afterwards, without an extra fork or call to exec. But I think it's already
> doing that in the non-nommu codepath just by virtue of calling xexec()? Modulo
> Android turned that off with CONFIG_TOYBOX_NORECURSE for some reason...)

I was going under the assumption that commands should be MAYFORK until proven nonviable to do so 
(grep leaking allocation context, cd only being useful as a shell builtin, etc). 
Instead of only MAYFORK-ing if there is a clear reason to do so.
 
> As for the -h option: once you've copied enough data, the display will stop
> obviously updating. Half the point of count for me was to confirm the transfer
> hadn't stalled. (And you don't need the typecast, C will do that for you here.)
> 
> I could see adding commas to the output so you can more easily distinguish units
> when the number gets big, but human_readable() doesn't do that anymore because
> of the saga in these three commits: ec5e6e12ee6d, 54965cadfe8f, 5b7cc6d6c2a5
> (basically internationalization was sad).
> 
> That said, a -l long output version that has the raw number, and the human
> readable, and time, and recent/total rate, might be useful? (I'd prefer -l
> instead of -v, but same general concept.)

I added a -l[ong] option in the below patch, It only shows the average rate, not recent

While trying to do math with the time, 
I noticed that 'then' was never set to anything after initialization to 0, 
and 'now' is only used while setting it to millitime(). 
Which means the check for "Have we already done this in the last 250ms?" might as well be 
"if(millitime()<250) continue;". I fixed this by setting 'then' at the end of the for loop

> Rob

- Oliver Webb <aquahobbyist at proton.me>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-count.c-Fixed-optimization-bug-added-l.patch
Type: text/x-patch
Size: 1940 bytes
Desc: not available
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20231017/1462a82b/attachment.bin>


More information about the Toybox mailing list