[Toybox] [PATCH] count.c: Human readable -h option and MAYFORK

Rob Landley rob at landley.net
Thu Oct 19 11:14:46 PDT 2023


On 10/18/23 12:45, enh wrote:
>> > I've reproduced your test here, albeit 1.2Mb at 260Kb/s, but it's off by the
>> > same ratio.
>>
>> Interesting:
>>
>> $ timeout 1 yes | ./count -l >/dev/null
>> Terminated bytes, 2.9Gb, 560Mb/s, 0m01s
>> $ toybox timeout 1 yes | ./count -l >/dev/null
>> 1238049 bytes, 1.1Mb, 239Kb/s, 0m01s
>>
>> I.E. "timeout 1 yes | ..." does 3 gigabytes of output in the time toybox does
>> 1.1 megabytes, because toybox timeout is recursively calling toybox yes, which
>> commit 2c30d4f7a6a6 added line buffering to, so it's doing one write() call per
>> 4 bytes, and the debian version is filling up a much larger output buffer before
>> flushing it.

I redid yes to use writev() with toybuf as an iovec array of (256) repeated
mappings of the same output per syscall:

old toybox: 1.2M/s
new toybox: 104M/s
debian: 3.6G/s

Sigh. It's still limited by block size: the default "y" output is 2 bytes, *256
is still just 512 bytes/transaction. When I do "yes the days of the digital
watch are numbered" I get 3.5G/s, but the _default_ is slower...

Eh, two orders of magnitude faster than it was, probably good enough? (Sigh, the
linux limit on iovec repetition is 1024, so I _could_ malloc a larger buffer. Or
I could just have the common case string be y\ny\ny\n repeated a few times... Ok
I made it 128 bytes and I'm STOPPING NOW.)
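
For anyone following along, the writev() approach can be sketched roughly like
this. It is not the code that went into toybox: write_repeated() is a
hypothetical helper, and it uses a local iovec array where the real change uses
toybuf. Every iovec entry points at the same string, so one syscall emits many
copies.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper, not toybox's actual code: emit `reps` copies of `line`
// to `fd` with a single writev() call. Returns bytes written, or -1 on error.
ssize_t write_repeated(int fd, const char *line, int reps)
{
  struct iovec iov[256];
  size_t len = strlen(line);
  int i;

  assert(reps > 0 && reps <= 256);

  // Every entry maps the same bytes; nothing is copied in userspace.
  for (i = 0; i < reps; i++) {
    iov[i].iov_base = (void *)line;
    iov[i].iov_len = len;
  }

  return writev(fd, iov, reps);
}
```

With the default "y\n" and 256 entries that is 512 bytes per syscall, which is
the block size limit described above. A real loop would also have to cope with
short writes and errors.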

>> I preferred to let libc figure out what sane thing to do with FILE * buffering,
> 
> you _say_ that, but that's not what you _do_ --- you always tell libc
> either "line buffering" or "no buffering":
> ```
> setvbuf(stdout, 0, (which->flags & TOYFLAG_LINEBUF) ? _IOLBF : _IONBF, 0);
> ```

I just want "less" not to update a screen at a time when a command produces
output slowly.

> if you had
> ```
> if (which->flags & TOYFLAG_LINEBUF) setvbuf(stdout, 0, _IOLBF, 0);
> ```
> we'd argue a lot less about buffering :-)

The existing LINEBUF users are ascii, base64, base32, yes, echo, grep, egrep, fgrep.

Both ascii and echo pretty much want a single big output buffer, they do not
produce progressive output so it might as well all be collated into one big
output buffer and then flushed.

I just rewrote yes not to use stdio at all, it does writev() to fd 1.

In theory base64 or base32 _might_ want progressive output, but it's not a
common case? And debian gets this very wrong:

$ while true; do echo hello; sleep .1; done | base32 | toybox time head -c 1
N
real	523.443
user	0.1534
sys	0.000

That's like 9 minutes before it sees ANYTHING. An "optimization" that very much
pessimizes certain data flows, and which smells to me like a hang waiting to
happen in some script...

And of course grep is the one we put this infrastructure in for in the first
place, and argued about at some length, and where I first went "this wants
nagle", which userspace sucks at because timers are finite there and our
callbacks are signals: expensive and full of side effects.

> i also wonder whether
> ```
> if (isatty() || (which->flags & TOYFLAG_LINEBUF)) setvbuf(stdout, 0, _IOLBF, 0);
> ```
> would make both of us happier?

The failure mode I keep seeing in various places is "thingy | less" where you
wait 30 seconds for anything to show up, then it's a full screen of output at
once. And unfortunately, stdout isn't a tty in that case.

>> and also mentioned that what I REALLY want is a libc version of the nagle
>> algorithm here where it collates writes occurring close enough together but
>> flushes the output when quiescent for long enough that humans would notice
>> "less" output not updating... but alas glibc is insane, musl is simplistic,
>> and I doubt bionic even tried here, so Elliott added micromanagement to yes,
>> which his comment says it was even more wrong before.
> 
> if you were on WG14, you could suggest adding that kind of buffering :-)

Oh I could implement it myself, I think I've talked about it here before? The
problem is you need either a timer callback or a child process ("thread" if you
prefer) which is a solution worse than the problem.
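
For the narrower case of a filter sitting in a read loop (the grep case),
quiescence on the input can stand in for the timer. A minimal sketch, assuming a
hypothetical copy_with_idle_flush() helper and an arbitrary caller-chosen idle
threshold. This is not how toybox or any libc does it, and it does nothing for
programs that produce output without reading input:

```c
#include <poll.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: copy fd `in` to the stdio stream `out`, leaving stdio's
// buffering alone while data keeps arriving, but flushing once the input has
// been idle for `idle_ms` milliseconds. Returns 0 on success, -1 on error.
int copy_with_idle_flush(int in, FILE *out, int idle_ms)
{
  char buf[4096];
  struct pollfd pfd = {.fd = in, .events = POLLIN};
  int dirty = 0;

  for (;;) {
    // Only arm the timeout when there is unflushed output; otherwise block.
    int rc = poll(&pfd, 1, dirty ? idle_ms : -1);
    ssize_t len;

    if (rc < 0) return -1;
    if (!rc) {
      // Input went quiet: push out whatever has been collated so far.
      if (fflush(out)) return -1;
      dirty = 0;
      continue;
    }

    len = read(in, buf, sizeof(buf));
    if (len < 0) return -1;
    if (!len) break;
    if (fwrite(buf, 1, len, out) != (size_t)len) return -1;
    dirty = 1;
  }

  return fflush(out) ? -1 : 0;
}
```

The extra poll() per read is itself syscall overhead, so this trades some
throughput for latency rather than getting both.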

What we _really_ need is a cheaper way to feed data to the kernel so the syscall
overhead of small outputs doesn't crater our throughput by an order of
magnitude. The solution probably involves vdso trickery, where a write() to the
same fd as last time can copy data to a ~64 byte ring buffer (heck you can go up
to 256 bytes and still have single byte head/tail indicators) and use the
syscall if the next chunk of data won't fit in the buffer, and then the kernel
can pick it up and consume it whenever the scheduler timeslice expires (with the
syscall automatically flushing anything already in the buffer ala iovec).
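
A userspace model of just the ring part shows why 256 bytes works with single
byte indicators: uint8_t arithmetic wraps mod 256, so no masking is needed. The
struct and function names are made up for illustration. Nothing like this
exists in the vdso today, and the sketch ignores the memory ordering a real
userspace/kernel split would need. It gives up one byte of capacity (255
usable) so that head == tail always means empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical model of the ring buffer described above.
struct ring {
  uint8_t head, tail;   // head: next slot to fill, tail: next slot to drain
  char data[256];
};

static_assert(sizeof(((struct ring *)0)->data) == 256,
  "uint8_t indices assume exactly 256 slots");

// Bytes currently queued. The cast makes the subtraction wrap mod 256.
unsigned ring_used(const struct ring *r)
{
  return (uint8_t)(r->head - r->tail);
}

// Producer side: queue `len` bytes. Returns 1 on success, or 0 if the data
// won't fit, which is where the caller would fall back to a real syscall.
int ring_put(struct ring *r, const char *buf, size_t len)
{
  size_t i;

  if (len > 255 - ring_used(r)) return 0;
  for (i = 0; i < len; i++) r->data[r->head++] = buf[i];

  return 1;
}

// Consumer side: drain up to `max` bytes into `out`, return how many.
size_t ring_get(struct ring *r, char *out, size_t max)
{
  size_t n = 0;

  while (n < max && r->tail != r->head) out[n++] = r->data[r->tail++];

  return n;
}
```

A ring_put() that returns 0 corresponds to the "next chunk of data won't fit"
case, and ring_get() is the part the kernel would do when it picks the data up.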

> as it is, it's a bit hard for any libc to say "i know what you _told_
> me, but i'm going to do my own unspecified thing that there's no way
> to explicitly ask for".

The problem is the interface between the userspace and the kernel is too
expensive, so userspace has to do buffering that's hard for userspace to get
right. No matter WHAT userspace does here, it's jumping through hoops.

Some variant of a vdso fix seems better. (Is it worth trying to avoid taking a
soft fault the first time you write to that vdso write() ring buffer? I dunno
how expensive that is or if it's easily avoidable? There are also potential
O_NONBLOCK issues, this mostly seems like a pipe-specific optimization? Needs
more thought from a kernel person. Pipes never ENOSPC and can discard output
after you write it because the consumer closed; most other file descriptors
can't do that. That said, "output writes to pipeline" is a common case...)

> also: rob landley asking for libc to spawn a stdio stream flushing thread?! :-P

No thank you. Adding threads to non-threaded programs is Jason Mendoza's "Any
time I had a problem I threw a molotov cocktail, and boom right away, I had a
different problem" from The Good Place.

And I _just_ updated from C99 to C11. Nothing the C committee does today is
likely to be relevant to me in a predictable timeframe.

Rob

