[Toybox] [PATCH] grep: add --line-buffered and fix regular buffering.

enh enh at google.com
Sun Feb 24 20:56:00 PST 2019


On Sun, Feb 24, 2019, 15:32 Rob Landley <rob at landley.net> wrote:

> On 2/24/19 4:27 PM, enh via Toybox wrote:
> > This is slightly more involved than just calling setlinebuf(3) because
> > the code was already implicitly flushing more than once per line thanks
> > to all the calls to the 'x' wrappers around stdio.
>
> Most of them can go, I tend to err on the side of error checking (which
> requires
> a flush to see if it worked) and then pull back later. (The xexit() code
> checks
> at the end, but you want to catch things like "disk full" or a broken pipe
> earlier rather than sitting there spinning...)
>

I think all of them should go. That would match GNU and BSD. Remember that
you'll get your disk full or broken pipe every time you fill the stdio
buffer *or* automatically after each line if you're outputting to a
terminal anyway. (And in the uncommon case you really do care, that's
exactly what --line-buffered is for.)

Optimizing for the exceptional case where there are large gaps in the
output *and* you don't want all the output anyway, at the expense of
performance in successful cases seems wrong.

> I've un-'x'ed putsl and putsn, given that the only caller doesn't want
> > to flush that often, but I left them in lib/ for now.
> >
> > As usual I haven't added a bogus short option to correspond to
> > --line-buffered because I fear that one day (when it's many years
> > of existing practice too late to fix either side) we'll end up with
> > a collision.
>
> Many moons ago I taught busybox mount to autodetect when it needed to do
> loopback mounts so you didn't need to say -o loop, and util-linux
> eventually
> caught up and did that too. They do sometimes accept sufficiently
> widespread
> external practice. :)
>

(Yeah, but having to deal with building on Linux and macOS, it's the cases
where two different implementations choose the same letter for two
different things that causes me the most pain. On a fairly regular basis,
which is why I'd like to switch the Mac over at some point. But Linux
first!)

But yeah, syntax for --longopt without short opt exists for a reason...
>
> > Tested by looking at `strace -e write` output when running
> > `grep fpu_ /proc/cpuinfo > /dev/null` with and without --line-buffered
> > (the redirection is necessary because stdout is line buffered by default
> > if it's a terminal).
> > ---
> >  lib/lib.c         | 17 +++++++++++++++++
> >  lib/lib.h         |  4 ++--
> >  lib/xwrap.c       | 19 -------------------
> >  toys/posix/grep.c | 31 ++++++++++++++++---------------
> >  4 files changed, 35 insertions(+), 36 deletions(-)
>
> The "x" prefix means "can error_exit()" (patient zero of which was
> xmalloc()),
> the flush was just necessary to do said checking reliably. Your new ones
> can
> still error_exit(), so technically they'd still be x without the flush.
>
> Of course this raises the "some flush and some don't spectre..." Hmmm.
> How about:
>
> diff --git a/lib/xwrap.c b/lib/xwrap.c
> index 12016a2..ea3e9f1 100644
> --- a/lib/xwrap.c
> +++ b/lib/xwrap.c
> @@ -143,7 +143,7 @@ char *xmprintf(char *format, ...)
>
>  void xflush(void)
>  {
> -  if (fflush(0) || ferror(stdout)) perror_exit("write");
> +  if ((!toys.noflush && fflush(0)) || ferror(stdout))
> perror_exit("write");
>  }
>
>  void xprintf(char *format, ...)
> diff --git a/toys.h b/toys.h
> index 9e49900..40eca86 100644
> --- a/toys.h
> +++ b/toys.h
> @@ -106,6 +106,7 @@ extern struct toy_context {
>    int signalfd;            // and writes signal to this fd, if set
>    char exitval;            // Value error_exit feeds to exit()
>    char wasroot;            // dropped setuid
> +  char noflush;            // tell xflush() not to (for performance).
>
>    // This is at the end so toy_init() doesn't zero it.
>    sigjmp_buf *rebound;     // siglongjmp here instead of exit when
> do_rebound
>
> If that looks like a reasonable approach, I can try to chop this patch
> down to
> what it actually does if you don't want to resubmit. (Note: xexit()
> already does
> the flush itself because error_msg() not error_exit()...)
>

To me this is an even worse idea than the current x* behavior. At least
I've learned that all the x* output functions cause an unnecessary flush
because we've fixed this over-zealous flushing in several places already.
(And those are just the ones we've caught.)

I think part of the reason we've had trouble with this is because it's
implicit. Adding *another* piece of magic seems like step in the wrong
direction, especially for something (early flushing) that seems like it's
usually a mistake anyway.

(This specific suggestion seems doubly dangerous because it *breaks* xflush
--- the one *explicit* way to cause a flush. I'm all in favor of xflush.
It's how we fixed things like top to reduce flicker, for example.)

Personally I'm not 100% convinced that putsn and putsnl pull their weight
at all... I'd happily use printf("%s", s) for the former given that neither
reader nor writer needs to think about what that means, and it composes
better with any neighboring printf or putchar or whatever. Likewise
"%*.*s", which is widely used and a generally useful thing for the reader
to learn. (But this is because I think implicit early flushes are a bug,
not a feature, so I need to persuade you of that first 😀 I wonder if the
numbers are amenable to going through and trying to work out which if any
are genuinely helpful...)

Rob
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20190224/14e437b2/attachment-0001.htm>


More information about the Toybox mailing list