[Toybox] [PATCH] grep: add --line-buffered and fix regular buffering.

enh enh at google.com
Mon Feb 25 09:29:19 PST 2019


On Mon, Feb 25, 2019 at 8:46 AM Rob Landley <rob at landley.net> wrote:

> On 2/24/19 10:56 PM, enh wrote:
> >
> >
> > On Sun, Feb 24, 2019, 15:32 Rob Landley <rob at landley.net
> > <mailto:rob at landley.net>> wrote:
> >
> >     On 2/24/19 4:27 PM, enh via Toybox wrote:
> >     > This is slightly more involved than just calling setlinebuf(3)
> because
> >     > the code was already implicitly flushing more than once per line
> thanks
> >     > to all the calls to the 'x' wrappers around stdio.
> >
> >     Most of them can go, I tend to err on the side of error checking
> (which requires
> >     a flush to see if it worked) and then pull back later. (The xexit()
> code checks
> >     at the end, but you want to catch things like "disk full" or a
> broken pipe
> >     earlier rather than sitting there spinning...)
> >
> >
> > I think all of them should go. That would match GNU and BSD. Remember
> that
> > you'll get your disk full or broken pipe every time you fill the stdio
> buffer
> > *or* automatically after each line if you're outputting to a terminal
> anyway.
> > (And in the uncommon case you really do care, that's exactly what
> > --line-buffered is for.)
> >
> > Optimizing for the exceptional case where there are large gaps in the
> output
> > *and* you don't want all the output anyway, at the expense of
> performance in
> > successful cases seems wrong.
>
> Hmmm... You have a point.
>
> > To me this is an even worse idea than the current x* behavior. At least
> I've
> > learned that all the x* output functions cause an unnecessary flush
> because
> > we've fixed this over-zealous flushing in several places already. (And
> those are
> > just the ones we've caught.)
>
> I've been moving xprintf() to printf() in several places myself.
>
> I don't want a toybox command to produce endless output without checking
> for
> error ("sed blah bigfile.img | head" shouldn't run a long time, for
> example).
>

true, but sed is one of those cases where there is regular _explicit_
flushing and GNU and BSD seem to agree.

what i want to avoid is that if we do 's/a/A/g' on `abababababababab\n` we
flush nine times rather than one :-)

since i'm trying to push us down this "flushes should be explicit" path,
i'm happy to be the one on the spot for fixing any missing places :-)


> And debian keeps breaking "tee" so:
>
>   (while true; do echo -n .; sleep 1; done) | tee /dev/null
>
> Won't produce output for 4k and then suddenly produces multiple screens
> full.
> (They fix it, they break it, they fix it again...)
>
> > I think part of the reason we've had trouble with this is because it's
> implicit.
> > Adding *another* piece of magic seems like step in the wrong direction,
> > especially for something (early flushing) that seems like it's usually a
> mistake
> > anyway.
>
> I think what I usually want to do is check for errors but not flush. (There
> isn't a standard write() buffer size, an officially policy for _when_ it
> gets
> flushed out, and nothing like a timeout, which is why I was flushing it in
> the
> first place, but that turned out to be worse...)
>
> > (This specific suggestion seems doubly dangerous because it *breaks*
> xflush ---
> > the one *explicit* way to cause a flush. I'm all in favor of xflush.
> It's how we
> > fixed things like top to reduce flicker, for example.)
>
> To be honest I was calling xflush() from so many places because it was an
> easy
> way to get the if (ferror) perror_exit("write"); Possibly what I want is an
> xferror() that xflush() can call...
>

where would you want to call xferror that xflush wouldn't be appropriate?
(fread is the main place one needs ferror normally, and toybox has
remarkably few calls to fread, neither of which seem to actually need to
care.)


> > Personally I'm not 100% convinced that putsn and putsnl pull their
> weight at
> > all...
>
> Neither am I, but it's annoying there _isn't_ one in the standard library.
> (And
> puts() adding a newline but fputs() _not_ doing so is just irritating.)
>

that and the fact that they couldn't decide whether the FILE* came first
(in most cases) or last (in a couple of annoying exceptions) annoy me far
more than they should. but they mean that even 30 years after meeting them,
i still need to check the man page to use them. (or, more usually, just
type whatever and let the compiler correct me.)


> That said, you can fwrite() a gigabyte piped through cat and it all goes
> out in
> one gulp even with glibc. (Yes, I wrote a test program.) This is why I
> have a
> writeall() for filehandles but not an fwriteall(): because I _think_ libc
> can be
> trusted to retry short writes? (Something's doing it. When I worry about
> data
> integrity I do it myself with filehandles and writeall() anyway, although
> get_line() in lib.c is a pre-posix-2008 relic I need to clean out. It's on
> the
> todo list...)
>
> And trusting fwrite() is only for squishy values of "trust" because posix
> won't
> come out and SAY it retries short writes. In fact posix says that being
> interrupted by a signal makes it return a short write which I believe is
> _not_
> what Linux does with things like SIGSTOP/CONT and SIGCHLD, those the OS
> will
> retry internally, and yes I know this because I broke it:
>
>   http://lkml.iu.edu/hypermail/linux/kernel/0503.3/1756.html
>
> And then the implementation changed (splice went in circa
> https://lwn.net/Articles/178199/ and the pipe plumbing got a complete
> rewrite)
> which had subtle behavior diferences... *shrug* The usual.
>
> (If you're wondering why I get a bit paranoid about this stuff in places,
> it's
> scar tissue and old trauma. I have another todo item to investigate the
> sigaction(SA_RESTART) behavior's impact on this stuff and what the
> defaults can
> be trusted to _be_ these days, but again: todo list runneth over and this
> ain't
> my $DAYJOB...)
>

i've gone a step further and admitted default. do what you like, the flash
is lying to the kernel anyway, and it's all down to trust in the end. do
you really believe that the blocks are at least in some kind of
non-volatile storage and won't get lost between where they are now and
where they're supposed to end up? i think the best answer is only "mostly".


> > I'd happily use printf("%s", s) for the former given that neither reader
> > nor writer needs to think about what that means, and it composes better
> with any
> > neighboring printf or putchar or whatever.
>
> Except too many times it's _not_ composed...
>
> $ grep '("%s",' toys/*/*.c | wc
>      52     252    3250
>

aye, but that's fixable.

> Likewise "%*.*s", which is widely
> > used and a generally useful thing for the reader to learn. (But this is
> because
> > I think implicit early flushes are a bug, not a feature, so I need to
> persuade
> > you of that first 😀 I wonder if the numbers are amenable to going
> through and
> > trying to work out which if any are genuinely helpful...)
>
> Sigh, we should probably make the helpful ones explicit and remove the
> rest.
>

sgtm.

looking at all the callers, xputs isn't used much at all (21 calls in
toys/) and most of them seem dubious. xprintf is a lot more popular (338),
but -- though it's harder to tell because there are so many -- nothing
particularly convincingly in need of a flush stood out.

i'm assuming this will be like FLAG, where we'll do it as we're touching
things for other reasons?


> I'll add a cleaup pass to the todo heap...
>

(you should probably keep that list checked in, or even stored as issues in
the github bug tracker.)


> Rob
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20190225/643ad37e/attachment-0001.htm>


More information about the Toybox mailing list