[Toybox] [PATCH] grep: add --line-buffered and fix regular buffering.

enh enh at google.com
Thu Apr 25 15:23:43 PDT 2019


On Thu, Apr 25, 2019 at 3:03 PM Rob Landley <rob at landley.net> wrote:
>
> On 4/24/19 5:50 PM, enh wrote:
> > On Wed, Apr 24, 2019 at 3:40 PM Rob Landley <rob at landley.net> wrote:
> >>
> >> On 4/23/19 8:06 PM, enh wrote:
> >>> (sorry. busy.)
> >>
> >> Sigh. I'll add a config symbol you can switch on, with help text complaining
> >> about it.
> >
> > well, it's not necessarily grep i'm worried about. i think if all the
> > other toys are now going to be line-buffered, it doesn't make much
> > sense to single out grep like this.
>
> Neither do I.
>
> That said, after writing up a point by point rebuttal to your message, I
> benchmarked the code (silly to argue about something we haven't tested the speed
> of) and found out that devuan's grep takes 8 seconds to do something mine takes
> 20 to do _before_ the output buffering changed. Which is unacceptable and made
> the argument we were having kinda moot.

(one thing i was going to say is that line buffering has a few weird
side-effects. C11's 7.21.3.3 implies that you have to flush
line-buffered streams any time you read from an unbuffered or
line-buffered stream, for example. which would affect your original
code.)

> So now I'm staring at do_lines() and seeing if I can get it to do large (at
> least 64k) block reads where it doesn't memcpy() the data out into individual
> lines but passes a pointer into the buffer where possible. (And this lets me
> replace get_line(fd) which has been on the todo list forever, although moving
> the all to do_lines() and a callback is probably the best way...)

/me shudders having just fixed a bug in NetBSD's code to do this. but,
yeah, it's probably worth the cost. and we have ASan to keep us
honest.

> I might even be able to get it to mmap() and mremap() the data traversing down a
> file without too intrusive a second path. (Does madvise(MADV_WILLNEED) persist
> across mremaps of the same mapping, or do I need to call it again?)
>
> I _also_ have a todo item WAY down my list that if toysh has a "find | grep"
> pipeline and both are toybox commands it can automatically -print0 -0 on it, but
> I'm not sure the proper semantics for triggering that. (Or saying it _shouldn't_
> do that, which is why it's way down on the list...)
>
> >> Does the attached work/suffice? (I _think_ setlinebuf(stdout) followed by
> >> setvbuf() something else is legal as long as we haven't output anything to the
> >> stream yet? But the man page is unclear. Presumably the admonition is because
> >> pending unflushed data in the buffer would get discarded, but they don't _say_...)
> >
> > (POSIX explicitly says that it's only valid before any output, but
> > bionic [and BSDs] flush output data and discard input data if you call
> > setvbuf [and the others are implemented in terms of setvbuf].)
>
> musl doesn't appear bothered either. I read that as saying we can do it twice
> before any output, but would be amazed if glibc ever tested that, yet care less
> and less about breaking anything with "gnu" on it each year. It comes pre-broken
> by the gnu all over it.
>
> Rob



More information about the Toybox mailing list