[Toybox] [PATCH] grep: add --line-buffered and fix regular buffering.

Rob Landley rob at landley.net
Thu Apr 25 15:03:22 PDT 2019


On 4/24/19 5:50 PM, enh wrote:
> On Wed, Apr 24, 2019 at 3:40 PM Rob Landley <rob at landley.net> wrote:
>>
>> On 4/23/19 8:06 PM, enh wrote:
>>> (sorry. busy.)
>>
>> Sigh. I'll add a config symbol you can switch on, with help text complaining
>> about it.
> 
> well, it's not necessarily grep i'm worried about. i think if all the
> other toys are now going to be line-buffered, it doesn't make much
> sense to single out grep like this.

Neither do I.

That said, after writing up a point by point rebuttal to your message, I
benchmarked the code (silly to argue about something we haven't tested the speed
of) and found out that devuan's grep takes 8 seconds to do something mine takes
20 to do _before_ the output buffering changed. Which is unacceptable and made
the argument we were having kinda moot.

So now I'm staring at do_lines() and seeing if I can get it to do large (at
least 64k) block reads where it doesn't memcpy() the data out into individual
lines but passes a pointer into the buffer where possible. (And this lets me
replace get_line(fd) which has been on the todo list forever, although moving
the all to do_lines() and a callback is probably the best way...)

I might even be able to get it to mmap() and mremap() the data traversing down a
file without too intrusive a second path. (Does madvise(MADV_WILLNEED) persist
across mremaps of the same mapping, or do I need to call it again?)

I _also_ have a todo item WAY down my list that if toysh has a "find | grep"
pipeline and both are toybox commands it can automatically -print0 -0 on it, but
I'm not sure the proper semantics for triggering that. (Or saying it _shouldn't_
do that, which is why it's way down on the list...)

>> Does the attached work/suffice? (I _think_ setlinebuf(stdout) followed by
>> setvbuf() something else is legal as long as we haven't output anything to the
>> stream yet? But the man page is unclear. Presumably the admonition is because
>> pending unflushed data in the buffer would get discarded, but they don't _say_...)
> 
> (POSIX explicitly says that it's only valid before any output, but
> bionic [and BSDs] flush output data and discard input data if you call
> setvbuf [and the others are implemented in terms of setvbuf].)

musl doesn't appear bothered either. I read that as saying we can do it twice
before any output, but would be amazed if glibc ever tested that, yet care less
and less about breaking anything with "gnu" on it each year. It comes pre-broken
by the gnu all over it.

Rob



More information about the Toybox mailing list