[Toybox] [PATCH] grep: add --line-buffered and fix regular buffering.

Fri Apr 26 12:54:38 PDT 2019

On 4/25/19 8:05 PM, enh wrote:
> having written a parallel grep in Java almost 20 years ago, mmap isn't

I wrote a glob implementation for OS/2's installer in 1996. Been there. :)

> an obvious win. it's very dependent on what you're searching (number
> and size of files), and to some extent what you're searching for. iirc

Linus had a rant on this over a decade ago. He came down on the side of doing
page size read/write in a loop, if I recall. (Which you'll notice my code doing
a lot: toybuf and libbbuf are one page each for a reason. I should probably do
magic to make sure they're page aligned, but see "wanted to hit 1.0 before
opening performance optimization can of worms").

It's likely the optimal granularity's changed since, though. Disk sector sizes
went from 512 to 4096, for one thing, this may even have been shortly before
sata disk I/O became the norm. (And before usb3.)

This is also why I want to do the one big mmap to avoid mremap(), and I'm also
not sure MADV_SEQUENTIAL is a win. Need to benchmark it. (Especially with long
lines and multiple regexes on a system doing something else where lines cross a
page boundary. Even just putting the page on the clean list and soft-faulting it
back in is expensive...)

For the smaller ones, yes reading into a buffer is preferable. But what I
_really_ want is "start DMA-ing data into this userspace buffer but don't block
yet" and then later a "block until outstanding reads finish" so I can overlap
processing and I/O. The nonblocking read and direct access APIs both do the
opposite of that. I'd hoped to find a preadv2() flag for this, but no. (And
where would the corresponding wait() interface live? An ioctl?)

I _can_ get that from mmap() with MADV_WILLNEED, assuming the mapping overhead
doesn't dominate. Possibly there's a minimum file length at which it's worth it,
and it should fall back to treating it as a pipe below that?

This is why I didn't want to open this can or worms yet. It HAS NO BOTTOM.

> https://blog.burntsushi.net/ripgrep/ talks about this with much more
> modern data, and they did actually go with a heuristic to guess when
> they should/shouldn't use mmap. (glibc's stdio will use mmap in some
> circumstances. i don't know of any other libc that does.)

My phone battery died (the new laptop doesn't recognize it as a device, thus
doesn't up the USB port's power high enough to charge it; I didn't know that LED
could flash red).

> what NetBSD grep (which is what we're coming from) _does_ spend a lot
> of code on is avoiding regexec(3). they recognize easy cases (like
> "you could have used -F") and handle them more cheaply. in an ideal
> world, that would be in the regex implementation, but never having
> tried i don't actually know how realistic that is. (if i had the time,
> i'd love to compare against RE2.)

Whatever tests netbsd is doing on the string it's not passing to regcomp could
also be done in regcomp. 95% likely there.

The question is why they didn't in _their_ libc. Haven't looked at it.

(For licensing reasons I mostly don't look at other implementations' code when
doing a toybox implementation of something. I try hard to clean-room it from
RFCs and man pages and wikipedia and strace and so on. I spent a decade in the
intellectual property mines and now have a certain "hazmat suit" mentality about
the lot of it. NetBSD isn't the kind of full-on bugnuts crazy the FSF and SFC
evince, but I'm not reintroducing the license stuttering problem to my code either.)

That said if somebody _else_ wanted to look and tell me, that's how "clean room"
works. :)

>>> I might even be able to get it to mmap() and mremap() the data traversing down a
>>> file without too intrusive a second path. (Does madvise(MADV_WILLNEED) persist
>>> across mremaps of the same mapping, or do I need to call it again?)

I note that my short-term todo pile is starting to teeter and what I should
_really_ do is clean up the new man implementation and find the gzip second file
bug and finish deflate and the new mkfs.vfat and FINALLY GET STARTED ON THE
TOYSH REDO...

Rob

P.S. Yes the new tryfile() function I added to man.c thrashes the negative
dentry cache, but if _man_ winds up being performance critical we shouldn't have
one. And I couldn't test the original on my system because A) the supplied test
wants to write into /usr/share/man which ain't happening, B) it doesn't
understand the ".gz" entry and all the devuan man pages are gzipped because it's
the 80/20 of archivers.