[Toybox] Slow grep

enh enh at google.com
Mon Sep 26 13:44:47 PDT 2022


On Sat, Sep 24, 2022 at 2:31 PM Rob Landley <rob at landley.net> wrote:

> On 9/15/22 20:42, Rob Landley wrote:
> > On 9/15/22 16:32, enh wrote:
> >> ah, but on another box with 2021-10-22 it's broken. so it looks like
> >> "real" xxd had the same bug and fixed it recently?
> >
> > Easy enough to fix, I was just confused by "the one in debian fails the
> > same way"...
> >
> >>     > A simple optimization would be to concat all patterns into one
> >>     > huge regex pattern with '|'.
> >>
> >>     Which you could do in your pattern creation just as easily as grep
> >>     could do so internally? Except when I tried it:
> >>
> >>     $ time toybox grep -Ef <(tr '\n' '|' < test_large_10000 | head -c -1) \
> >>       test_large_10000
> >>
> >>     The result was even _slower_ on glibc (and gave me "out of memory"
> >>     on musl).
> >>
> >> fwiw, BSD grep which we used to use before had
> >> https://android.googlesource.com/platform/system/core/+/kitkat-release/toolbox/grep/fastgrep.c#32
> >> which (although i haven't tested) i always assumed was for this.
> >
> > Yeah, doing more ourselves is the obvious way to go...
>
> I punted on most of the utf8 stuff in there. Let me know if I should revisit
> that. (Right now I'm treating ANY string that has a char > 127 in it as
> "hand off to the regex engine" instead of using the fast path -F logic. In
> theory I only need to do that for -i, but see recent email about reordering
> zero-width unicode code points... if the regex engine does it, it's not my
> fault when it's wrong. :)
>

both users i've seen have wanted this for "filenames in a build", so they
should be fine. (and, yes, "i18n is someone else's problem" is usually the
right choice down at this level.)
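
(for the record, my guess at the "does this pattern need the regex engine?"
test is just a scan for any high byte --- the function below is invented for
illustration, it's not the actual toybox code:)

  #include <stdio.h>

  // invented sketch: any byte >= 0x80 means "not plain ascii", so the
  // pattern goes to the regex engine instead of the -F fast path.
  static int needs_regex(const char *pattern)
  {
    while (*pattern) if (*(const unsigned char *)pattern++ & 0x80) return 1;

    return 0;
  }

  int main(void)
  {
    // "héllo" contains a two-byte utf8 sequence, so it reports 1
    printf("%d %d\n", needs_regex("hello"), needs_regex("héllo"));

    return 0;
  }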


> >>     I could also sort the patterns by first character and loop through
> >>     calling the appropriate regexes based on character at string
> >>     position... which is harder than you think because there could be
> >>     a | later in the pattern so that optimization only applies to a
> >>     _subset_ of patterns. (First char being . or * or ( kinda throws
> >>     that off too, and that optimization would also have to be aware of
> >>     case insensitivity...)
> >
> > I was trying to avoid writing another regex implementation, but a fast
> > subset one that does parallel cache-local searching for simple patterns
> > and hands off to libc for anything complicated isn't that hard. And I
> > can even cheat with a bitmask for starting byte:
>
> What I did was simpler: filter the incoming TT.e list (after it's been
> split at \n and TT.f entries collated into it, which the code was already
> doing) into fixed and regex categories (ala autodetect when I can apply -F
> logic, with basic support for non-unicode case insensitivity, backslash
> escapes, and "." not in the first position), and then bucket sort the
> resulting -F list into an array of 256 linked lists by starting character
> (punting for now that starting with "." means it autodetects as regex not
> fixed: the actual -F flag overrides this), and then use TT.fixed[c] being
> NULL to mean nothing can match here. Further optimization: sorting each
> per-character list by length in the setup phase means the actual search
> logic can stop at the first hit because it's the longest one.
>
> No bitmask needed: https://github.com/landley/toybox/commit/a7e49c3c7860
>
> (Then it falls back to running the same old regex logic for any non-fixed
> patterns, where regexec searches the whole string from start to finish for
> each pattern. If there aren't any, the loop naturally skips itself
> traversing zero entries; if there are a small number, the performance hit
> shouldn't be too bad.)
>
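
(to make that concrete: here's roughly the shape of that fast path --- an
array of 256 linked lists indexed by first byte, each kept longest-first.
this sketch is invented for illustration, names and all, not lifted from
the actual grep.c; see the commit above for the real thing.)

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  struct fixed_pat {
    struct fixed_pat *next;
    char *str;
    size_t len;
  };

  // one list per possible starting byte; a NULL entry means no fixed
  // pattern can match at a position starting with that byte.
  static struct fixed_pat *bucket[256];

  // insert keeping each list sorted longest-first, so a search can stop
  // at its first hit: that hit is guaranteed to be the longest match.
  static void add_fixed(char *pattern)
  {
    struct fixed_pat *new = malloc(sizeof(*new)), **list;

    new->str = pattern;
    new->len = strlen(pattern);
    list = bucket+(unsigned char)*pattern;
    while (*list && (*list)->len > new->len) list = &(*list)->next;
    new->next = *list;
    *list = new;
  }

  // longest fixed pattern matching at position s, or NULL for none.
  static struct fixed_pat *match_at(char *s)
  {
    struct fixed_pat *fp;

    for (fp = bucket[(unsigned char)*s]; fp; fp = fp->next)
      if (!strncmp(s, fp->str, fp->len)) return fp;

    return NULL;
  }

  int main(void)
  {
    char line[] = "some text with a needle in it", *s;

    add_fixed("need");
    add_fixed("needle"); // longer, so it sorts ahead of "need" in 'n' bucket

    for (s = line; *s; s++) {
      struct fixed_pat *fp = match_at(s);

      if (fp) printf("matched \"%s\" at offset %d\n", fp->str, (int)(s-line));
    }

    return 0;
  }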
> It's still slower than the gnu/dammit grep (0.9 seconds vs 0.3 seconds on
> my laptop), but hopefully good enough.
>

heh, yeah, that sounds like plenty :-) i don't know of anyone who's
actually timing this --- the complaint was just about the pathological
behavior!


> Try now?
>

(update in progress. given that yochiang and i are in horribly incompatible
timezones, we might be a while getting back to you...)


> Rob
>
> P.S. I added a speed test to grep.test (5 second timeout, should still
> pass on the original raspberry pi model A from 2012, not that I have one),
> using base64 on a long seq to get the many unique test lines instead of
> xxd /dev/urandom, because urandom is slow producing nontrivial amounts of
> data. Trying to feed seq into sha3sum or similar was equally slow: running
> the test should not be an order of magnitude faster than generating the
> test data.
>

certainly fine on my rpi400...

 PASS: grep speed

specifically it takes:

real 0m0.609s
user 0m0.581s
sys 0m0.177s

that seems like a good enough margin of error for slower rpis :-)