[Toybox] Slow grep

Rob Landley rob at landley.net
Wed Sep 28 22:29:28 PDT 2022


On 9/28/22 14:54, enh wrote:
> heh... funny you should mention "." in the first position...
> 
> $ echo "foo.jar" | old-toybox grep '\.jar'
> foo.jar
> $ echo "foo.jar" | new-toybox grep '\.jar'
>
Technically that's '\' in the first position, which is the problem.

(That's both parsing wrong, and I had an optimization where it skips the first
character checking fast pattern matches because obviously we already matched
that to find the right bucket with the patterns for this character... EXCEPT if
that first character is escaped then skipping one character of each side of the
comparison will put the pattern traversal out of sync.)

Meanwhile, a pattern '^$' should match ONLY empty lines, and the easiest way to
handle that is just special case that one to the regex path. The fast path can
still handle '^' and '$' which match every line: zero length pattern does a zero
length match at each position, which -o has plumbing to filter out because *
does that too since it's zero-or-more instances so a* is going to match for
length zero at every place that does NOT have an 'a', ala:

  $ echo '' | grep 'z*' | wc -l
  1

All '^' and '$' do is say the zero length match occurs just once at the start or
end of every line, but the -o logic discards all zero length matches and the
non-o logic just cares that there IS a match. UNLESS you're producing colored
output, but the colored output mostly piggybacks on -o. There is a sort of
terrible corner case though, which oddly enough makes this corner case visible!

$ echo potato | grep --color=always '' | hd
00000000  70 6f 74 61 74 6f 0a                              |potato.|
00000007
$ echo potato | toybox grep --color=always '' | hd
00000000  1b 5b 6d 1b 5b 31 3b 33  31 6d 1b 5b 6d 70 6f 74  |.[m.[1;31m.[mpot|
00000010  61 74 6f 0a                                       |ato.|
00000014
$ echo potato | toybox grep --color=always '$' | hd
00000000  1b 5b 6d 70 6f 74 61 74  6f 1b 5b 31 3b 33 31 6d  |.[mpotato.[1;31m|
00000010  1b 5b 6d 0a                                       |.[m.|
00000014

Yeah, I have a todo item to try to optimize the color escape generation for
toybox but last time I sat down and looked at it I just didn't have the spoons.

> that a simplified case based on the real-world build break caused by
...
> merging zero jar files into one instead of all the jar files :-)

Acknowledged. Try commit 193009855266?

Also, I checked in the start of the sed/tar xform protocol stuff. It's not
handling flags yet, but it's passing the data so it _can_ do so. (I didn't want
to send you a tar/sed pair that had to be upgraded in tandem and then changed
AGAIN later. Hopefully this is the usable protocol version...)

Rob


More information about the Toybox mailing list