[Toybox] Slow grep
Rob Landley
rob at landley.net
Mon Oct 3 03:11:58 PDT 2022
On 10/3/22 03:43, Yi-yo Chiang wrote:
> On Thu, Sep 29, 2022 at 1:20 PM Rob Landley <rob at landley.net
> <mailto:rob at landley.net>> wrote:
>
> All '^' and '$' do is say the zero length match occurs just once at the start or
> end of every line, but the -o logic discards all zero length matches and the
> non-o logic just cares that there IS a match. UNLESS you're producing colored
> output, but the colored output mostly piggybacks on -o. There is a sort of
> terrible corner case though, which oddly enough makes this corner case visible!
>
> $ echo potato | grep --color=always '' | hd
> 00000000 70 6f 74 61 74 6f 0a |potato.|
> 00000007
> $ echo potato | toybox grep --color=always '' | hd
> 00000000 1b 5b 6d 1b 5b 31 3b 33 31 6d 1b 5b 6d 70 6f 74 |.[m.[1;31m.[mpot|
> 00000010 61 74 6f 0a |ato.|
> 00000014
> $ echo potato | toybox grep --color=always '$' | hd
> 00000000 1b 5b 6d 70 6f 74 61 74 6f 1b 5b 31 3b 33 31 6d |.[mpotato.[1;31m|
> 00000010 1b 5b 6d 0a |.[m.|
> 00000014
>
> Yeah, I have a todo item to try to optimize the color escape generation for
> toybox but last time I sat down and looked at it I just didn't have the spoons.
>
> I have some optimizations related to color output, not sure if these are what
> you had in mind?
NI was just talking about making it not emit unnecessary color escapes that do
nothing but cancel each other out.
> * If not showing color (no --color) or not showing matched part (-v) and (-o,
> -w, -x) are not given, then call regcomp() with REG_NOSUB. (could potentially
> improve the compiled regex)
> * If regex pattern was compiled with REG_NOSUB (we are not printing/highlighting
> matched part), then we can stop pattern matching as soon as we found first hit
> (no need to loop through all patterns to find longest match nor all matches)
> (break this loop early
> https://github.com/landley/toybox/blob/b17dc8e111dd408db04ea7ae70c410fd3054a751/toys/posix/grep.c#L218
*shrug* Sure, sounds like a good idea.
> I can send you these patches after I rebase my local tree on top of the recent
> bucket sort optimizations.
>
> > that a simplified case based on the real-world build break caused by
> ...
> > merging zero jar files into one instead of all the jar files :-)
>
> Acknowledged. Try commit 193009855266?
>
>
> Still had problems when matching Fixed patterns:
>
> $ echo '\.zip' | ./old-toybox grep -F '\.zip'
> \.zip
> $ echo '\.zip' | ./toybox grep -F '\.zip'
> $
>
> I think what's happening is the fixed string pattern '\.zip' gets put into the
> '.' bucket (should be '\' bucket instead?), thus '\.zip' never matches anything.
Sigh, you're right. Try commit b26689f95065
Rob
More information about the Toybox
mailing list