[Toybox] Slow grep
    Rob Landley 
    rob at landley.net
       
    Mon Oct  3 03:11:58 PDT 2022
    
    
  
On 10/3/22 03:43, Yi-yo Chiang wrote:
> On Thu, Sep 29, 2022 at 1:20 PM Rob Landley <rob at landley.net
> <mailto:rob at landley.net>> wrote:
> 
>     All '^' and '$' do is say the zero length match occurs just once at the start or
>     end of every line, but the -o logic discards all zero length matches and the
>     non-o logic just cares that there IS a match. UNLESS you're producing colored
>     output, but the colored output mostly piggybacks on -o. There is a sort of
>     terrible corner case though, which oddly enough makes this corner case visible!
> 
>     $ echo potato | grep --color=always '' | hd
>     00000000  70 6f 74 61 74 6f 0a                              |potato.|
>     00000007
>     $ echo potato | toybox grep --color=always '' | hd
>     00000000  1b 5b 6d 1b 5b 31 3b 33  31 6d 1b 5b 6d 70 6f 74  |.[m.[1;31m.[mpot|
>     00000010  61 74 6f 0a                                       |ato.|
>     00000014
>     $ echo potato | toybox grep --color=always '$' | hd
>     00000000  1b 5b 6d 70 6f 74 61 74  6f 1b 5b 31 3b 33 31 6d  |.[mpotato.[1;31m|
>     00000010  1b 5b 6d 0a                                       |.[m.|
>     00000014
> 
>     Yeah, I have a todo item to try to optimize the color escape generation for
>     toybox but last time I sat down and looked at it I just didn't have the spoons.
> 
> I have some optimizations related to color output, not sure if these are what
> you had in mind?
I was just talking about making it not emit unnecessary color escapes that do
nothing but cancel each other out.
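
A minimal sketch of that idea, not toybox's actual code: only wrap a match in
color escapes when the match is non-empty, so zero length matches such as ''
or '$' print the line untouched. The helper name highlight() and its
parameters are hypothetical; the escape sequences are the ones visible in the
hex dumps above.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not from toybox): copy `line` into `out`, coloring
// the match [start, end) only when it is non-empty.
// Returns whatever snprintf() reports, i.e. the full output length.
int highlight(char *out, size_t outlen, const char *line,
              size_t start, size_t end)
{
  size_t len = strlen(line);

  // Clamp the match to the line so the pointer arithmetic below is safe.
  if (start > len) start = len;
  if (end > len) end = len;

  // Zero length (or inverted) match: nothing to color, emit no escapes.
  if (end <= start) return snprintf(out, outlen, "%s", line);

  // prefix, ESC[1;31m, match, ESC[m, suffix
  return snprintf(out, outlen, "%.*s\033[1;31m%.*s\033[m%s",
                  (int)start, line, (int)(end - start), line + start,
                  line + end);
}
```

With this, highlight(buf, sizeof(buf), "potato", 6, 6) (the '$' case) leaves
the line free of escapes.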
> * If not showing color (no --color) or not showing matched part (-v) and (-o,
> -w, -x) are not given, then call regcomp() with REG_NOSUB. (could potentially
> improve the compiled regex)
> * If regex pattern was compiled with REG_NOSUB (we are not printing/highlighting
> matched part), then we can stop pattern matching as soon as we found first hit
> (no need to loop through all patterns to find longest match nor all matches)
> (break this loop early:
> https://github.com/landley/toybox/blob/b17dc8e111dd408db04ea7ae70c410fd3054a751/toys/posix/grep.c#L218)
*shrug* Sure, sounds like a good idea.
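
For reference, a rough sketch of what the two bullet points describe, using
only the POSIX regcomp()/regexec() calls. The function any_match() is
hypothetical and is not how grep.c is structured; it compiles each pattern
per call purely to keep the example short, where a real grep would compile
once up front.

```c
#include <regex.h>
#include <stddef.h>

// Hypothetical helper (not from toybox): returns 1 if any pattern matches
// `line`, 0 if none do, -1 if a pattern fails to compile.
int any_match(const char **patterns, size_t count, const char *line)
{
  size_t i;
  int found = 0;

  // Stop at the first hit: with no highlighting there is no need to look
  // for the longest match or for further matches.
  for (i = 0; i < count && !found; i++) {
    regex_t re;

    // REG_NOSUB: only a yes/no answer is needed, not match offsets.
    if (regcomp(&re, patterns[i], REG_NOSUB)) return -1;

    // nmatch 0 and pmatch NULL, since no offsets are requested.
    if (!regexec(&re, line, 0, NULL, 0)) found = 1;
    regfree(&re);
  }

  return found;
}
```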
> I can send you these patches after I rebase my local tree on top of the recent
> bucket sort optimizations.
>  
>     > that a simplified case based on the real-world build break caused by
>     ...
>     > merging zero jar files into one instead of all the jar files :-)
> 
>     Acknowledged. Try commit 193009855266?
> 
> 
> Still had problems when matching Fixed patterns:
> 
> $ echo '\.zip' | ./old-toybox grep -F '\.zip'
> \.zip
> $ echo '\.zip' | ./toybox grep -F '\.zip'
> $
>
> I think what's happening is the fixed string pattern '\.zip' gets put into the
> '.' bucket (should be '\' bucket instead?), thus '\.zip' never matches anything. 
Sigh, you're right. Try commit b26689f95065
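
A simplified sketch of the distinction described above, not the code in
either commit: with -F a leading backslash is an ordinary byte and is itself
the bucket key, while in a regex it escapes the next character. The name
bucket_for() is hypothetical, and the regex branch ignores escapes that are
not literals (and patterns that start with a metacharacter).

```c
// Hypothetical helper (not from toybox): pick the first-byte bucket for a
// pattern. `fixed` is nonzero for -F (fixed string) patterns.
int bucket_for(const char *pattern, int fixed)
{
  unsigned char c = (unsigned char)pattern[0];

  // Regex only: "\." means a literal '.', so bucket on the escaped byte.
  // Fixed strings skip this, so "\.zip" stays in the '\' bucket.
  if (!fixed && c == '\\' && pattern[1]) c = (unsigned char)pattern[1];

  return c;
}
```

So bucket_for("\\.zip", 1) gives the backslash bucket and
bucket_for("\\.zip", 0) gives the '.' bucket.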
Rob
    
    