[Toybox] Slow grep

Rob Landley rob at landley.net
Thu Sep 15 22:28:33 PDT 2022


On 9/15/22 16:32, enh wrote:
> On Thu, Sep 15, 2022 at 1:45 PM Rob Landley <rob at landley.net> wrote:
> 
>     On 9/15/22 07:30, Yi-yo Chiang via Toybox wrote:
>     > grep is slow when the number of patterns is large.
>     ...
>     >   xxd -p -c 0 -l 40 /dev/urandom
> 
>     Huh, WHY does that produce two lines of output?
> 
>     $ xxd -p -c 0 -l 40 /dev/urandom
>     1fcf13e1b4844ba209fb9958bde26a13577c577744f1b1290240d03f4f8e
>     644fd0687c39b1aa8a68
> 
>     Behavior is consistent between toybox xxd and debian's, but Elliott sent in the
>     xxd implementation and I don't use it much, so... Feeding in -c 40 and -c 80
>     make no difference?
> 
> i think that's actually a bug caused by this:
> 
>   // Plain style is 30 bytes/line, no grouping.
> 
>   if (FLAG(p)) TT.c = TT.g = 30;
> 
> should presumably be 
>  
>   // Plain style is 30 bytes/line, no grouping.
>   if (FLAG(p)) {
>     if (!TT.c) TT.c = 30;
>     if (!TT.g) TT.g = 30;
>   }
>
> ?

Except we didn't set -g so it would still be set to 30, which is going to stick
spaces into the output.

And xxd_main() starts with if (!TT.c) TT.c = blah, so it would never be zero at
that point unless we reorder the code. Once THAT'S fixed, -c 0 is still !TT.c.
If I switch that to if (FLAG(c)) to allow c = 0 through (which the range in the
optstr allows), c is used to cap the length in readall(), meaning the first
read becomes EOF so no output gets produced.
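
A minimal sketch of the ordering problem, not the actual toybox source: the
struct, the function name, and the 16 standing in for "blah" are all
assumptions made for illustration. It applies the suggested -p patch after the
existing -c default, the order described above.

```c
#include <stdbool.h>

// Hypothetical stand-in for the TT globals and FLAG(p); not toybox's layout.
struct xxd_opts {
  long c, g;   // 0 means "not set on the command line"
  bool p;      // -p given
};

// Existing default first, then the suggested -p patch.
static void resolve_patched(struct xxd_opts *o)
{
  if (!o->c) o->c = 16;      // placeholder for "TT.c = blah"

  if (o->p) {
    if (!o->c) o->c = 30;    // never taken: c is already nonzero here
    if (!o->g) o->g = 30;    // taken whenever -g was not given
  }
}
```

With -p alone this leaves c at the placeholder default rather than 30, and
with -p -c 40 it leaves g at 30, so a wide line still gets a space after 30
bytes.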

> certainly "real" xxd works for me on macos and debian, both of which have the
> same version of xxd:
> 
> ~$ xxd --version
> xxd 2022-01-14 by Juergen Weigert et al.
> ~$ xxd -p -c 0 -l 40 /dev/urandom
> ac160632955aa9d938e60d3533cbcf0febb4decdd12f130e415913ff1fe6e2abcaf7c4a8e980de7a

See, this is extra weird: nothing set -g so it should default to 2. Somehow it
knows to set itself to... I'm guessing 0. Did -p -c 0 get special cased, or did
-p change its default to avoid any breaks even without the -c 0? (Sounds like
the latter is more likely, but I tried "yum install xxd" on my fedora 36 VM and
yum doesn't know what an xxd is.)

> ah, but on another box with 2021-10-22 it's broken. so it looks like "real" xxd
> had the same bug and fixed it recently?

Eh, seems more like a design decision than a bug. Before, -p was wordwrapping
the hexdump output and now it isn't. I dunno if it always isn't, or just with
-c 0?
We didn't set -g and it has a nonzero default value (1, 2, or 4 depending on
barometric pressure)...

I also note that the man page says -g 0 switches off grouping, but does NOT say
that -c 0 switches off columns? In the V1.10 version I have installed, -c 0
seems to be a NOP:

$ sha1sum < /dev/null | xxd -c 0
00000000: 6461 3339 6133 6565 3565 3662 3462 3064  da39a3ee5e6b4b0d
00000010: 3332 3535 6266 6566 3935 3630 3138 3930  3255bfef95601890
00000020: 6166 6438 3037 3039 2020 2d0a            afd80709  -.

Once again: easy to change the behavior, hard to tell what the changed behavior
should be. Easiest is to have -p force -g to 0 and -c to huge (stomping whatever
else got set in both). I could also teach -c that 0 means infinite (well,
sizeof(toybuf) implementation limit which is still bigger than the 256 directly
settable limit that I have no idea why it's there) if that's actually a thing...?
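
A rough sketch of those two options combined, under stated assumptions: the
struct, the have_c field (standing in for FLAG(c)), and WIDTH_MAX (standing in
for the sizeof(toybuf) limit, value assumed) are hypothetical names, and this
is not what toybox xxd actually does.

```c
#include <stdbool.h>

#define WIDTH_MAX 4096   // assumed stand-in for sizeof(toybuf)

// Hypothetical option state; not toybox's real globals.
struct xxd_opts {
  long c, g;     // bytes per line, bytes per group
  bool p;        // -p given
  bool have_c;   // -c given, so c == 0 can be told apart from "unset"
};

static void resolve_width(struct xxd_opts *o)
{
  if (o->p) {
    // Easiest option: -p stomps whatever -c and -g said.
    o->g = 0;
    o->c = WIDTH_MAX;
  } else if (o->have_c && o->c == 0) {
    // Other option: an explicit -c 0 means "as wide as the buffer allows".
    o->c = WIDTH_MAX;
  }
}
```

Checking have_c rather than !c is what lets an explicit -c 0 through without
it being mistaken for "no -c at all", and mapping it to the buffer limit keeps
it from capping the first read at zero bytes.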

(Grumble grumble no standard and the reference implementation has version skew...)

Rob

