[Toybox] Slow grep
Rob Landley
rob at landley.net
Thu Sep 15 22:28:33 PDT 2022
On 9/15/22 16:32, enh wrote:
> On Thu, Sep 15, 2022 at 1:45 PM Rob Landley <rob at landley.net
> <mailto:rob at landley.net>> wrote:
>
> > On 9/15/22 07:30, Yi-yo Chiang via Toybox wrote:
> > > grep is slow when the number of patterns is large.
> > ...
> > > xxd -p -c 0 -l 40 /dev/urandom
> >
> > Huh, WHY does that produce two lines of output?
> >
> > $ xxd -p -c 0 -l 40 /dev/urandom
> > 1fcf13e1b4844ba209fb9958bde26a13577c577744f1b1290240d03f4f8e
> > 644fd0687c39b1aa8a68
> >
> > Behavior is consistent between toybox xxd and debian's, but Elliott sent in the
> > xxd implementation and I don't use it much, so... Feeding in -c 40 and -c 80
> > makes no difference?
>
> i think that's actually a bug caused by this:
>
>   // Plain style is 30 bytes/line, no grouping.
>   if (FLAG(p)) TT.c = TT.g = 30;
>
> should presumably be
>
>   // Plain style is 30 bytes/line, no grouping.
>   if (FLAG(p)) {
>     if (!TT.c) TT.c = 30;
>     if (!TT.g) TT.g = 30;
>   }
>
> ?
Except we didn't set -g so it would still be set to 30, which is going to stick
spaces into the output.
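To make the stray-space point concrete, here is a minimal sketch of grouped hex output. It is a hypothetical model, not toybox's xxd code: hex_line() and its parameters are made up for illustration. With 40 bytes on one line and the group size left at 30, a space lands after the 30th byte; with a group size of 0 the line stays unbroken.

```c
#include <stddef.h>

// Hypothetical model of grouped hex output (not toybox's xxd code).
// Writes `len` bytes from `in` as lowercase hex into `out`, inserting a
// space after every `group` bytes except at the end of the line.
// A group of 0 disables grouping. `out` needs room for 3*len+1 chars.
// Returns the number of characters written, not counting the NUL.
static size_t hex_line(const unsigned char *in, size_t len, int group,
                       char *out)
{
  static const char digits[] = "0123456789abcdef";
  size_t i, n = 0;

  for (i = 0; i < len; i++) {
    out[n++] = digits[in[i] >> 4];
    out[n++] = digits[in[i] & 15];
    // Group boundary in the middle of the line: emit a separator.
    if (group > 0 && !((i + 1) % (size_t)group) && i + 1 < len)
      out[n++] = ' ';
  }
  out[n] = 0;

  return n;
}
```

Calling hex_line(data, 40, 30, out) puts one space in the middle of what was meant to be a single unbroken line, while hex_line(data, 40, 0, out) does not.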
And xxd_main() starts with if (!TT.c) TT.c = blah, so it would never be zero at
that point unless we reorder the code. Once THAT'S fixed, -c 0 is still !TT.c.
If I switch that test to if (FLAG(c)) to let c = 0 through (which the range in
the optstr allows), the value is then used to cap the length in readall(),
meaning the first read looks like EOF and no output gets produced.
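A minimal sketch of why a zero column count produces no output, assuming a loop that asks for up to one line's worth of bytes per read and treats a zero-length result as end of input. The names here (struct src, read_upto(), count_lines()) are hypothetical stand-ins, not toybox's readall() or its xxd loop.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical in-memory input source standing in for a file descriptor.
struct src { const unsigned char *data; size_t len, pos; };

// Copy up to `len` bytes into `buf` and return the count copied.
// A return of 0 is how the caller detects end of input.
static size_t read_upto(struct src *s, unsigned char *buf, size_t len)
{
  size_t left = s->len - s->pos;

  if (len > left) len = left;
  memcpy(buf, s->data + s->pos, len);
  s->pos += len;

  return len;
}

// Count the output lines produced when each line consumes up to `cols`
// input bytes (clamped to the local buffer size).
static int count_lines(struct src *s, int cols)
{
  unsigned char buf[256];
  size_t want = cols < 0 ? 0 : (size_t)cols;
  int lines = 0;

  if (want > sizeof(buf)) want = sizeof(buf);
  // With want == 0 the very first read returns 0, which is
  // indistinguishable from EOF, so the loop body never runs.
  while (read_upto(s, buf, want)) lines++;

  return lines;
}
```

In this model 40 bytes at 30 columns give two lines, and the same 40 bytes at 0 columns give none, with the input left unread.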
> certainly "real" xxd works for me on macos and debian, both of which have the
> same version of xxd:
>
> ~$ xxd --version
> xxd 2022-01-14 by Juergen Weigert et al.
> ~$ xxd -p -c 0 -l 40 /dev/urandom
> ac160632955aa9d938e60d3533cbcf0febb4decdd12f130e415913ff1fe6e2abcaf7c4a8e980de7a
See, this is extra weird: nothing set -g so it should default to 2. Somehow it
knows to set itself to... I'm guessing 0. Did -p -c 0 get special cased, or did
-p change its default to avoid any line breaks even without the -c 0? (Sounds like
the latter is more likely, but I tried "yum install xxd" on my fedora 36 VM and
yum doesn't know what an xxd is.)
> ah, but on another box with 2021-10-22 it's broken. so it looks like "real" xxd
> had the same bug and fixed it recently?
Eh, seems more like a design decision than a bug. Before, -p was wordwrapping the
hexdump output, and now it isn't. I dunno if it always isn't, or just with -c 0?
We didn't set -g and it has a nonzero default value (1, 2, or 4 depending on
barometric pressure)...
I also note that the man page says -g 0 switches off grouping, but does NOT say
that -c 0 switches off columns? In the V1.10 version I have installed, -c 0
seems to be a NOP:
$ sha1sum < /dev/null | xxd -c 0
00000000: 6461 3339 6133 6565 3565 3662 3462 3064 da39a3ee5e6b4b0d
00000010: 3332 3535 6266 6566 3935 3630 3138 3930 3255bfef95601890
00000020: 6166 6438 3037 3039 2020 2d0a afd80709 -.
Once again: easy to change the behavior, hard to tell what the changed behavior
should be. Easiest is to have -p force -g to 0 and -c to huge (stomping whatever
else got set in both). I could also teach -c that 0 means infinite (well,
sizeof(toybuf) implementation limit which is still bigger than the 256 directly
settable limit that I have no idea why it's there) if that's actually a thing...?
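The two options above could be sketched roughly like this. Everything here is a hypothetical illustration rather than toybox code: struct layout, plain_stomps(), and cols_or_unlimited() are invented names, and MAX_COLS is set to 4096 on the assumption that sizeof(toybuf) is 4096.

```c
// Assumed stand-in for sizeof(toybuf); the real limit may differ.
#define MAX_COLS 4096

// Bytes per output line and bytes per group (0 means no grouping).
struct layout { int cols, group; };

// Option 1: -p overrides whatever -c and -g were set to, giving one
// long ungrouped line (up to the buffer limit).
static struct layout plain_stomps(struct layout l, int flag_p)
{
  if (flag_p) {
    l.cols = MAX_COLS;
    l.group = 0;
  }

  return l;
}

// Option 2: an explicit -c 0 means "as wide as the buffer allows".
// flag_c says whether -c was given, c is its value, and fallback is
// the default column count for the current output style.
static int cols_or_unlimited(int flag_c, int c, int fallback)
{
  if (!flag_c) return fallback;

  return c ? c : MAX_COLS;
}
```

Option 1 ignores the user's -c and -g entirely under -p, which is simple but surprising if someone passes -p -c 40. Option 2 needs the "was -c given" flag kept separate from the value, since 0 can no longer double as "unset".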
(Grumble grumble no standard and the reference implementation has version skew...)
Rob