[Toybox] Slow grep

Yi-yo Chiang yochiang at google.com
Fri Sep 16 07:47:03 PDT 2022


On Fri, Sep 16, 2022 at 1:20 PM Rob Landley <rob at landley.net> wrote:

> On 9/15/22 16:32, enh wrote:
> > On Thu, Sep 15, 2022 at 1:45 PM Rob Landley <rob at landley.net
> > <mailto:rob at landley.net>> wrote:
> >
> >     On 9/15/22 07:30, Yi-yo Chiang via Toybox wrote:
> >     > grep is slow when the number of patterns is large.
> >     ...
> >     >   xxd -p -c 0 -l 40 /dev/urandom
> >
> >     Huh, WHY does that produce two lines of output?
> >
> >     $ xxd -p -c 0 -l 40 /dev/urandom
> >     1fcf13e1b4844ba209fb9958bde26a13577c577744f1b1290240d03f4f8e
> >     644fd0687c39b1aa8a68
> >
> >     Behavior is consistent between toybox xxd and debian's, but Elliott sent in the
> >     xxd implementation and I don't use it much, so... Feeding in -c 40 and -c 80
> >     make no difference?
> >
> > i think that's actually a bug caused by this:
> >
> >   // Plain style is 30 bytes/line, no grouping.
> >   if (FLAG(p)) TT.c = TT.g = 30;
> >
> > should presumably be
> >
> >   // Plain style is 30 bytes/line, no grouping.
> >   if (FLAG(p)) {
> >     if (!TT.c) TT.c = 30;
> >     if (!TT.g) TT.g = 30;
> >   }
> >
> > ?
>
> Except we didn't set -g so it would still be set to 30, which is going to stick
> spaces into the output.
>
> And xxd_main() starts with if (!TT.c) TT.c = blah so it would never be zero at
> that point unless we reorder the code, and then once THAT'S fixed -c 0 is still
> !TT.c and if I switch that to if (FLAG(c)) to allow c = 0 through (which the
> range in the optstr is allowing) it's used to cap the length in readall()
> meaning the first read becomes EOF so no output gets produced.
>
> > certainly "real" xxd works for me on macos and debian, both of which have the
> > same version of xxd:
> >
> > ~$ xxd --version
> > xxd 2022-01-14 by Juergen Weigert et al.
> > ~$ xxd -p -c 0 -l 40 /dev/urandom
> > ac160632955aa9d938e60d3533cbcf0febb4decdd12f130e415913ff1fe6e2abcaf7c4a8e980de7a
>
> See, this is extra weird: nothing set -g so it should default to 2. Somehow it
> knows to set itself to... I'm guessing 0. Did -p -c 0 get special cased, or did
> -p change its default to avoid any breaks even without the -c 0? (Sounds like
> the latter is more likely, but I tried "yum install xxd" on my fedora 36 VM and
> yum doesn't know what an xxd is.)
>
>
(Tangent: xxd behavior seems to differ everywhere, so I should be careful
and also note my expected output next time I use xxd as a random string
generator.)

On my debian machine (with the same 2022-01-14 build as Elliott), `man xxd`
says...
* -p: Plain text output. '-g' is ignored, '-c' defaults to 30, and '-c 0'
  results in one long line (so yes, '-p -c 0' is a special case for "plain
  text, no wrapping, no grouping").
* It doesn't say what would happen with '-c 0' otherwise.
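
To make the expected output concrete, here is a minimal sketch of the -p
behavior as that man page describes it. It is not the toybox or vim xxd code:
plain_cols() and plain_hex() are hypothetical names, and the only rules
assumed are the ones above (no grouping, 30 bytes per line by default, and 0
columns meaning "never wrap").

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: pick the column count for plain (-p) output.
 * have_c says whether -c appeared on the command line at all. */
static int plain_cols(int have_c, int c)
{
  return have_c ? c : 30;  /* documented -p default: 30 bytes per line */
}

/* Hex-dump len bytes from in into out, plain style: no grouping, a newline
 * after every cols bytes and after the last byte. cols <= 0 means "never
 * wrap". Returns the number of characters written (not counting the NUL),
 * or -1 if out is too small. */
static int plain_hex(const unsigned char *in, size_t len, int cols,
                     char *out, size_t outsz)
{
  size_t i, pos = 0;

  if (!outsz) return -1;
  for (i = 0; i < len; i++) {
    /* Worst case per byte: 2 hex digits + newline + terminating NUL. */
    if (pos + 4 > outsz) return -1;
    pos += (size_t)snprintf(out + pos, outsz - pos, "%02x", (unsigned)in[i]);
    if (i + 1 == len || (cols > 0 && (i + 1) % (size_t)cols == 0))
      out[pos++] = '\n';
  }
  out[pos] = 0;
  return (int)pos;
}
```

With 40 input bytes, plain_cols(1, 0) gives 80 hex digits on one line, while
plain_cols(0, 0) falls back to 30 and gives a 60-digit line followed by a
20-digit line, which is the two-line output quoted earlier in the thread.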


> > ah, but on another box with 2021-10-22 it's broken. so it looks like "real" xxd
> > had the same bug and fixed it recently?
>
> Eh, seems more like a design decision than a bug. Before -p was wordwrapping the
> hexdump output and now it isn't. I dunno if it always isn't, or just with -c 0?
> We didn't set -g and it has a nonzero default value (1, 2, or 4 depending on
> barometric pressure)...
>
> I also note that the man page says -g 0 switches off grouping, but does NOT say
> that -c 0 switches off columns? In the V1.10 version I have installed, -c 0
> seems to be a NOP:
>
> $ sha1sum < /dev/null | xxd -c 0
> 00000000: 6461 3339 6133 6565 3565 3662 3462 3064  da39a3ee5e6b4b0d
> 00000010: 3332 3535 6266 6566 3935 3630 3138 3930  3255bfef95601890
> 00000020: 6166 6438 3037 3039 2020 2d0a            afd80709  -.
>
> Once again: easy to change the behavior, hard to tell what the changed behavior
> should be. Easiest is to have -p force -g to 0 and -c to huge (stomping whatever
> else got set in both). I could also teach -c that 0 means infinite (well,
> sizeof(toybuf) implementation limit which is still bigger than the 256 directly
> settable limit that I have no idea why it's there) if that's actually a
> thing...?
>
> (Grumble grumble no standard and the reference implementation has version
> skew...)
>
> Rob
>
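
For comparison, a rough sketch of how the option defaults could be resolved
if -p always dropped grouping and '-c 0' meant "unlimited" only in plain mode.
This is one possible reading of the two options above, not a proposed patch:
struct xxd_opts and resolve_opts() are hypothetical names rather than toybox's
TT fields, and INT_MAX stands in for "huge" (a real implementation would hit
its buffer size limit first). The non-plain defaults of 16 columns and groups
of 2 match the sha1sum output quoted above.

```c
#include <limits.h>

/* Hypothetical option block; field names are illustrative only. */
struct xxd_opts {
  int plain;   /* -p given */
  int have_c;  /* -c given */
  int cols;    /* value passed to -c, if any */
  int have_g;  /* -g given */
  int group;   /* value passed to -g, if any */
};

/* Fill in defaults after option parsing. */
static void resolve_opts(struct xxd_opts *o)
{
  if (o->plain) {
    o->group = 0;                             /* -p ignores any -g */
    if (!o->have_c) o->cols = 30;             /* plain default */
    else if (o->cols == 0) o->cols = INT_MAX; /* -p -c 0: one long line */
  } else {
    /* Outside -p, treat -c 0 like "not set", as the V1.10 output suggests. */
    if (!o->have_c || o->cols == 0) o->cols = 16;
    if (!o->have_g) o->group = 2;
  }
}
```

Keying the defaults on whether the flag was given, rather than on the value
being zero, is what lets an explicit 0 survive to this point.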


-- 

Yi-yo Chiang
Software Engineer
yochiang at google.com

I support flexible work schedules, and I’m sending this email now because
it is within the hours I’m working today. Please do not feel obliged to
reply straight away - I understand that you will reply during the hours you
work, which may not match mine.