[Toybox] grep -o oddity
Rob Landley
rob at landley.net
Sat Feb 13 07:26:04 PST 2021
On 2/12/21 6:51 PM, enh via Toybox wrote:
> this doesn't work:
>
> /tmp/toybox$ echo "a1234b" | grep -o [0-9]*
> 1234
> /tmp/toybox$ echo "a1234b" | ./toybox grep -o [0-9]*
> /tmp/toybox$
Ok, that's a test case I need to add then. Let's see...
> but this does:
>
> /tmp/toybox$ ./toybox grep -o version /proc/version
> version
> version
>
> as does:
>
> /tmp/toybox$ echo "1234b" | ./toybox grep -o [0-9]*
> 1234
>
> it seems wrong that there's arithmetic on rm_so/rm_eo before the call to
> regexec0() (which will clobber both),
It's an optimization.
When we have more than one pattern we search for them all, see which match
comes first, and consume that one. If anything else matched, we subtract off the
part of the string the earlier match ate to see if we've already bitten into the
other match. If we _haven't_, we don't need to search again and can just reuse
the previous result at the new offset.
The rm_so -= baseline is so we can check shoe->m.rm_so<0, and the eo -= baseline
is for the "we didn't need to re-check it" case. If we do recheck it'll
overwrite both, but there's an if () before that call...
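
A minimal sketch of that caching idea, not the actual grep.c code: struct pat,
next_match() and count_matches() are hypothetical names, the 8 pattern limit is
arbitrary, and it assumes a pattern that found nothing in the rest of the line
never needs searching again. It ignores REG_NOTBOL and REG_STARTEND entirely.

```c
#include <regex.h>

// Hypothetical per-pattern state: the compiled regex plus its cached result.
struct pat {
  regex_t re;
  regmatch_t m;  // cached match, relative to the current scan position
  int rc;        // cached regexec() result, 0 means m is usable
};

// Return the index of the pattern whose match starts earliest in s, or -1.
// baseline is how many bytes were consumed since the cache was filled.
int next_match(struct pat *p, int n, const char *s, int baseline, int *searches)
{
  int i, best = -1;

  for (i = 0; i < n; i++) {
    if (p[i].rc) continue;  // assumed: no match before means no match now
    // Slide the cached offsets so they are relative to the new start.
    p[i].m.rm_so -= baseline;
    p[i].m.rm_eo -= baseline;
    // Negative start: the consumed text bit into this match, search again.
    if (p[i].m.rm_so < 0) {
      ++*searches;
      p[i].rc = regexec(&p[i].re, s, 1, &p[i].m, 0);
      if (p[i].rc) continue;
    }
    if (best < 0 || p[i].m.rm_so < p[best].m.rm_so) best = i;
  }

  return best;
}

// Count non-empty matches of any pattern in line, storing the number of
// regexec() calls in *searches. Returns -1 on a bad pattern or n > 8.
int count_matches(const char **patterns, int n, const char *line, int *searches)
{
  struct pat p[8];
  const char *s = line;
  int i, best, baseline = 0, count = 0;

  *searches = 0;
  if (n > 8) return -1;
  for (i = 0; i < n; i++) {
    if (regcomp(&p[i].re, patterns[i], 0)) {
      while (i--) regfree(&p[i].re);
      return -1;
    }
    p[i].rc = 0;
    p[i].m.rm_so = p[i].m.rm_eo = -1;  // negative start forces the first search
  }
  while (*s && (best = next_match(p, n, s, baseline, searches)) >= 0) {
    baseline = p[best].m.rm_eo;
    if (p[best].m.rm_eo > p[best].m.rm_so) count++;
    if (!baseline) baseline = 1;  // empty match at the start: step one byte
    s += baseline;
  }
  for (i = 0; i < n; i++) regfree(&p[i].re);

  return count;
}
```

With the patterns "ab" and "cd" on the line "ab cd ab" this finds three matches
using four regexec() calls, where searching every pattern at every step would
take six.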
> and it does seem like the problem is that
> we're not getting the right results in rm_so/rm_eo on exit from regexec0(), but
> i failed to work out why...
Yeah, the problem is that the regex call is returning no match. I confirmed it's
_not_ sending it REG_NOTBOL, that the correct string is being passed in to
xregcomp(), that it's the same struct reg address... Peeling off REG_STARTEND
doesn't matter.
Sigh. I have to stick printfs into the regex implementation, don't I? (It's not
a regex bug, it's doing the same with glibc and musl...) Aha!
regexec() returns zero for a successful match or REG_NOMATCH for failure.
It's returning zero, and the match is length 0, because "zero or more" is a
match for *, and there are zero digits before the a.
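
The zero length match can be reproduced outside grep with plain POSIX
regcomp()/regexec(). This is only a small sketch: first_match() is a made-up
helper, not anything in toybox.

```c
#include <regex.h>

// Store the start and end offsets of the first match of pattern in text.
// Returns regexec()'s result (0 on a match), or -1 if the pattern is bad.
int first_match(const char *pattern, const char *text, int *so, int *eo)
{
  regex_t re;
  regmatch_t m;
  int rc;

  if (regcomp(&re, pattern, 0)) return -1;
  rc = regexec(&re, text, 1, &m, 0);
  if (!rc) {
    *so = (int)m.rm_so;
    *eo = (int)m.rm_eo;
  }
  regfree(&re);

  return rc;
}
```

For "[0-9]*" against "a1234b" this returns 0 with both offsets 0, a successful
but empty match in front of the a. Against "1234b" the offsets are 0 and 4,
which is why that case printed 1234.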
Right. Gotta redo some plumbing...
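
One possible shape for that, as an illustration rather than the eventual fix:
when a match is empty, print nothing, step one byte forward, and try again.
only_matching() is a hypothetical standalone helper that writes into a caller
supplied buffer instead of stdout. It steps by bytes, not multibyte characters,
and it restarts the search on a suffix with REG_NOTBOL rather than using
REG_STARTEND.

```c
#include <regex.h>
#include <string.h>

// Copy each non-empty match of pattern in line into out, one per line.
// Returns the number of matches copied, or -1 on a bad pattern or no buffer.
int only_matching(const char *pattern, const char *line, char *out,
  size_t outlen)
{
  regex_t re;
  regmatch_t m;
  const char *s = line;
  size_t used = 0;
  int count = 0;

  if (!outlen || regcomp(&re, pattern, 0)) return -1;
  *out = 0;
  // After the first search s is no longer the start of the line.
  while (*s && !regexec(&re, s, 1, &m, s == line ? 0 : REG_NOTBOL)) {
    size_t len = m.rm_eo - m.rm_so;

    if (!len) {
      // Empty match: skip to it, then step one byte past it and retry.
      s += m.rm_eo;
      if (!*s) break;
      s++;
      continue;
    }
    if (used + len + 2 > outlen) break;  // match, newline and NUL must fit
    memcpy(out + used, s + m.rm_so, len);
    used += len;
    out[used++] = '\n';
    out[used] = 0;
    count++;
    s += m.rm_eo;
  }
  regfree(&re);

  return count;
}
```

Called with "[0-9]*" and "a1234b" it skips the empty match in front of the a
and copies "1234" followed by a newline, matching the output of the first grep
command quoted at the top.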
Rob