[Toybox] grep corner cases
Rob Landley
rob at landley.net
Thu Aug 22 01:38:30 PDT 2013
On 08/21/2013 03:42:29 PM, Felix Janda wrote:
> Rob Landley wrote:
> > On 08/20/2013 12:29:08 PM, Felix Janda wrote:
> > > Rob Landley wrote:
> > > > On 08/19/2013 02:26:55 PM, Felix Janda wrote:
> > > > > Hi,
> > > > >
> > > > > I saw the comment in changeset 1017 on possible bugs in GNU grep.
> > > > >
> > > > > The failing tests are for me:
> > > > >
> > > > > testing "grep -vo" "grep -vo one input" "two\nthree\n"
> > > > > "onetwoonethreeone\n" ""
> > > > > testing "grep -Fx ''" "grep -Fx '' input" "one one one\n" "one one one\n" ""
> > > > > testing "grep -F -e blah -e ''" "grep -F -e blah -e '' input" "one one one\n" \
> > > > > "one one one\n" ""
> > > > >
> > > > > -o is a GNU extension making grep only output the matched parts of
> > > > > each matched line. So since -v inverts the set of all matched lines
> > > > > grep -vo should not output anything.
> > > >
> > > > Does it invert the set of matched _lines_, or does it invert the
> > > > match criteria? I made it so that:
> > >
> > > -v
> > > Select lines not matching any of the specified patterns. If the
> > > -v option is not specified, selected lines shall be those that
> > > match any of the specified patterns.
> > >
> > > Does sound to me like the former. This fits the line based nature of
> > > many of the POSIX tools. It however doesn't make grep -vo very useful.
> >
> > Posix does not have the -o option. The -o option is not line based.
> > This is about the effect of other options on the -o option.
>
> Since you still can't match things spanning lines -o doesn't seem to
> make grep into a byte based tool.
I was thinking more along the lines of "sed exists" if you want to do
fancy stuff. (Neither is a byte based tool, both are line based.)
> What should
>
> echo on | grep -vo a
>
> output? If you say that the sense of matching is inverted shouldn't
> the output be
>
>
> o
> n
> on
>
> or some permutation thereof?
Um, no? There's no a anywhere in "on" so you're not chopping anything
out so there's no reason to split the output. (And you're repeating
letters...? Showing each letter twice...?)
> The current output of toybox is pretty interesting...
I'm trying to figure out, logically, what it _should_ do. What
inverting the test means. If -o shows each matching part as a separate
line (which can vary quite a bit with wildcards), then -vo would show
each non-matching part as a separate line.
I'm open to counterarguments, but that sounds right to me...
> > > > echo oneandtwoandthree | grep -ov
> > >
> > > Shouldn't it be
> > >
> > > echo oneandtwoandthree | grep -ov and
> >
> > Yes.
> >
> > > > would produce:
> > > > one
> > > > two
> > > > three
> > > >
> > > > (I pondered onetwothree but that's not how -o without -v works...)
Specifically, -o without -v splits lines. It doesn't show the
matching bits concatenated on one line, therefore the reversed version
would be separate lines too. It just inverts what it splits on and what
it shows, it doesn't change split into concatenate.
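To make that concrete, here is a minimal Python model of the behavior
argued for above. It is only a sketch of the proposed semantics, not
toybox's actual C code, and the function names (matching_parts,
non_matching_parts) are made up for illustration. It assumes empty
matches are skipped and that matches are taken left to right without
overlapping.

```python
import re

def matching_parts(pattern, line):
    """Model of -o: each non-empty match becomes its own output line."""
    return [m.group(0) for m in re.finditer(pattern, line) if m.group(0)]

def non_matching_parts(pattern, line):
    """Model of the proposed -vo: each stretch between matches is its own line."""
    parts = []
    pos = 0  # end of the previous match
    for m in re.finditer(pattern, line):
        if m.start() == m.end():
            continue  # skip empty matches, they split nothing
        if m.start() > pos:
            parts.append(line[pos:m.start()])
        pos = m.end()
    if pos < len(line):
        parts.append(line[pos:])  # whatever follows the last match
    return parts

# Same split points, opposite halves shown.
print("\n".join(matching_parts("and", "oneandtwoandthree")))
print("\n".join(non_matching_parts("and", "oneandtwoandthree")))
```

With no match at all (the "echo on | grep -vo a" case) nothing gets
chopped out, so the model returns the whole line as a single piece.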
> > > > The reason there are deviating test cases to consider is I'm not
> > > > taking "what gcc does" as an inherent definition of "the right thing
> > > > to do".
> > >
> > > But maybe it's a reason to spend some thought on the validity of the
> > > test case and maybe do some testing against other implementations. For
> > > example busybox grep also doesn't output anything.
> >
> > What other implementations? -o is a gnu/dammit extension.
>
> Did you read the last paragraph you are quoting carefully?
I did, but busybox has a strong tendency to just "do whatever gnu did".
Even when what gnu did doesn't make any sense.
(The answer was "openbsd". I should try again setting up a test
environment, but _dude_. Those guys are not good at dealing with
newbies. Oh well, at least some of them are trying...)
> Just FYI: obase doesn't contain grep.
Posix does.
(Which one was obase again? Not in my toybox roadmap, any relation to
sbase?)
> > > > That implies that
> > > >
> > > > echo one | grep -F -e walrus -e ''
> > > >
> > > > Should match one, but with the gnu/dammit version it only does so
> > > > _without_ the -F. Or with -F and just one argument...
> > >
> > > busybox grep also agrees with you.
> >
> > And it's inconsistent:
> >
> > $ echo hello | grep -F -e ''
> > hello
> > $ echo hello | grep -F -e 'one' -e ''
> > $ echo hello | grep -e 'one' -e ''
> > hello
> >
> > (That's pretty clearly a bug. If you're wondering why I'm not slavishly
> > copying gnu/dammit behavior it's because they're not very good at this.
> > They've just had a very large testing base reporting bugs for a very
> > long time.)
>
> I don't care specifically about the GNU version. With GNU grep lying in
> /usr/bin it's just too convenient to run tests against it.
Agreed, but a lot of configure stuff expects gnu behavior. (They claim
portability, and last tested against non-gnu versions 7 years ago...
Sigh.)
Looking forward to bootstrapping sabotage under aboriginal and trying
to build their packages with toybox. That's likely to find all SORTS of
breakage...
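Going back to the -F -e '' case for a moment, here is a minimal Python
model of fixed-string matching with several -e patterns. It assumes each
pattern is tried independently and a line is selected if any one of them
matches. The name fixed_string_selects is hypothetical and this is not
how any real grep is implemented; it only shows why the empty pattern
should select every line no matter what else is on the command line.

```python
def fixed_string_selects(line, patterns):
    """Model of grep -F with one or more -e patterns (no -x, no -v)."""
    # The empty string is a substring of every line, so "" always matches,
    # and adding more patterns can only add matches, never remove them.
    return any(p in line for p in patterns)

for patterns in ([""], ["one", ""], ["one"]):
    print(patterns, fixed_string_selects("hello", patterns))
```

Under this model "hello" is selected in the first two cases and not in
the third, which is what the non -F version already does.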
Rob