[Toybox] grep corner cases

Rob Landley rob at landley.net
Thu Aug 22 01:38:30 PDT 2013


On 08/21/2013 03:42:29 PM, Felix Janda wrote:
> Rob Landley wrote:
> > On 08/20/2013 12:29:08 PM, Felix Janda wrote:
> > > Rob Landley wrote:
> > > > On 08/19/2013 02:26:55 PM, Felix Janda wrote:
> > > > > Hi,
> > > > >
> > > > > I saw the comment in changeset 1017 on possible bugs in GNU grep.
> > > > >
> > > > > The failing tests are for me:
> > > > >
> > > > > testing "grep -vo" "grep -vo one input" "two\nthree\n" \
> > > > >   "onetwoonethreeone\n" ""
> > > > > testing "grep -Fx ''" "grep -Fx '' input" "one one one\n" \
> > > > >   "one one one\n" ""
> > > > > testing "grep -F -e blah -e ''" "grep -F -e blah -e '' input" \
> > > > >   "one one one\n" "one one one\n" ""
> > > > >
> > > > > -o is a GNU extension making grep only output the matched parts of
> > > > > each matched line. So since -v inverts the set of all matched lines
> > > > > grep -vo should not output anything.
> > > >
> > > > Does it invert the set of matched _lines_, or does it invert the
> > > > match criteria? I made it so that:
> > >
> > >      -v
> > >      Select lines not matching any of the specified patterns. If the
> > >      -v option is not specified, selected lines shall be those that
> > >      match any of the specified patterns.
> > >
> > > Does sound to me like the former. This fits the line based nature of
> > > many of the POSIX tools. It however doesn't make grep -vo very useful.
> >
> > Posix does not have the -o option. The -o option is not line based.
> > This is about the effect of other options on the -o option.
> 
> Since you still can't match things spanning lines -o doesn't seem to
> make grep into a byte based tool.

I was thinking more along the lines of "sed exists" if you want to do  
fancy stuff. (Neither is a byte based tool, both are line based.)

> What should
> 
> echo on | grep -vo a
> 
> output? If you say that the sense of matching is inverted shouldn't
> the output be
> 
> 
> o
> n
> on
> 
> or some permutation thereof?

Um, no? There's no "a" anywhere in "on", so you're not chopping anything
out, so there's no reason to split the output. (And you're repeating
letters...? Showing each letter twice...?)

> The current output of toybox is pretty interesting...

I'm trying to figure out, logically, what it _should_ do. What
inverting the test means. If -o shows each matching part as a separate
line (which can vary quite a bit with wildcards), then -vo would show
each non-matching part as a separate line.

I'm open to counterarguments, but that sounds right to me...

> > > >    echo oneandtwoandthree | grep -ov
> > >
> > > Shouldn't it be
> > >
> > >     echo oneandtwoandthree | grep -ov and
> >
> > Yes.
> >
> > > > would produce:
> > > >    one
> > > >    two
> > > >    three
> > > >
> > > > (I pondered onetwothree but that's not how -o without -v works...)

Specifically, -o without -v splits lines. It doesn't show the
matching bits concatenated on one line, therefore the reversed version
would be separate lines too. It just inverts what it splits on and what
it shows, it doesn't change split into concatenate.
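
To pin down the semantics being argued for, here's a rough sketch in Python. It is not toybox code: the function names are made up for illustration, and Python's re module stands in for POSIX regexes, which is an assumption. It shows -o as "one output line per match" and the proposed -vo as "one output line per non-empty stretch between matches".

```python
import re

def only_matching(pattern, line):
    """Sketch of -o: each non-empty match becomes its own output line."""
    return [m.group(0) for m in re.finditer(pattern, line) if m.group(0)]

def only_nonmatching(pattern, line):
    """Sketch of the proposed -vo: each non-empty stretch between
    matches becomes its own output line (hypothetical helper)."""
    pieces = []
    pos = 0
    for m in re.finditer(pattern, line):
        if m.end() == m.start():
            continue  # an empty match removes nothing, so ignore it
        if m.start() > pos:
            pieces.append(line[pos:m.start()])
        pos = m.end()
    if pos < len(line):
        pieces.append(line[pos:])  # whatever is left after the last match
    return pieces

if __name__ == "__main__":
    for piece in only_nonmatching("and", "oneandtwoandthree"):
        print(piece)
```

Under this reading a line with no match at all comes out whole, since nothing was chopped out of it.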

> > > > The reason there are deviating test cases to consider is I'm not
> > > > taking "what gcc does" as an inherent definition of "the right
> > > > thing to do".
> > >
> > > But maybe it's a reason to spend some thought on the validity of the
> > > test case and maybe do some testing against other implementations. For
> > > example busybox grep also doesn't output anything.
> >
> > What other implementations? -o is a gnu/dammit extension.
> 
> Did you read the last paragraph you are quoting carefully?

I did, but busybox has a strong tendency to just "do whatever gnu did".  
Even when what gnu did doesn't make any sense.

(The answer was "openbsd". I should try again setting up a test  
environment, but _dude_. Those guys are not good at dealing with  
newbies. Oh well, at least some of them are trying...)

> Just FYI: obase doesn't contain grep.

Posix does.

(Which one was obase again? Not in my toybox roadmap, any relation to  
sbase?)

> > > > That implies that
> > > >
> > > >    echo one | grep -F -e walrus -e ''
> > > >
> > > > Should match one, but with the gnu/dammit version it only does so
> > > > _without_ the -F. Or with -F and just one argument...
> > >
> > > busybox grep also agrees with you.
> >
> > And it's inconsistent:
> >
> >    $ echo hello | grep -F -e ''
> >    hello
> >    $ echo hello | grep -F -e 'one' -e ''
> >    $ echo hello | grep -e 'one' -e ''
> >    hello
> >
> > (That's pretty clearly a bug. If you're wondering why I'm not slavishly
> > copying gnu/dammit behavior it's because they're not very good at this.
> > They've just had a very large testing base reporting bugs for a very
> > long time.)
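
For reference, the consistent behavior falls out of the "match any of the specified patterns" wording quoted earlier. A minimal sketch in Python, assuming plain substring search as a stand-in for -F (the function name is hypothetical, not anything in toybox or GNU grep):

```python
def selected(line, patterns, invert=False):
    """Sketch of -F selection: a line is selected if ANY fixed-string
    pattern occurs in it; invert=True models -v."""
    # The empty string occurs in every line, so an empty pattern
    # selects every line no matter what other patterns are present.
    hit = any(p in line for p in patterns)
    return hit != invert

if __name__ == "__main__":
    print(selected("hello", [""]))         # like: grep -F -e ''
    print(selected("hello", ["one", ""]))  # like: grep -F -e 'one' -e ''
    print(selected("hello", ["one"]))      # like: grep -F -e 'one'
```

With that rule, adding a second pattern can only ever select more lines, never fewer.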
> 
> I don't care specifically about the GNU version. With GNU grep lying in
> /usr/bin it's just too convenient to run tests against it.

Agreed, but a lot of configure stuff expects gnu behavior. (They claim  
portability, and last tested against non-gnu versions 7 years ago...  
Sigh.)

Looking forward to bootstrapping sabotage under aboriginal and trying  
to build their packages with toybox. That's likely to find all SORTS of  
breakage...

Rob