[Toybox] Broken find [Was: Re: Towards find cleanup]

Felix Janda felix.janda at posteo.de
Sat Apr 13 09:06:05 PDT 2013


On 04/11/13 at 05:53pm, Tim Bird wrote:
> On 04/11/2013 03:12 PM, Rob Landley wrote:
> > On 04/11/2013 04:37:36 PM, Felix Janda wrote:
> >> Now I think that I know better how this toy works and see that my  
> >> cleanup
> >> has broken it.
> >>
> >>>>>  	if (filter->op==OP_OR) {
> >>>>> -		result = evaluate(filter->next, node, fnext);
> >>>>> -		result2 = evaluate(*fnext, node, fnext);
> >>>>> -		if (result) {
> >>>>> -			return result;
> >>>>> -		} else {
> >>>>> -			if (result2) {
> >>>>> -				return result2;
> >>>>> -			} else {
> >>>>> -				return 0;
> >>>>> -			}
> >>>>> -		}
> >>>>> +		return evaluate(filter->next, node, fnext) ||  
> >> evaluate(*fnext, node, fnext);
> >>>> This will have different side effects than the original code.
> >>>> The original code evaluates both sides, and then returns the 'or'  
> >> operation
> >>>> on the results.  The new one will (of course) do short-circuit  
> >> evaluation of
> >>>> the '||',  meaning that sometimes the second node won't be  
> >> evaluated.
> >>>> I believe that at one time, I actually had it coded that way, but  
> >> for some
> >>>> reason or other I went back to forcing the evaluation of both  
> >> nodes.
> >>>> Unfortunately, I can't remember the reason why I went back.  I  
> >> would have expected
> >>>> regular find to do short-circuit evaluation of expressions with  
> >> 'or' operations,
> >>>> so I'd be happy if this were correct.  I just have a nagging  
> >> feeling that
> >>>> there was some reason I couldn't do it this way.  I could be  
> >> completely wrong,
> >>>> though.  So unless testing reveals the need for evaluating both  
> >> sides of the
> >>>> 'or', this should be fine.
> >>
> >> The reason is that we want the side effects evaluate has on *fnext. If
> >> we have a prefix expression like
> >>
> >> OR AND CONDITION1 CONDITION2 CONDITION3
> > 
> > You can have and/or prefix expressions?
> > 
> > $ find . -or -exec echo one {} \; -exec echo two {} \; | grep two
> > find: invalid expression; you have used a binary operator '-or' with  
> > nothing before it.
> > 
> >> and CONDITION1 is false, the current version of evaluate will
> >> 1. see OR
> >> 2. descend to AND
> >> 3. find that CONDITION1 is false (side effect: *fnext=CONDITION2)
> >> 4. don't evaluate *fnext but go back to OR
> >> 5. OR now checks CONDITION2 instead of CONDITION3.
> > 
> > I'm not following this at all.
> > 
> > I did however try this:
> > 
> >    find . -exec echo one {} \; -or -exec echo two {} \; | grep two
> > 
> > It does not print two for any of the files, so -or is successfully  
> > short-circuiting. This is in the gnu/dammit version.
> > 
> >> In the previous version of evaluate in step 4 CONDITION2 would have  
> >> been
> >> evaluated, ignored and as a side effect fnext would then point to
> >> CONDITION3.
> > 
> > Which behavior is right, and can you give me a test command line I can  
> > try in the other implementation?
> > 
> >>>>
> >>>> Do you know if regular find does short-circuit evaluation?
> >>>> What would regular find do with
> >>>>   'find . -exec test1_with_side_effects {} \; -o -exec  
> >> test2_with_side_effects {} \;'
> >>>> ?
> >>>
> >>> POSIX says that these short-circuit evaluations shall be applied.  
> >> (The
> >>> same applies for -a.)
> >>
> >> Now we have the problem that we want to have some side effects of
> >> evaluate but not all of them.
> > 
> > I'm not following this.
> 
> The prefix evaluation is an artifact of the implementation.
> find expects and infix expression on the command line.  This find
> converts that to prefix during parsing.  I did this because I
> thought it would be easier to evaluate (with my weird stateful
> evaluator :( )
> 
> I think the easiest solution is to keep the pre-fix format of
> the parsed expression, but handle short-circuiting directly.
> So OR would become something like:
> if (filter->op==OP_OR) {
> 	result = evaluate(filter->next, node, fnext);
> 	if (result) {
> 		skip(*fnext, node, fnext);
> 		return result;
> 	} else
> 		return evaluate(*fnext, node, fnext);
> }

Yeah, I also thought of something like that. Maybe instead of
implementing skip() add an extra parameter to evaluate() (or use a
static variable).

> Of maybe the expression tree should actually be put into a binary
> tree form, instead of serial pre-fix form, for easier traversal.

It confused me a bit when first looking at the code. IMO there should
be a comment which explains the fnext argument to evaluate().

I think that the implementation using the prefix form is really simple.
There are however some details of the evaluation which are different from
what POSIX wants to have:

1. We've already talked about the short-cut evaluations and Tim has given
a possible solution to it.

2. "expression1 expression2" should behave like "expression1 -a
expression2". (This also already came up.) It should be possible to
insert the missing "AND"-operators in build_filter_list(), shouldn't it?

3. Operator precedence. POSIX prescribes that from highest to lowest
precendence we have:

( )
!
-a
-o

The toybox version does not yet take this into account. For example

find . -type d -o -type d -a -type f

should print something. This third point doesn't seem
straightforward to implement to me.

(Summarizingly the way how find evaluates composite expressions is
strongly inspired by how the C language does.)

Felix



More information about the Toybox mailing list