[Toybox] Broken find [Was: Re: Towards find cleanup]
Felix Janda
felix.janda at posteo.de
Sat Apr 13 09:06:05 PDT 2013
On 04/11/13 at 05:53pm, Tim Bird wrote:
> On 04/11/2013 03:12 PM, Rob Landley wrote:
> > On 04/11/2013 04:37:36 PM, Felix Janda wrote:
> >> Now I think that I know better how this toy works and see that my
> >> cleanup
> >> has broken it.
> >>
> >>>>> if (filter->op==OP_OR) {
> >>>>> - result = evaluate(filter->next, node, fnext);
> >>>>> - result2 = evaluate(*fnext, node, fnext);
> >>>>> - if (result) {
> >>>>> - return result;
> >>>>> - } else {
> >>>>> - if (result2) {
> >>>>> - return result2;
> >>>>> - } else {
> >>>>> - return 0;
> >>>>> - }
> >>>>> - }
> >>>>> + return evaluate(filter->next, node, fnext) ||
> >> evaluate(*fnext, node, fnext);
> >>>> This will have different side effects than the original code.
> >>>> The original code evaluates both sides, and then returns the 'or'
> >> operation
> >>>> on the results. The new one will (of course) do short-circuit
> >> evaluation of
> >>>> the '||', meaning that sometimes the second node won't be
> >> evaluated.
> >>>> I believe that at one time, I actually had it coded that way, but
> >> for some
> >>>> reason or other I went back to forcing the evaluation of both
> >> nodes.
> >>>> Unfortunately, I can't remember the reason why I went back. I
> >> would have expected
> >>>> regular find to do short-circuit evaluation of expressions with
> >> 'or' operations,
> >>>> so I'd be happy if this were correct. I just have a nagging
> >> feeling that
> >>>> there was some reason I couldn't do it this way. I could be
> >> completely wrong,
> >>>> though. So unless testing reveals the need for evaluating both
> >> sides of the
> >>>> 'or', this should be fine.
> >>
> >> The reason is that we want the side effects evaluate has on *fnext. If
> >> we have a prefix expression like
> >>
> >> OR AND CONDITION1 CONDITION2 CONDITION3
> >
> > You can have and/or prefix expressions?
> >
> > $ find . -or -exec echo one {} \; -exec echo two {} \; | grep two
> > find: invalid expression; you have used a binary operator '-or' with
> > nothing before it.
> >
> >> and CONDITION1 is false, the current version of evaluate will
> >> 1. see OR
> >> 2. descend to AND
> >> 3. find that CONDITION1 is false (side effect: *fnext=CONDITION2)
> >> 4. don't evaluate *fnext but go back to OR
> >> 5. OR now checks CONDITION2 instead of CONDITION3.
> >
> > I'm not following this at all.
> >
> > I did however try this:
> >
> > find . -exec echo one {} \; -or -exec echo two {} \; | grep two
> >
> > It does not print two for any of the files, so -or is successfully
> > short-circuiting. This is in the gnu/dammit version.
> >
> >> In the previous version of evaluate in step 4 CONDITION2 would have
> >> been
> >> evaluated, ignored and as a side effect fnext would then point to
> >> CONDITION3.
> >
> > Which behavior is right, and can you give me a test command line I can
> > try in the other implementation?
> >
> >>>>
> >>>> Do you know if regular find does short-circuit evaluation?
> >>>> What would regular find do with
> >>>> 'find . -exec test1_with_side_effects {} \; -o -exec
> >> test2_with_side_effects {} \;'
> >>>> ?
> >>>
> >>> POSIX says that these short-circuit evaluations shall be applied.
> >> (The
> >>> same applies for -a.)
> >>
> >> Now we have the problem that we want to have some side effects of
> >> evaluate but not all of them.
> >
> > I'm not following this.
>
> The prefix evaluation is an artifact of the implementation.
> find expects and infix expression on the command line. This find
> converts that to prefix during parsing. I did this because I
> thought it would be easier to evaluate (with my weird stateful
> evaluator :( )
>
> I think the easiest solution is to keep the pre-fix format of
> the parsed expression, but handle short-circuiting directly.
> So OR would become something like:
> if (filter->op==OP_OR) {
> result = evaluate(filter->next, node, fnext);
> if (result) {
> skip(*fnext, node, fnext);
> return result;
> } else
> return evaluate(*fnext, node, fnext);
> }
Yeah, I also thought of something like that. Maybe instead of
implementing skip() add an extra parameter to evaluate() (or use a
static variable).
> Of maybe the expression tree should actually be put into a binary
> tree form, instead of serial pre-fix form, for easier traversal.
It confused me a bit when first looking at the code. IMO there should
be a comment which explains the fnext argument to evaluate().
I think that the implementation using the prefix form is really simple.
There are however some details of the evaluation which are different from
what POSIX wants to have:
1. We've already talked about the short-cut evaluations and Tim has given
a possible solution to it.
2. "expression1 expression2" should behave like "expression1 -a
expression2". (This also already came up.) It should be possible to
insert the missing "AND"-operators in build_filter_list(), shouldn't it?
3. Operator precedence. POSIX prescribes that from highest to lowest
precendence we have:
( )
!
-a
-o
The toybox version does not yet take this into account. For example
find . -type d -o -type d -a -type f
should print something. This third point doesn't seem
straightforward to implement to me.
(Summarizingly the way how find evaluates composite expressions is
strongly inspired by how the C language does.)
Felix
More information about the Toybox
mailing list