[Toybox] [PATCH] Fix the operator precedence in expr.

Wed Mar 16 00:49:09 PDT 2016

On 03/15/2016 04:46 PM, Andy Chu wrote:
>> If it's at the end of expr_main() we don't care. However, if leaks
>> per-input and we have unlimited input... (I believe the current linux
>> environment space limit is INT_MAX arguments each INT_MAX long, modulo
>> ulimit applying something to that maybe?)
>>
>> This sounds per-argument, so I'd want to free it when it's done
>> with it if we can. (I need to read the code after applying your
>> patches.)
> 
> OK great, some notes on memory management for when you review:
> 
> expr is a pure function of the argv, and the implementation almost
> always uses the constant argv strings, which we don't need to manage.
> The two exceptions are:
> 
> - re() -- when you do expr foo : 'f\(.*\)', the resulting captured
> string "oo" has to go somewhere
> - type coercion of integers: if you do expr 2 + 3 \< abcd  -- 5 on the
> LHS is converted to "5" by get_str(), called from eval_op().
> 
> So if you don't do any of those things, you won't get any leaks.  In
> the worst case, you could have one leak for every 3 or 4 argv entries
> (you can't make it leak for every entry AFAICT).  And the captured
> string is smaller than the length of a single arg, which is limited to
> 131072 bytes on Linux I think.
> 
> http://www.in-ulm.de/~mascheck/various/argmax/

Yeah I thought so too until recently, but it turns out that's old
information. The 2.6.22 kernel (July 2007) switched it to INT_MAX many
arguments, each of which can 131072 bytes long:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b6a2fea39318

That said, there are such things as pilot error. However you do it,
resource exhaustion's a possibility we can't entirely avoid. Just try
not to be wasteful.

> In practice I don't think it will matter because I don't think people
> are really writing huge expressions with expr, because it will get
> unreadable.  It has no intermediate variables -- it would have to be a
> single huge expression that does tons of regex captures or mixes types
> a lot.

Eh, if the variables are still live, they can get cleaned up afterwards.
If the algorithm naturally treats (X+2)-(Y/3) by calculating X+2 first,
discarding those nodes when we're done with them makes sense. But if it
doesn't, and we can't, oh well.

Note: the environment space is laid out linearly by Linux (and the
startup code in libc's crt1.o depends on this, I believe there's a
standard but am not sure off the top of my head which one, possibly
ELF?), and lib/args.c may cherry-pick arguments but won't reorder them
(guaranteed!), so you can detect that an allocation isn't in optargs by
checking:

  if (p<*toys.optargs || p>toys.optargs[toys.optc-1]) not_argv();

It would be nice to free this data ourselves because expr is close to
$((1+2)) in the shell, and it would be nice if the shell could just
internall call expr nofork in that case.

(If we can't do the cleanup, we could also xrun() it and pass the data
back via a pipe, but that's not ideal.)

> I was thinking to clean up in expr_main -- cleaning up more
> aggressively would be a little tricky since get_str() only allocates
> conditionally.  But I guess you're saying that isn't worth it since
> the process will be torn down.

Not in the nofork case, and if the shell can recycle expr to implement
$(( )) then extra work to make it callable nofork is good.

> And yes it definitely helps to have code in pending, so I'm all for
> putting unfinished / untested stuff there!  (Though what I'm surprised
> by is that I started up ConnectBot on my Nexus 5X phone and was able
> to tickle bugs which already have failing test cases!  Like the
> precedence bugs in expr.)

I often fix the test case before I fix the code, to remind me to fix the
code. :)

(I don't always check those in though, because "git diff" showing me a
comment or hunk of code is an easy way of keeping track of todo items.)

Alas it's not so obvious to people other than me which failing test
cases I added, and which were contributed that way.

> thanks,
> Andy

Rob

 1458114549.0