[Toybox] Toybox bc's transcendental and irrational functions produce incorrect results, memory leaks and exhibit brute force algorithmic complexity.

Mon Aug 27 22:35:38 PDT 2018

Wait.. what?

Am I really reading an email signed by two people?

First off, you used subtractive chunking for division. Calling this a
naive algorithm is giving it more credit than it is worth. The only
time subtractive chunking is used is by school children.

I cloned your repository (van Rijn and Howard?)

https://github.com/gavinhoward/bc/

You (both?) seem to have no idea how to force errors out of a badly
scaled irrational function so I supplied an example (though judging
from the first example I gave, you (both?) will just lie/boast and say
this one works on your system too!):

echo "sqrt(.0000000000000000000000000000123)" | ./bc -l
.0000000000000035071539807692832
echo "sqrt(.0000000000000000000000000000123)" | bc -l
.0000000000000035071355833500363

As you can see, the "solution" is not even close.

It is very hard to produce test results from such a badly made
library, as it has actually taken my computer offline twice now (by
overusing memory and other strange bugs), but I will give it another
go:

This command takes 25 seconds using your math (if you want to call it
that), but only .01 seconds using a correct implementation:

echo "sqrt(123123^123)" | time ./bc -l
11372609275300462201446509717277573045371910522125088219567947999131\
66658519961107066259363279984633201203915727943358917486581704249214\
07062970046980106751948432347504473036486754487763304174402241913448\
76069500421950114801423888362557755902898498344596580006258740796194\
555554490321132814795761735241438589907716.72501611371543803908
25.30user

echo "sqrt(123123^123)" | time bc -l
11372609275300462201446509717277573045371910522125088219567947999131\
66658519961107066259363279984633201203915727943358917486581704249214\
07062970046980106751948432347504473036486754487763304174402241913448\
76069500421950114801423888362557755902898498344596580006258740796194\
555554490321132814795761735241438589907716.72501611371543803908
0.01user

It is easy to scale this to take more time than the lifetime of the
universe while other implementations are still instantaneous.

I revved your repository back using git checkout and discovered a
sordid history. Including an entire fast arbitrary precision
implementation written by other authors (appears to be CM Graff and
Bao Hexing) which was stripped, renamed and then badly reimplemented.

Please just fix your fix your "implementation" (if one can really call
it that) and stop wasting everyone's time.

Sasha Toltec
Toltec Enterprises
sashatoltec1 at gmail.com

On 8/27/18, Gavin Howard <gavin.d.howard at gmail.com> wrote:
> Hi Sasha,
>
> Thank you for bringing to my attention these potentially serious
> issues. I will certainly investigate each of them and make the
> appropriate changes. I've included an associate who is familiar
> with the origins and development of this `bc' to help me address
> each point below, as you make some strong assertions.
>
> To your message [1] "Toybox bc run away interactive process",
>
>> When attempting to use ctrl-c to kill the toybox `bc' implementation
>> while in non-interactive mode a bug results in it entering interactive
>> mode while the program continues to run and then the user loses
>> control of both interactive and non-interactive modes.
>>
>> To reproduce:
>> echo "s(131231)" | ./bc -l   .. and hit ctrl-c
>>
>> This consumes many GB of memory before finally exhausting the system.
>
> On a glibc-based x86_64 system I was not able to reproduce this
> issue as stated above, however one of your later messages stated
> that you encountered the issues on an Alpine Linux (musl-based)
> system. On an Alpine 3.8 system I still could not reproduce it,
> both in the Toybox environment and the stand-alone upstream [5].
>
> My associate was able to reproduce this using '131231' and '0.5'
> on Alpine, even using a fixed scale, e.g., "scale=50", using the
> stand-alone version.
>
> As to your second message [2] "Toybox bc's transcendental and
> irrational functions produce incorrect results, memory leaks and
> exhibit brute force algorithmic complexity.",
>
>> Upon inspection of the mathematics core of the toybox `bc' it appears
>> to have multiple systemic issues and is potentially unsalvagable.
>
> Let me begin by saying that I'm grateful for your insightful and
> detailed suggestions. If you have any more suggestions that may
> be helpful please feel free to write those to the mailing list.
> Several subscribers are definitely interested in this material.
>
>> All transcendental functions produce out of bounds memory accesses.
>> All builtin functions produce gratuitous and insecure memory errors.
>
> We've tested `bc' using three static analysis tools, two memory
> scanners (Google Address Sanitizer and Valgrind), in addition to
> manual review, and have not found "gratuitous and insecure
> memory errors". There may be a single leak in the IO module, and
> we are presently looking into this.
>
> Valgrind does indicate excess (but not illegal) memory allocation in
> some of the transcendental functions, with certain inputs. We've
> yet to isolate specific classes of inputs that cause the issue.
>
>> Excessive or incorrect memory consumption for invocations of
>> transcendental functions result in total memory exhaustion consuming
>> about 1 GB for every few seconds of run time.
>
> Would you please explain how we might be able to reproduce this?
> Is there a specific scenario or condition where this occurs, and
> on which particular platform(s) does this occur?
>
>> The algorithms used are of the most naive possible variety and balk
>> under even the smallest inputs taking hundreds of times longer than
>> similar implementations.
>
> Typically, "naive" algorithms perform better for smaller inputs
> and more "advanced" algorithms are desired for their lower bound
> which is realized as the input grows large. To hear that "even
> the smallest inputs [are] taking hundreds of times longer" than
> similar implementations is ... odd.
>
> Would you please share your benchmarks and testing methods? It
> will be tremendously helpful. Reference benchmarks to "similar"
> implementations (and what those are) would be even better.
>
> For larger problems, you are certainly correct. This `bc' hasn't
> been optimized for performance; it's been optimized for size. It
> implements a complete language, virtual machine and math library
> in under 6000 LOC.
>
>> All builtin transcendental and irrational functions produce incorrect
>> results.
>
> Would you please provide an example (ideally for each) showing
> an instance where incorrect results are produced?
>
>> Functions that should be complete in a fraction of a second are
>> getting bogged down in naive algorithmic complexity.
>
> See my above remark re: algorithmic complexity, and please offer
> an example or two. There had been some discussion regarding just
> how "precise" the computation needs to be: an exact solution for
> all digits except the last? We will refer to the specification
> for clarification on this matter.
>
>> I had read that a team of mathematicians and systems programmers were
>> working on the new toybox bc implementation. What exactly happened
>> here?
>
> I was not aware of such efforts/arrangements. The Toybox mailing
> list on 2018-03-09 by a Christopher M. Graff states that he had
> asked a few people, including Gavin D. Howard ("gdh") who wrote
> the `bc' in question [5], to assist him with his implementation.
> Who are you referring to? That message doesn't reference either,
> but perhaps we could collaborate with the mathematicians to help
> in achieving lower bounds?
>
> In your third message [4],
>
>> We ran across these errors while doing "make check" on the hebimath
>> bignum library on an alpine linux system running against toybox's
>> `bc'. Hebimath uses POSIX bc as part of its test suite, Perhaps it
>> could be used to drive the core arithmetic of your POSIX bc
>> implementation as it is MIT licensed and is fast and accurate.
>
>> https://github.com/suiginsoft/hebimath
>
> This is a helpful suggestion (to use the 'hebimath' test suite);
> thank you Sasha. In December 2017 my associate had discussed the
> 'hebimath' library in a Freenode channel with the aforementioned
> CM Graff and determined it a reasonable implementation, but that
> it was rather large and would not "fit" into the Toybox project
> (and size is the limiting factor).
>
> Anyway, we'll review and make the necessary improvements.
>
> Regards,
>
>     Gavin D. Howard             Zach van Rijn
>     "gdh"                       "Thalheim"
>
>
> [1]: http://lists.landley.net/pipermail/toybox-landley.net
> /2018-August/009626.html
>
> [2]: http://lists.landley.net/pipermail/toybox-landley.net
> /2018-August/009628.html
>
> [3]: http://lists.landley.net/pipermail/toybox-landley.net
> /2018-March/009403.html
>
> [4]: http://lists.landley.net/pipermail/toybox-landley.net
> /2018-August/009629.html
>
> [5]: https://github.com/gavinhoward/bc
>
> On Mon, Aug 27, 2018 at 7:29 AM Sasha Toltec <sashatoltec1 at gmail.com>
> wrote:
>
>> We ran across these errors while doing "make check" on the hebimath
>> bignum library on an alpine linux system running against toybox's
>> `bc'. Hebimath uses POSIX bc as part of its test suite, Perhaps it
>> could be used to drive the core arithmetic of your POSIX bc
>> implementation as it is MIT licensed and is fast and accurate.
>>
>> https://github.com/suiginsoft/hebimath
>>
>> Sasha Toltec
>> Toltec Enterprises
>> sashatoltec1 at gmail.com
>>
>> On 8/27/18, Sasha Toltec <sashatoltec1 at gmail.com> wrote:
>> > Upon inspection of the mathematics core of the toybox bc it appears to
>> > have multiple systemic issues and is potentially unsalvagable.
>> >
>> > All transcendental functions produce out of bounds memory accesses.
>> >
>> > All builtin functions produce gratuitous and insecure memory errors.
>> >
>> > Excessive or incorrect memory consumption for invocations of
>> > transcendental functions result in total memory exhaustion consuming
>> > about 1 GB for every few seconds of run time.
>> >
>> > The algorithms used are of the most naive possible variety and balk
>> > under even the smallest inputs taking hundreds of times longer than
>> > similar implementations.
>> >
>> > All builtin transcendental and irrational functions produce incorrect
>> > results.
>> >
>> > Functions that should be complete in a fraction of a second are
>> > getting bogged down in naive algorithmic complexity.
>> >
>> > I had read that a team of mathematicians and systems programmers were
>> > working on the
>> > new toybox bc implementation. What exactly happened here?
>> >
>> >
>> > Sasha Toltec
>> > Toltec Enterprises
>> > sashatoltec1 at gmail.com
>> >
>> _______________________________________________
>> Toybox mailing list
>> Toybox at lists.landley.net
>> http://lists.landley.net/listinfo.cgi/toybox-landley.net
>>
>