[Toybox] Fwd: Graff and bc

Tue Mar 13 09:06:02 PDT 2018

Forwarded mail that was accidentally sent privately to Rob.
Gavin Howard

---------- Forwarded message ----------
From: Gavin Howard <gavin.d.howard at gmail.com>
Date: Mon, Mar 12, 2018 at 5:02 PM
Subject: Re: [Toybox] Graff and bc
To: Rob Landley <rob at landley.net>

That is a really good email, and I guess we have had a bit of
miscommunication about this.

To me, I wasn't writing the "toybox bc." I was writing "a bc" that
could be automatically integrated into toybox whenever I released. I
would like the bc to be useful other places as well, such as busybox
and standalone. (Sabotage Linux already wants it standalone.) So while
I wrote it in toybox style, I also wrote it to be useful in other
places.

That should explain some of my choices, but I will go into a bit of detail.

> I intend to do a series of cleanup passes, as I generally do to every command in
> pending. I intend to check them in one at a time, in reasonably bite-sized
> chunks, and write up a description of what I did each time. Just like the series
> linked to on https://landley.net/toybox/cleanup.html
>
> Some of it's coding style issues:
>
> $ grep typedef toys/pending/bc.c | wc
>      34     135    1129
> $ grep typedef toys/other/*.c | wc
>      0      0      0

Yes, cleanup passes are necessary; I have no doubt about that. But in
this case, based on the style I use, grepping for typedefs is not
entirely accurate because I typedef all of my struct definitions. It's
not something everyone does, but I do. Do you want me to remove them?

> And some is because I expect I honestly can make it smaller and simpler.

If you can make it smaller and simpler, well, that would improve mine.
Please let me know.

> I'm not complaining! It's a good bc submission. It'll take me a while just ot
> review it. However, when I did "make bc" I got a lot of:
>
> toys/pending/bc.c: In function 'bc_vm_execStdin':
> toys/pending/bc.c:8945:7: error: 'for' loop initial declarations are only
> allowed in C99 mode
>        for (uint32_t i = 0; i < len; ++i) {

Oops. I tried to catch all of those before I submitted. Why my
compilers don't complain, I don't know.

> So I already locally have changes in my tree just to get it to compile, because
> the compiler on ubuntu 14.04 won't let it declare variables in a for loop by
> default with the set of flags I'm feeding it, and although c99 allows variable
> declarations anywhere the code style has 'em at the top of blocks so rather than
> fix the build break with a command line flag I fixed it by changing the code to
> adhere to the toybox coding style.

Please tell me anywhere I missed this; I did try to find them all.

> Similarly, toybox uses xmalloc() and xrealloc() and xstrdup() and such because
> if malloc() and friends every return NULL it means you've exhausted _virtual_
> address space. Physical memory exhaustion manifests as either the OOM killer
> getting called asynchronously and killing your process, or the system swap
> thrashing itself to death and freezing hard. (There are things like strict
> overcommit and nommu that can lead to allocation failures long before the system
> is anywhere near actually out of memory, but neither is the common case and
> nommu fragmentation case isn't particularly recoverable.) So toybox's policy is
> to use an xmalloc() wrapper that exits with an error message if malloc fails,
> because it's an unrecoverable situation. Your code instead does a lot of:
>
> #define BC_PROGRAM_BUF_SIZE (1024)
>
>   buffer = malloc(BC_PROGRAM_BUF_SIZE + 1);
>
>   if (!buffer) return BC_STATUS_MALLOC_FAIL;
>
> Which isn't how toybox does stuff anywhere else.

I can see why toybox does this. The bc still fails on malloc, though.
It just does not call exit, for reason explained below on destructors.

>
> It's possible some allocations should be considered recoverable, perhaps
> allocating a vector of an unreasonably large size should fail instead of dying.
> But _malloc_ isn't going to reliably tell you there's a problem. Unless you're
> allocating multiple gigabytes on 32 bit (and basically _never_ on 64 bit),
> malloc isn't what will fail. Running bc on a system with 512 megs of ram after
> allocating 1.5 gigs of vector will demonstrate a problem when you dirty the
> pages and it runs out of freeable pages to fault in, long after malloc's
> returned. In order to tell that an allocation will cause a problem later, you
> basically have to query free memory yourself. (Or enable strict overcommit,
> which will basically false positive the system to death. Android's approach
> seems to be a cross between the two, killing apps from their own userspace oom
> manager to prevent the system from coming _near_ memory exhaustion so things
> like video recording and audio playback never drop frames. They may suddenly
> _stop_, but they won't have subtle dropouts. But that's from an ELC presentation
> a few years ago and may have changed in newer versions...)
>
> Also, toybox has a page sized buffer (char toybuf[4096];) it can use as scratch
> space for short-lived allocations. This never uses that, and instead has short
> lived allocations.

Using a small buffer like that in a bc would not work, for anything,
really. Strings live through the entire execution of a bc file, as do
number strings (parsed from the code), which are the only small
allocations besides vectors and numbers themselves. Vectors always
live for the duration of the program, as well, so I can't use the
scratch buffer for them either.

Even when doing math and creating temporary numbers, I can't really
use a fixed-size scratch buffer, not safely. Numbers calculated by bc
can get huge, and it would just bloat the code to handle the two cases
(scratch buffer vs malloc) separately.

> And some things it does are just gratuitously "this doesn't look/work like any
> other toybox command", ala:
>
> #define bcg (TT)

This was, again, because I meant this to be used other places as well.

> in lib/lib.h toybox has:
>
> #define minof(a, b) ({typeof(a) aa = (a); typeof(b) bb = (b); aa<bb ? aa : bb;})
> #define maxof(a, b) ({typeof(a) aa = (a); typeof(b) bb = (b); aa>bb ? aa : bb;})
>
> And your bc has:
>
> #define BC_MAX(a, b) ((a) > (b) ? (a) : (b))
> #define BC_MIN(a, b) ((a) < (b) ? (a) : (b))

In this case, I know every case where those happen, so I am not
worried. But I can fix it, if you would like.

> (Which doesn't protect against multiple evaluation of the macro argument like
> the first does.)
>
> You have your own hand-rolled flag macros:
>
> #define BC_FLAG_WARN (1<<0)
> #define BC_FLAG_STANDARD (1<<1)
> #define BC_FLAG_QUIET (1<<2)
> #define BC_FLAG_MATHLIB (1<<3)
> #define BC_FLAG_INTERACTIVE (1<<4)
> #define BC_FLAG_CODE (1<<5)
>
> When toybox is generating FLAG_w and FLAG_s and so on already.

This could be changed in my release script and in my code, and I will do so.

> You have a fairly standard "success is zero" semantic but use a
> BC_STATUS_SUCCESS enum for the zero. This means callers could use if (!thingy())
> but it's not obvious that they can because you hide it, and make it seem like
> this semantic could change in future?
>
> You have a lot of things only used once or twice, like bc_func_insertAuto() that
> might as well just be inlined.

I did it so I would remember the purpose of the code. I don't know
what I think of inlining code like that.

> This is just the stuff obvious at a quick glance. I haven't done proper code
> review yet. The whole destructor thing looks like it could be redesigned out,
> but that's a can of worms.

Okay, I must admit that I disagree entirely with you here. I do not
want *any* memory leaks in my bc. I mean, it would be easy to just
leak everywhere, but if I am careful to clean up, I could keep a
terminal with bc open all day long. And I could have an easier time
calculating a million digits of pi, or something like that, that would
take mass swaths of memory. Also, this is the reason I do not call
exit on malloc fail, as well as why I have an enum of statuses.
BC_STATUS_SUCCESS will never change, no, but I have an enum so that
users could actually look up exit codes and get something useful out
of errors. That is also why I have some kb of error messages.

Now for the flip side, I understand that most people don't think of bc
as a long-running program, so it might be fine for toybox. (In fact, I
thought you would be fine with it not being perfect in that area,
which is why I gave the bc to you so soon.) In that case, I could
construct my release script and my code to exclude destructors from
toybox releases, just for you. And I would probably know best how to
do that. But we can talk about how to possibly do that later.

>There's snippets like:
>
>       case BC_INST_PUSH_OBASE:
>       {
>         result.type = BC_RESULT_OBASE;
>
> Where I go "why are those two different values?

One is an enum, to make the code more readable and because memory
doesn't matter so much for it. One is a char, because memory matters,
and it's a bytecode instruction. Also, I do not want to use the
bytecode instruction as a replacement for the enum for compile time
checks.

> Could the code be simplified if
> they were the same value?" But again, haven't traced down that path yet. I need
> to read and follow the logic first, and at 9000 lines that's an undertaking.

To be honest, I was okay with you not understanding the code. I was
planning on being a part of toybox from here on out, and I was okay
with you passing the buck on bc bugs to me. I think that I am pretty
quick (besides Sundays) at answering questions and issues, and I
intended to, both from GitHub and the mailing list, or from wherever
they come. I thought that if I did so, you would not have to worry
about the content of the code so much as the functionality.

> So instead of applying a series of cleanup patches, you want me to send them to
> you and then pull them back from you?
>
> *shrug* I can do that.
>
> But what I'd really like is regression tests...

I will get you regression tests. I will look up how you want them too.
I already have a lot, and I will adapt them for toybox. But as a side
note, I did run all of my current regression tests before submitting
to you, and I would do that before updating toybox after bug reports
or before releases.

Also, you are welcome to send long emails with questions asking my why
I did things a certain way.

As a final note and summary, I do not consider this bc to be "done."
What I mean by that is that I am not done working on it. There will
still be bugs, there will need to be maintenance, and I would like to
do that. Having an automated release would make that possible, so if
any changes need to be made, making them in my repo and then cutting
another release to toybox would be easiest for me.

I promise that I will respond fast.

Gavin Howard