[Toybox] Graff and bc

Tue Mar 13 09:55:20 PDT 2018

> And still not on the list. Oh well.

Actually, that *was* sent to the list; I just also sent it to you
directly. I won't make that mistake again. I will only send things to
the list.

>> That is a really good email, and I guess we have had a bit of
>> miscommunication about this.
>>
>> To me, I wasn't writing the "toybox bc." I was writing "a bc" that
>> could be automatically integrated into toybox whenever I released. I
>> would like the bc to be useful other places as well, such as busybox
>> and standalone. (Sabotage Linux already wants it standalone.) So while
>> I wrote it in toybox style, I also wrote it to be useful in other
>> places.
>
> I agree it has value as an external command. But there's no point integrating it
> into toybox if it's not really integrated into toybox.
>
> I'm reasonably confident that proper toybox integration would reduce it by 2000
> lines. I _suspect_ I can get it down under 4000 total, but haven't done anywhere
> near the full analysis yet, and that would involve some design changes.
>
> This sounds like something you don't want.

I am okay with it, but I seriously doubt you can. Maybe that's because
I am much younger than you, but I did write as tersely as I could.

>> That should explain some of my choices, but I will go into a bit of detail.
>>
>>> I intend to do a series of cleanup passes, as I generally do to every command in
>>> pending. I intend to check them in one at a time, in reasonably bite-sized
>>> chunks, and write up a description of what I did each time. Just like the series
>>> linked to on https://landley.net/toybox/cleanup.html
>>>
>>> Some of it's coding style issues:
>>>
>>> $ grep typedef toys/pending/bc.c | wc
>>>      34     135    1129
>>> $ grep typedef toys/other/*.c | wc
>>>      0      0      0
>>
>> Yes, cleanup passes are necessary; I have no doubt about that. But in
>> this case, based on the style I use, grepping for typedefs is not
>> entirely accurate because I typedef all of my struct definitions. It's
>> not something everyone does, but I do. Do you want me to remove them?
>
> This is brexit all over again. "We want to be part of the EU but we want to be
> our own country with our own currency and we want a veto over EU regulations
> applying to us and we want free movement for our people but the ability to keep
> foreigners out..."

I am not sure what your point is here. I asked if you wanted them out.

>>> And some is because I expect I honestly can make it smaller and simpler.
>>
>> If you can make it smaller and simpler, well, that would improve mine.
>> Please let me know.
>
> One example from the last cleanup pass: you have a duplicate globals block
> because you're not using the generated one. That's not improving it _not_ as
> part of toybox, that's tighter integration with toybox. Which seems to be
> something you don't want.

The duplicate globals block was a mistake, a bug in my release script.
That has been fixed.

>>> Also, toybox has a page sized buffer (char toybuf[4096];) it can use as scratch
>>> space for short-lived allocations. This never uses that, and instead has short
>>> lived allocations.
>>
>> Using a small buffer like that in a bc would not work, for anything,
>> really. Strings live through the entire execution of a bc file, as do
>> number strings (parsed from the code), which are the only small
>> allocations besides vectors and numbers themselves. Vectors always
>> live for the duration of the program, as well, so I can't use the
>> scratch buffer for them either.
>
> Then why are they fixed length?

I am not sure what you are talking about here. Vectors are not fixed
length, unless you are talking about the struct itself, and those are
all stored in the BcProgram struct. Strings are (sort of) fixed
length, but you don't know how many of them you need. I would rather
not have two cases for strings either.

>> Even when doing math and creating temporary numbers, I can't really
>> use a fixed-size scratch buffer, not safely. Numbers calculated by bc
>> can get huge, and it would just bloat the code to handle the two cases
>> (scratch buffer vs malloc) separately.
>
> Understood. I haven't dug far enough into the code yet, there's a lot of it and
> I analyze very closely when reviewing, which is slow.

Now that I am sending mail only to the list, questions can be asked.

>>> And some things it does are just gratuitously "this doesn't look/work like any
>>> other toybox command", ala:
>>>
>>> #define bcg (TT)
>>
>> This was, again, because I meant this to be used other places as well.
>
> Indeed. You want a command that's not part of toybox but has a translation layer
> to plug _in_ to toybox.
>
> Toybox is designed so you can drop an external file in and the build
> infrastructure will just pick it up. There can be toybox commands that aren't
> bundled into the base distro.

This one has also been fixed in my release script.

>>> in lib/lib.h toybox has:
>>>
>>> #define minof(a, b) ({typeof(a) aa = (a); typeof(b) bb = (b); aa<bb ? aa : bb;})
>>> #define maxof(a, b) ({typeof(a) aa = (a); typeof(b) bb = (b); aa>bb ? aa : bb;})
>>>
>>> And your bc has:
>>>
>>> #define BC_MAX(a, b) ((a) > (b) ? (a) : (b))
>>> #define BC_MIN(a, b) ((a) < (b) ? (a) : (b))
>>
>> In this case, I know every case where those happen, so I am not
>> worried. But I can fix it, if you would like.
>
> You have duplicate infrastructure. You have your own redundant implementations
> of things toybox already has. The whole point of toybox is _not_ doing that.

As has this one.

>>> (Which doesn't protect against multiple evaluation of the macro argument like
>>> the first does.)
>>>
>>> You have your own hand-rolled flag macros:
>>>
>>> #define BC_FLAG_WARN (1<<0)
>>> #define BC_FLAG_STANDARD (1<<1)
>>> #define BC_FLAG_QUIET (1<<2)
>>> #define BC_FLAG_MATHLIB (1<<3)
>>> #define BC_FLAG_INTERACTIVE (1<<4)
>>> #define BC_FLAG_CODE (1<<5)
>>>
>>> When toybox is generating FLAG_w and FLAG_s and so on already.
>>
>> This could be changed in my release script and in my code, and I will do so.

These have been fixed in my release script.

>>> You have a fairly standard "success is zero" semantic but use a
>>> BC_STATUS_SUCCESS enum for the zero. This means callers could use if (!thingy())
>>> but it's not obvious that they can because you hide it, and make it seem like
>>> this semantic could change in future?
>>>
>>> You have a lot of things only used once or twice, like bc_func_insertAuto() that
>>> might as well just be inlined.
>>
>> I did it so I would remember the purpose of the code. I don't know
>> what I think of inlining code like that.

I inlined that and the companion bc_func_insertParam().

>>> This is just the stuff obvious at a quick glance. I haven't done proper code
>>> review yet. The whole destructor thing looks like it could be redesigned out,
>>> but that's a can of worms.
>>
>> Okay, I must admit that I disagree entirely with you here. I do not
>> want *any* memory leaks in my bc.
>
> A) Memory leaks on a long run is different from spending cleanup code to do
> something exit() is about to do for you.

And we can see about CFG_TOYBOX_FREE for those.

> B) if you allocate a contiguous structure instead of dangling pointers (as the
> dirtree infrastructure does, the string filename is appended to the structure as
> part of the same allocation), then your destructor is just "free", even for
> different types of objects. (It means there's slightly more work allocating and
> modifying them, but you get a unified free() out of it.)

So like a memory pool? I tried that; it added *huge* amounts to the code.

>> I mean, it would be easy to just
>> leak everywhere, but if I am careful to clean up, I could keep a
>> terminal with bc open all day long. And I could have an easier time
>> calculating a million digits of pi, or something like that, that would
>> take mass swaths of memory.
>
> I agree that's a downside. Stuff like netcat and sed and so on can't leak either
> when dealing with arbitrary amounts of data passing through them in a single
> run. But they _can_ leak what's left over when they're about to exit, because
> the system cleans up.
>
> There's a CFG_TOYBOX_FREE command line option that unnecessarily frees memory to
> make things like valgrind happy, but not all commands do that and those that do
> allow it to be configured out.
>
> (It's a toybox thing.)

And I am okay with trying to deal with that in my release script.

>> Also, this is the reason I do not call
>> exit on malloc fail, as well as why I have an enum of statuses.
>> BC_STATUS_SUCCESS will never change, no, but I have an enum so that
>> users could actually look up exit codes and get something useful out
>> of errors. That is also why I have some kb of error messages.
>
> In english. Untranslated. There's an "error messages and internationalization"
> section under http://landley.net/toybox/design.html#portability about that. I'm
> not sure off the top of my head how to adapt that but it stands out as "not how
> toybox does stuff everywhere else".

Yes, I can improve the error messages according to the link you sent.
I will do that.

However, bc is (mostly or completely, not sure) unique among POSIX
commands in that it is designed and *required* to be interactive. It
is "not how toybox does stuff everywhere else" because "everywhere
else" are commands that are not required to be used interactively and
bc is. (See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/bc.html#tag_20_09_03
and http://pubs.opengroup.org/onlinepubs/9699919799/utilities/bc.html#tag_20_09_15
.) Those error messages, even shortened, are not going away. But they
can be shortened.

>> Now for the flip side, I understand that most people don't think of bc
>> as a long-running program, so it might be fine for toybox. (In fact, I
>> thought you would be fine with it not being perfect in that area,
>> which is why I gave the bc to you so soon.) In that case, I could
>> construct my release script and my code to exclude destructors from
>> toybox releases, just for you. And I would probably know best how to
>> do that. But we can talk about how to possibly do that later.
>
> Dealing with arbitrary amounts of input data in a finite amount of working
> memory is a different problem from "I'm closing files and freeing memory right
> before I call exit which would close them and free it anyway, so I'm running
> code that does nothing".

So again, I am okay with CFG_TOYBOX_FREE.

>>> There's snippets like:
>>>
>>>       case BC_INST_PUSH_OBASE:
>>>       {
>>>         result.type = BC_RESULT_OBASE;
>>>
>>> Where I go "why are those two different values?
>>
>> One is an enum, to make the code more readable and because memory
>> doesn't matter so much for it. One is a char, because memory matters,
>> and it's a bytecode instruction. Also, I do not want to use the
>> bytecode instruction as a replacement for the enum for compile time
>> checks.
>
> There are a number of places where two values do basically the same thing. Last
> night I hit BC_STATUS_LEX_EOF and BC_STATUS_PARSE_EOF, for example.

BC_STATUS_PARSE_EOF is legacy (from development). It could be removed,
and I will do that.

>>> Could the code be simplified if
>>> they were the same value?" But again, haven't traced down that path yet. I need
>>> to read and follow the logic first, and at 9000 lines that's an undertaking.
>>
>> To be honest, I was okay with you not understanding the code.
>
> I'm not. I understand all the parts of toybox outside of pending.

Okay. Well, you already understand parsers and lexers I am sure (from
sed). And I think sed might also have a virtual machine. But you will
need to understand an arbitrary math library, albeit a simple one.

>> I was
>> planning on being a part of toybox from here on out, and I was okay
>> with you passing the buck on bc bugs to me. I think that I am pretty
>> quick (besides Sundays) at answering questions and issues, and I
>> intended to, both from GitHub and the mailing list, or from wherever
>> they come. I thought that if I did so, you would not have to worry
>> about the content of the code so much as the functionality.
>
> I'm happy to have somebody else maintaining a large chunk of infrastructure, but
> so far there is no part of toybox that I don't understand and am not allowed to
> touch.
>
>>> So instead of applying a series of cleanup patches, you want me to send them to
>>> you and then pull them back from you?
>>>
>>> *shrug* I can do that.
>
> I screwed up my first attempt at that, and am no longer doing it through git.
>
>>> But what I'd really like is regression tests...
>>
>> I will get you regression tests. I will look up how you want them too.
>> I already have a lot, and I will adapt them for toybox. But as a side
>> note, I did run all of my current regression tests before submitting
>> to you, and I would do that before updating toybox after bug reports
>> or before releases.
>
> I want to be able to answer the question "Did I just break it?"

Fair enough. I did not consider that.

> It sounds like you don't want me touching it at all, but you want it to be in
> toybox.

So I do not think my vision for the bc in toybox has been clear.

I am okay with you touching things. If I really wasn't, I would not
have a staging area where I am implementing your changes as much as I
can.

But once your changes were done, I was going to do my best to
implement them in a script so that you could tell me, "Gavin, I have a
release coming up in a month. Please submit all of the bug fixes that
you have." And then I would just run the script, commit to
https://github.com/gavinhoward/toybox, and submit a PR, which you
would then pull into the official toybox repo.

I did envision a bit of back and forth on changes. However, at this
point, I feel kind of left out. But more on that later.

>> Also, you are welcome to send long emails with questions asking my why
>> I did things a certain way.
>
> Privately, off list, like this?

Again, that email was actually sent to the list as well.

>> As a final note and summary, I do not consider this bc to be "done."
>> What I mean by that is that I am not done working on it. There will
>> still be bugs, there will need to be maintenance, and I would like to
>> do that. Having an automated release would make that possible, so if
>> any changes need to be made, making them in my repo and then cutting
>> another release to toybox would be easiest for me.
>
> I'm aware of this issue. However, no other toybox command has a translation
> layer. It's the kind of thing I optimize away during cleanup.

I think you are correct that there should be no translation layer. I
think what we disagree on is how would be best to do that.

> Here are the 10 largest files in toybox:
>
> -rw-rw-r-- 1 landley  25759 Aug  3  2016 ./kconfig/expr.c
> -rw-r--r-- 1 landley  32754 Mar 12 14:16 ./lib/lib.c
> -rw-rw-r-- 1 landley  33277 Mar  9 18:53 ./toys/posix/sed.c
> -rw-rw-r-- 1 landley  46151 Aug  3  2016 ./toys/pending/dhcp.c
> -rw-rw-r-- 1 landley  46623 Apr 11  2017 ./toys/pending/fdisk.c
> -rw-rw-r-- 1 landley  61524 Mar  8 07:22 ./toys/posix/ps.c
> -rw-rw-r-- 1 landley  69350 Aug  3  2016 ./toys/pending/dhcpd.c
> -rw-rw-r-- 1 landley  83238 May 23  2017 ./toys/pending/xzcat.c
> -rw-rw-r-- 1 landley  91619 Aug  3  2016 ./toys/pending/ip.c
> -rw-r--r-- 1 landley 183838 Mar 12 20:54 ./toys/pending/bc.c
>
> You'll notice that your file is twice the size of the next largest (_after_ 3
> rounds of cleanup), that 6 of the top 10 are still in pending (I.E. external
> contributions not yet cleaned up) and one is from the other external source
> (kconfig, which needs to be thrown out and replaced).
>
> One of the others is lib/lib.c, the dumping ground for common infrastructure
> shared by all commands.
>
> The largest two promoted toybox commands are sed.c (implementing sed) and ps.c
> implementing 5 different commands (which I need to break up so ps, top/iotop,
> and pgrep/pkill can live in 3 different files the common infrastructure moved to
> lib in a 4th file).

Okay, I do not know how large the sed language is, so I have no idea
how to compare these. I still think bc is bigger, but that is naivete.

> I've had... let's call them "opinions"... about external commands stuck into a
> multicall binary without being properly integrated, for some years now:
>
> http://lists.busybox.net/pipermail/busybox/2005-November/051179.html

This is where I feel left out.

Yes, they should be properly integrated (no translation layer, as I
said above). But I feel like you are simply applying the same rules to
bc as you do to other commands. That's understandable, now that I
think about it. After all, the other commands have much in common.
However, as I said above, bc is unique among POSIX commands, so it has
a few things that other commands don't necessarily have to worry
about. I know a lot about those things now (I've worked on this
full-time for a month and a half), but I feel like you are simply
doing things your own way without trying to understand why I did it
certain ways, as well as what the bc spec requires. I feel like my
hard work and expertise gained this past little while is just being
ignored.

> And on a number of occasions, I have historically been able to shrink external
> contributions to half or less of their original size without sacrificing
> functionality:
>
> http://landley.net/toybox/cleanup.html#ifconfig

I don't think you can on this command, but again, that's probably
because I am young and naive compared to you.

Tell you what: if you get this one to half its original size (as
originally submitted), I will maintain it, even by hand.

> So far in 3 passes I've barely nibbled at the edges, and you're telling me to stop.

That's not the message I meant to get across. Sorry about that. If you
would prefer, I can stop bothering you and just let you do your thing.
And I would still get you your regression tests, which I am working
on.

Gavin Howard