[Toybox] [PATCH] A "dc" implementation

Rob Landley rob at landley.net
Fri Sep 22 09:18:30 PDT 2023


On 9/20/23 22:37, Oliver Webb wrote:
> ------- Original Message -------
> On Wednesday, September 20th, 2023 at 3:07 PM, Rob Landley <rob at landley.net> wrote:
> 
> 
>> On 9/20/23 01:41, Oliver Webb via Toybox wrote:
>>
>> > I have made an implementation of the 'dc' command in toybox
>> >
>> > Since dc isn't specified in POSIX or LSB,
>>
>>
>> Which is why it's not in toybox?
>>
>> There are things listed in yocto's root filesystem in the roadmap which I don't
>> intend to add to toybox.
> 
> I was unaware of this. The other sources the roadmap draws on have been curated
> to remove unwanted or out-of-scope commands, so I thought the yocto project
> would have been evaluated in a similar way before being put in the roadmap.

My bad, adding yocto to status.html was a bit misleading, but I already had
klibc, sash, sbase, beastiebox, and tizen in there, all of which are data points
but not necessarily endorsements.

It's not that I've made a strong decision NOT to add them, it's that they showed
up in an environment of dubious design judgement, and I haven't seen a consumer
of them yet. "Under consideration" is probably the best phrasing.

(I've also personally started to distinguish between "1.0" commands and
"post-1.0" commands like screen and rsync, but don't have a way to annotate that
in the roadmap yet...)

> Also, this command is mentioned in the roadmap in a context other than the yocto
> project. In the evaluation of the LFS packages for which ones we would provide
> "complete-ish replacements for", both bc and dc are mentioned.

Which is from LFS 6 and version 12 just came out. I'm digging towards an updated
LFS/BLFS to see what's actually still used. I'd also love to build AOSP with the
command line logging wrapper, but that at least partially sets up its own $PATH
internally and is a can of worms to fiddle with.

For any of these build environments that are adding and installing tools,
there's a design question of "you just built your own sed to replace the
existing sed: do I stop logging calls to sed now? Do I let the build stop using
the toybox one and use the replacement it installed, or keep testing the toybox
one?" It's mostly a "start or end of $PATH" question...
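
To make the "start or end of $PATH" point concrete, here is a rough sketch. It is not the actual logging wrapper, just an illustration: the temp directory, the log file name, and the fake "sed" script are all made up for the example, and it assumes mktemp -d is available.

```shell
# Hypothetical setup: a temp dir holding a logging wrapper named "sed".
wrapdir=$(mktemp -d)
log="$wrapdir/log.txt"
real_sed=$(command -v sed)

# The wrapper records the call, then hands off to the sed found above.
cat > "$wrapdir/sed" <<EOF
#!/bin/sh
echo "sed \$*" >> "$log"
exec "$real_sed" "\$@"
EOF
chmod +x "$wrapdir/sed"

# Wrapper dir at the START of PATH: the wrapper wins the lookup.
first=$(PATH="$wrapdir:$PATH"; command -v sed)
# Wrapper dir at the END of PATH: whatever is installed earlier shadows it.
last=$(PATH="$PATH:$wrapdir"; command -v sed)

# Run one call through the wrapper and look at what got logged.
out=$(echo hello | (PATH="$wrapdir:$PATH"; sed 's/h/j/'))
logged=$(cat "$log")
echo "start: $first"
echo "end:   $last"
echo "out:   $out"
echo "log:   $logged"

rm -rf "$wrapdir"
```

With the wrapper directory first, a sed the build installs later never gets called (the wrapper keeps testing the original one). With it last, the build's own sed silently takes over and logging stops.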

> both bc and dc are mentioned.

Back in the 1970s, having a calculator was a big deal. These days, fairly
competent 64-bit $((math)) is built into the shell. (I remember an exchange with
Denys (busybox maintainer) about that in the https://lkml.org/lkml/2009/12/8/94
days fixing up busybox's shell which was the only one _not_ doing 64 bit math on
32 bit targets, and yes I'd tested Red Hat 9 from 2003.) Linux developers who
need arbitrary precision or floating point math have generally used perl or
python or similar.
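
A quick way to check what a given shell's $((math)) can hold, as a sketch that assumes a shell with 64-bit arithmetic (current bash and dash on Linux, for example); the variable names are only for illustration:

```shell
# 2^31 - 1 plus one overflows a 32-bit signed int, but not 64-bit shell math.
past32=$((2147483647 + 1))
# 2^62 still fits in a signed 64-bit integer.
big=$((1 << 62))
echo "$past32 $big"
```

A shell stuck on 32-bit math would wrap or complain on the first line instead of printing 2147483648.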

Posix's "expr" command still gets used a bit for historical reasons even though
the need to split arguments is silly to modern audiences (expr 2+3 prints "2+3").
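
For anyone who hasn't run into the argument-splitting issue, a small illustration of standard expr behavior next to shell arithmetic (the variable names are just for the example):

```shell
# One argument: expr sees the single string "2+3" and prints it unchanged.
joined=$(expr 2+3)
# Three arguments: operand, operator, operand.
split=$(expr 2 + 3)
# Multiplication also has to be escaped so the shell doesn't glob it.
product=$(expr 2 \* 3)
# Shell arithmetic doesn't care about whitespace or quoting here.
builtin=$((2+3))
echo "$joined $split $product $builtin"
```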

The dc command is reverse polish notation, which I wouldn't think has really
been taught to new people since the Apollo program in the early 1970s? Yeah some
old guys still worship 1970s HP calculators for some reason, but cheap
calculators that could handle infix came along 50 years ago. The TI Cal Tech
advertised it could do algebraic entry in 1967. By the time I got to school they
were solar powered and disposable. In middle school I had a calculator
wristwatch with tiny buttons, and in high school I had a calculator the size of
a credit card, made to fit in your wallet along with credit cards; none of that
did Yoda Math. My Devuan Banana install hasn't got dc in the $PATH and nothing
I've tried to build has ever complained about its absence, which is why I went
"um, do we really need this?".

The "bc" command was added to the linux kernel build by Peter Anvin during my
"remove the perl dependencies peter anvin added to the build" patch series in 2013:

https://lkml.iu.edu/hypermail/linux/kernel/1302.3/01520.html

Which I'd been regularly submitting for about 5 years at that point. Peter of
course did NOT participate in the discussion, he instead pushed a patch directly
to Linus outside linux-kernel, adding a bc dependency.

https://landley.net/notes-2013.html#28-03-2013

Nothing in either Linux From Scratch or Gentoo's base install had used it
before, which I know because both had to add it to build the updated kernel (and
both were "build the root filesystem from source at install time" distros, so
nothing ELSE used it either).

I got a bc submission to toybox in a cloud of drama, where the guy submitting
code to me insisted that someone else had stolen his code but he'd gotten away
from the old contributors and had a fresh project now... Which immediately
turned into fresh drama and squabbling they kept cc-ing me on for MONTHS:

http://lists.landley.net/pipermail/toybox-landley.net/2018-March/017454.html

http://lists.landley.net/pipermail/toybox-landley.net/2018-August/017677.html

Busybox did NOT have a bc implementation at the time; the same implementation
was later submitted to busybox. If you check out the busybox source and go:

  find . -name '*.c' | xargs wc | sort -n

It's the third largest file in the whole of busybox, and almost twice the size
of "awk.c" which is a programming language. If you do the same thing to toybox,
it is THE largest C file in toybox (over a thousand lines larger than toysh).

I did some cleanup passes on the bc contributed to toybox, removing a few
hundred lines at a time (some were merged "upstream" by the person drama
inexplicably follows for no reason, and marshalled into the busybox one and the
standalone one), but... honestly, it would be faster to write a new one from
scratch. It has no business being bigger than awk. But if it wasn't a kernel
dependency nobody would use it today, so the real question is should it just be
removed? And proving we _don't_ need it is "build LFS and BLFS" territory...

So we have $((math)), we have expr in pending, we have an overdramatic bc in
pending, and that was the context in which I received your dc submission on top
of those.

> Which leads me into another question, what is your opinion about m4, make,
> yacc, and lex being in toybox?

We need all 4 of them to build a current base Linux OS. I used to punt on lex
and yacc (you could build the gnu versions under an existing chroot without
anything in the minimal native development environment depending on them, so
they were leaf nodes not circular core dependencies), but the linux-kernel loons
rewrote kconfig to require them in 2018. (Probably Peter Anvin again, he hates
dependency minimization as a concept, still dunno why.)

The "make" should be gmake and thus big (gotta run kbuild in linux). I'm not
sure how much of the yacc->bison and lex->flex extensions get used because I
never program with either and haven't dug that far into them yet. I _think_ m4
is mostly posix? (It's basically macro expansion, an alternative to the C
preprocessor that claims to be better in a way I haven't interrogated yet.) M4
would probably be low hanging fruit if I'd ever used it before.

> In every evaluation of the POSIX commands in the talks I have watched and
> documentation I have read, those commands are omitted on reason of being
> exclusively used to build packages, and therefore should be part of the
> compiler tool-chain.

I felt that way myself once, but that's because I was maintaining a tinycc fork
which I was teaching to act as a multiplexer (so it could be cc, ld, as, strip,
etc) and adding "make" and friends to it wouldn't have been a very heavy lift.
Alas, that project's pretty thoroughly parked these days.

Backstory: many moons ago I maintained a fork of tinycc:

https://landley.net/code/tinycc
https://landley.net/hg/tinycc

Which I was trying to extend to full C99 (including the vararray nonsense and
alloca() and so on), because I wanted it to build a vanilla unmodified linux
kernel and busybox and uclibc and itself to create my minimal native development
environment out of just 4 packages. And then build a bootable LFS under the result.

https://landley.net/code/qcc
https://elinux.org/CELF_Project_Proposal/Combine_tcg_with_tcc
https://landley.net/code/tinycc/qcc/todo.txt

I got started because tinycc had built "tccboot" back in 2004, but that was a
modified kernel (not vanilla, ripped out a list of constructs tinycc didn't
understand yet) and I also vaguely recall that only compiled the _kernel_ on
boot and used an initramfs image with an existing busybox binary?

https://bellard.org/tcc/tccboot.html
https://bellard.org/tcc/tccboot_readme.html

Fabrice Bellard created tinycc as a joke in 2002 (an entry in the obfuscated C
code contest, which had something like a 4k size limit on the source code you
could submit: he submitted a C compiler that recompiled itself within the size
limit, and won "best abuse of the rules"). He then de-obfuscated it and extended
it to _almost_ a full C99 compiler over the next couple years, and even started
giving it multiple "backends" so it could generate code for multiple processors.

And then he started thinking about multiple _frontends_, specifically what if
instead of parsing C it parsed x86 machine code, and translated it to slightly
DIFFERENT machine code, and this turned into QEMU, which led to the concept of a
"project tumor" that buds off and sucks all your developers away, which I think
I ranted about here:

http://landley.net/aboriginal/history.html

Or maybe here:

http://lists.busybox.net/pipermail/buildroot/2016-December/180102.html

Anyway, QEMU ate all Fabrice's time and tinycc stagnated for a while. I
proposed picking it up, except the source control was in CVS and Fabrice's one
iron rule (at the time) was that the source control STAY in CVS, so I did a fork
instead. Fabrice eventually handed tinycc off to a Windows developer named
Grishka who only cared about using it as a free windows compiler and really
didn't care about Linux. The REAL annoyance is every few months we'd repeat a
cycle where I'd do a bunch of work in my fork, and Grishka would wake up weeks
later and copy about half of it (leaving the half he didn't understand behind),
and put out a new tinycc release, and Linux Weekly News and friends would cover
the "real" tinycc and ignore mine, and I'd stop work so Grishka's tinycc would
go stagnant again, and when I thought it was dead enough I'd start up work on my
fork again... rinse repeat.

(The sad part was seeing the tinycc developers YEARS LATER trying to solve a
problem I'd already solved, in the half they'd left behind, because they only
now noticed the issue existed. That happened multiple times...)

Anyway, I've since come to the conclusion that writing a C compiler from scratch
(with basically no optimizer, tinycc wasn't even preserving registers between
operations, it was "load from stack, perform operation, write to stack" even
when two consecutive operations used the same slot, that's why the resulting
executable was 1/3 the speed of other compilers) isn't all that hard, especially
if I ever found time to read:

https://norasandler.com/2017/11/29/Write-a-Compiler.html

Which is a book now:

https://nostarch.com/writing-c-compiler

A lot of stuff like strace and readelf were originally in the "belongs in qcc"
bucket. These days, "cc/as/ld" are probably in the toybox post-1.0 command list
bucket:

http://lists.landley.net/pipermail/toybox-landley.net/2020-July/011898.html

> I mostly agree with this. What confuses me is that these commands are some of
> the first items in a list of "Packages toybox plans to provide complete-ish replacements for" on the roadmap.

Commit f4c9a32a1116 from 2020. "Plans" is a strong word there, "could" is
probably better. The bc one was because we had bc, which smelled a lot fresher
in 2018 than it does in 2023, and it seemed easier to clean up a 7000 line pile
of spaghetti than trying to get a small existing working patch into linux-kernel
through a community that was doing:

https://github.com/torvalds/linux/commit/ddbd2b7ad99a

https://github.com/torvalds/linux/commit/8a104f8b5867

https://www.cnbc.com/2018/09/17/linux-creator-linus-torvalds-takes-time-off-apologizes-for-behavior.html

https://en.wikipedia.org/wiki/Identified_patient

> But when looking at the status page, none of the commands have been mentioned as things we actually want,
> I am 99.999% sure these are commands we don't want to be in the project, but I thought it would be better to
> ask anyways because those commands were put in that list for a reason.

Keep in mind what I want and what "the community" wants has an air gap. I never
wanted selinux support, but here we are. If you come to me with a compelling
case to add dc, I can add dc. If it turns out
https://landley.net/bin/mkroot/0.8.9/linux-patches/0004-Replace-timeconst.bc-with-mktimeconst.c.patch
is not a sufficient strategy for handling bc (I.E. it actually has a nontrivial
second user), I can clean it up and promote it. (Or write a new one if that's
easier.)


>> Do you have an actual user for dc?
> 
> When I picked this command to write, I was going off the assumption that,
> because it was on the roadmap, somebody cared enough about it to put it there.
> And therefore would be the "user" who would benefit by it being implemented.

Alas, no. The logic was "bc got inflicted on linux-kernel in 2013, bc got
submitted to toybox and is in pending albeit promoting it would be nontrivial,
the patch to remove bc from the kernel again is small and simple but political
and removing the FIRST round of this nonsense took 5 years of persistence... if
we DO replace bc, the package that provides it to LFS and friends also
includes/provides dc, which is posix..."

95% likely yocto has dc because it installed the gnu bc package which also
includes dc.

Sigh. Busybox has a dc, doesn't it? Yup. Sigh. If it was in posix I'd probably
just add it even if it's useless...

> I could go on about how it has square roots when expr and shell $((math)) doesn't, 
> or how the RPN stack layout of it _could_ be used to parse the numerical output
> of commands easier when combined with external scripts.

$ echo $(($(seq 1 30 | xargs | tr ' ' +)))
465
$ echo $(($(printf '(%s+%s-%s)/%s' $(seq 4 -1 1))))
5

> But those are justifications,
> not actual use cases. I'd be lying if I said it wasn't mainly a "It's in the roadmap and busybox" thing.

I still don't understand why bash $((math)) doesn't do floating point. You'd
think it would. Did I already ask Chet about this? (He'd almost certainly say
"for historical reasons" and be right.) I also don't understand why expr doesn't
do floating point, that would give it a reason to exist outside of integer
$((math)), but again... historical.
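
For comparison, one common workaround when a script needs a fraction or a square root is to shell out to awk. This is only a sketch of that workaround, not something proposed in this thread, and the variable names are made up:

```shell
# Shell arithmetic truncates: 7/2 comes out as 3.
int_result=$((7 / 2))
# awk does its arithmetic in floating point and has sqrt() built in.
# LC_ALL=C keeps the decimal separator a "." regardless of locale.
float_result=$(LC_ALL=C awk 'BEGIN { printf "%.2f", 7 / 2 }')
root=$(LC_ALL=C awk 'BEGIN { printf "%.4f", sqrt(2) }')
echo "$int_result $float_result $root"
```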

"This handles floating point" is a strong positive for dc. "This uses reverse
polish notation" seems like it would eliminate it for anyone except electrical
engineers of a certain age, or people who have implemented their own infix
calculation engine, but maybe I'm wrong? (Do the younger generations have more
enthusiasm for reverse polish notation than I've been led to believe?)

Heck, a year and change ago I got an rpn calculator from a hardware engineer,
which I shoved into a copy of hello.c so it's vaguely toybox command shaped.
(Attached.) But I didn't go farther because... reverse polish notation. In 2023.
He didn't try to make dc, he'd just whipped up a quick command line tool for his
own use.
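
For readers who haven't met reverse polish notation, here is a minimal sketch of the stack evaluation it implies. This is NOT the attached rpn.c and not dc: rpn() is a hypothetical helper written for this illustration, integer only, with no error checking for stack underflow or division by zero.

```shell
# rpn: evaluate integer reverse polish notation, one token per argument.
rpn() {
  stack=""    # each entry is stored as " value", top of stack is the last one
  for tok in "$@"; do
    case $tok in
      +|-|\*|/)
        # Pop two operands: b is the top, a is the one below it.
        b=${stack##* }; stack=${stack% *}
        a=${stack##* }; stack=${stack% *}
        # Apply the operator and push the result.
        stack="$stack $(($a $tok $b))"
        ;;
      *)
        # Anything else is treated as a number and pushed.
        stack="$stack $tok"
        ;;
    esac
  done
  # The answer is whatever is left on top of the stack.
  echo "${stack##* }"
}

# Infix (4+3-2)/1 becomes "4 3 + 2 - 1 /" in RPN.
result=$(rpn 4 3 + 2 - 1 /)
echo "$result"
```

The multiply operator has to be quoted on the command line (rpn 2 3 '*') so the shell doesn't expand it as a glob.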

I note that toybox still supports the legacy $[math] bash syntax that got
replaced with $((math)) when posix caught up with the group, and I've been
vaguely tempted to try to make $[math] use floating point. I should ask Chet how
terrible an idea that is...

Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rpn.c
Type: text/x-csrc
Size: 4410 bytes
Desc: not available
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20230922/fff41fc7/attachment.c>

