[Toybox] [PATCH] A "dc" implementation

enh enh at google.com
Fri Sep 22 10:37:44 PDT 2023


On Fri, Sep 22, 2023 at 9:15 AM Rob Landley <rob at landley.net> wrote:
>
> On 9/20/23 22:37, Oliver Webb wrote:
> > ------- Original Message -------
> > On Wednesday, September 20th, 2023 at 3:07 PM, Rob Landley <rob at landley.net> wrote:
> >
> >
> >> On 9/20/23 01:41, Oliver Webb via Toybox wrote:
> >>
> >> > I have made an implementation of the 'dc' command in toybox
> >> >
> >> > Since dc isn't specified in POSIX or LSB,
> >>
> >>
> >> Which is why it's not in toybox?
> >>
> >> There are things listed in yocto's root filesystem in the roadmap which I don't
> >> intend to add to toybox.
> >
> > I was unaware of this. The other sources the roadmap draws on have been curated
> > to remove unwanted or out-of-scope commands, so I thought the yocto project
> > would have been evaluated in a similar way before being put in the roadmap.
>
> My bad, adding yocto to status.html was a bit misleading, but I already had
> klibc, sash, sbase, beastiebox, and tizen in there, all of which are data points
> but not necessarily endorsements.
>
> It's not that I've made a strong decision NOT to add them, it's that they showed
> up in an environment of dubious design judgement, and I haven't seen a consumer
> of them yet. "Under consideration" is probably the best phrasing.
>
> (I've also personally started to distinguish between "1.0" commands and
> "post-1.0" commands like screen and rsync, but don't have a way to annotate that
> in the roadmap yet...)
>
> > Also, this command is mentioned in the roadmap in a context other than the yocto
> > project. In the evaluation of the LFS packages for which ones we would provide
> > "complete-ish replacements for", both bc and dc are mentioned.
>
> Which is from LFS 6 and version 12 just came out. I'm digging towards an updated
> LFS/BLFS to see what's actually still used. I'd also love to build AOSP with the
> command line logging wrapper, but that at least partially sets up its own $PATH
> internally and is a can of worms to fiddle with.
>
> For any of these build environments that are adding and installing tools,
> there's a design question of "you just built your own sed to replace the
> existing sed: do I stop logging calls to sed now? Do I let the build stop using
> the toybox one and use the replacement it installed, or keep testing the toybox
> one?" It's mostly a "start or end of $PATH" question...
>
> > both bc and dc are mentioned.
>
> Back in the 1970s, having a calculator was a big deal. These days, fairly
> competent 64-bit $((math)) is built into the shell. (I remember an exchange with
> Denys (busybox maintainer) about that in the https://lkml.org/lkml/2009/12/8/94
> days, fixing up busybox's shell, which was the only one _not_ doing 64 bit math on
> 32 bit targets, and yes I'd tested Red Hat 9 from 2003.) Linux developers who
> need arbitrary precision or floating point math have generally used perl or
> python or similar.
>
> Posix's "expr" command still gets used a bit for historical reasons even though
> the need to split arguments is silly to modern audiences (expr 2+3 prints "2+3").
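
A minimal sketch of the two points above, assuming a POSIX sh whose arithmetic is 64 bits wide (common today, but an assumption about your shell) and any standard expr. The variable names are illustrative only:

```sh
# expr sees "2+3" as one string argument and prints it unchanged.
unsplit=$(expr 2+3)

# With the operands and operator as separate arguments it does the sum.
split=$(expr 2 + 3)

# Shell arithmetic needs no argument splitting at all.
inline=$((2+3))

# A value that does not fit in 32 bits, to exercise 64-bit math.
big=$((1 << 40))

printf '%s %s %s %s\n' "$unsplit" "$split" "$inline" "$big"
```

On such a shell this prints `2+3 5 5 1099511627776`.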
>
> The dc command is reverse polish notation, which I wouldn't think has really
> been taught to new people since the Apollo program in the early 1970s? Yeah some
> old guys still worship 1970s HP calculators for some reason, but cheap
> calculators that could handle infix came along 50 years ago. The TI Cal Tech
> advertised it could do algebraic entry in 1967. By the time I got to school they
> were solar powered and disposable. In middle school I had a calculator
> wristwatch with tiny buttons, and in high school I had a calculator the size of
> a credit card, made to fit in your wallet along with credit cards; none of that
> did Yoda Math. My Devuan Banana install hasn't got dc in the $PATH and nothing
> I've tried to build has ever complained about its absence, which is why I went
> "um, do we really need this?".
>
> The "bc" command was added to the linux kernel build by Peter Anvin during my
> "remove the perl dependencies peter anvin added to the build" patch series in 2013:
>
> https://lkml.iu.edu/hypermail/linux/kernel/1302.3/01520.html
>
> Which I'd been regularly submitting for about 5 years at that point. Peter of
> course did NOT participate in the discussion, he instead pushed a patch directly
> to Linus outside linux-kernel, adding a bc dependency.
>
> https://landley.net/notes-2013.html#28-03-2013
>
> Nothing in either Linux From Scratch or Gentoo's base install had used it
> before, which I know because both had to add it to build the updated kernel (and
> both were "build the root filesystem from source at install time" distros, so
> nothing ELSE used it either).
>
> I got a bc submission to toybox in a cloud of drama, where the guy submitting
> code to me insisted that someone else had stolen his code but he'd gotten away
> from the old contributors and had a fresh project now... Which immediately
> turned into fresh drama and squabbling they kept cc-ing me on for MONTHS:
>
> http://lists.landley.net/pipermail/toybox-landley.net/2018-March/017454.html
>
> http://lists.landley.net/pipermail/toybox-landley.net/2018-August/017677.html
>
> Busybox did NOT have a bc implementation at the time; the same implementation
> was later submitted to busybox. If you check out the busybox source and go:
>
>   find . -name '*.c' | xargs wc | sort -n
>
> It's the third largest file in the whole of busybox, and almost twice the size
> of "awk.c" which is a programming language. If you do the same thing to toybox,
> it is THE largest C file in toybox (over a thousand lines larger than toysh).
>
> I did some cleanup passes on the bc contributed to toybox, removing a few
> hundred lines at a time (some were merged "upstream" by the person drama
> inexplicably follows for no reason, and marshalled into the busybox one and the
> standalone one), but... honestly, it would be faster to write a new one from
> scratch. It has no business being bigger than awk. But if it wasn't a kernel
> dependency nobody would use it today, so the real question is should it just be
> removed? And proving we _don't_ need it is "build LFS and BLFS" territory...

use of bc because $(()) isn't/wasn't available and expr sucks is
pretty entrenched:

https://cs.android.com/search?q=file:sh$%20%5Cbbc%5Cb%20-file:external%2F(bc%7Ctoybox%7Cmksh)%2F%20case:yes&sq=

(most of those are in third-party open source things, not "actual"
Android code.)

whereas only one jemalloc test script seems to use dc:

https://source.corp.google.com/search?q=file:sh$%20%5B%5E-%5D%5Cbdc%5Cb%20-file:external%2Fbc%2F%20case:yes&sq=repo:android%2Fplatform%2Fsuperproject%2Fmain%20b:main%20-file:prebuilts%2Fvndk%2F%20-file:prebuilts%2Fruntime%2F

> So we have $((math)), we have expr in pending, we have an overdramatic bc in
> pending, and that was the context in which I received your dc submission on top
> of those.
>
> > Which leads me into another question, what is your opinion about m4, make,
> > yacc, and lex being in toybox?
>
> We need all 4 of them to build a current base Linux OS. I used to punt on lex
> and yacc (you could build the gnu versions under an existing chroot without
> anything in the minimal native development environment depending on them, so
> they were leaf nodes not circular core dependencies), but the linux-kernel loons
> rewrote kconfig to require them in 2018. (Probably Peter Anvin again, he hates
> dependency minimization as a concept, still dunno why.)
>
> The "make" should be gmake and thus big (gotta run kbuild in linux). I'm not
> sure how much of the yacc->bison and lex->flex extensions get used because I
> never program with either and haven't dug that far into them yet. I _think_ m4
> is mostly posix? (It's basically macro expansion, an alternative to the C
> preprocessor that claims to be better in a way I haven't interrogated yet.) M4
> would probably be low hanging fruit if I'd ever used it before.

(from a quick glance at m4 in the AOSP context a few years back, the
trouble with m4 isn't m4 --- it's that every user of m4 in 2023 is
assuming the huge gnu m4 macro library.)

> > In every evaluation of the POSIX commands in the talks I have watched and
> > documentation I have read, those commands are omitted on reason of being
> > exclusively used to build packages, and therefore should be part of the
> > compiler tool-chain.
>
> I felt that way myself once, but that's because I was maintaining a tinycc fork
> which I was teaching to act as a multiplexer (so it could be cc, ld, as, strip,
> etc) and adding "make" and friends to it wouldn't have been a very heavy lift.
> Alas, that project's pretty thoroughly parked these days.
>
> Backstory: many moons ago I maintained a fork of tinycc:
>
> https://landley.net/code/tinycc
> https://landley.net/hg/tinycc
>
> Which I was trying to A) extend to full C99 (including the vararray nonsense and
> alloca() and so on), because I wanted it to build a vanilla unmodified linux
> kernel and busybox and uclibc and itself to create my minimal native development
> environment out of just 4 packages. And then build a bootable LFS under the result.
>
> https://landley.net/code/qcc
> https://elinux.org/CELF_Project_Proposal/Combine_tcg_with_tcc
> https://landley.net/code/tinycc/qcc/todo.txt
>
> I got started because tinycc had built "tccboot" back in 2004, but that was a
> modified kernel (not vanilla, ripped out a list of constructs tinycc didn't
> understand yet) and I also vaguely recall that only compiled the _kernel_ on
> boot and used an initramfs image with an existing busybox binary?
>
> https://bellard.org/tcc/tccboot.html
> https://bellard.org/tcc/tccboot_readme.html
>
> Fabrice Bellard created tinycc as a joke in 2002 (an entry in the obfuscated C
> code contest, which had something like a 4k size limit on the source code you
> could submit: he submitted a C compiler that recompiled itself within the size
> limit, and won "best abuse of the rules"). He then de-obfuscated it and extended
> it to _almost_ a full C99 compiler over the next couple years, and even started
> giving it multiple "backends" so it could generate code for multiple processors.
>
> And then he started thinking about multiple _frontends_, specifically what if
> instead of parsing C it parsed x86 machine code, and translated it to slightly
> DIFFERENT machine code, and this turned into QEMU, which led to the concept of a
> "project tumor" that buds off and sucks all your developers away, which I think
> I ranted about here:
>
> http://landley.net/aboriginal/history.html
>
> Or maybe here:
>
> http://lists.busybox.net/pipermail/buildroot/2016-December/180102.html
>
> Anyway, QEMU ate all Fabrice's time and tinycc stagnated for a while. I
> proposed picking it up, except the source control was in CVS and Fabrice's one
> iron rule (at the time) was that the source control STAY in CVS, so I did a fork
> instead. Fabrice eventually handed tinycc off to a Windows developer named
> Grishka who only cared about using it as a free windows compiler and really
> didn't care about Linux. The REAL annoyance is that every few months we'd repeat a
> cycle where I'd do a bunch of work in my fork, and Grishka would wake up weeks
> later and copy about half of it (leaving the half he didn't understand behind),
> and put out a new tinycc release, and Linux Weekly News and friends would cover
> the "real" tinycc and ignore mine, and I'd stop work so Grishka's tinycc would
> go stagnant again, and when I thought it was dead enough I'd start up work on my
> fork again... rinse repeat.
>
> (The sad part was seeing the tinycc developers YEARS LATER trying to solve a
> problem I'd already solved, in code they'd left behind, because they'd only now
> noticed the issue existed. That happened multiple times...)
>
> Anyway, I've since come to the conclusion that writing a C compiler from scratch
> (with basically no optimizer, tinycc wasn't even preserving registers between
> operations, it was "load from stack, perform operation, write to stack" even
> when two consecutive operations used the same slot, that's why the resulting
> executable was 1/3 the speed of other compilers) isn't all that hard, especially
> if I ever found time to read:
>
> https://norasandler.com/2017/11/29/Write-a-Compiler.html
>
> Which is a book now:
>
> https://nostarch.com/writing-c-compiler
>
> A lot of stuff like strace and readelf were originally in the "belongs in qcc"
> bucket. These days, "cc/as/ld" are probably in the toybox post-1.0 command list
> bucket:
>
> http://lists.landley.net/pipermail/toybox-landley.net/2020-July/011898.html
>
> > I mostly agree with this. What confuses me is that these commands are some of
> > the first items in a list of "Packages toybox plans to provide complete-ish replacements for" on the roadmap.
>
> Commit f4c9a32a1116 from 2020. "Plans" is a strong word there, "could" is
> probably better. The bc one was because we had bc, which smelled a lot fresher
> in 2018 than it does in 2023, and it seemed easier to clean up a 7000 line pile
> of spaghetti than trying to get a small existing working patch into linux-kernel
> through a community that was doing:
>
> https://github.com/torvalds/linux/commit/ddbd2b7ad99a
>
> https://github.com/torvalds/linux/commit/8a104f8b5867
>
> https://www.cnbc.com/2018/09/17/linux-creator-linus-torvalds-takes-time-off-apologizes-for-behavior.html
>
> https://en.wikipedia.org/wiki/Identified_patient
>
> > But when looking at the status page, none of the commands have been mentioned as things we actually want.
> > I am 99.999% sure these are commands we don't want to be in the project, but I thought it would be better to
> > ask anyway because those commands were put in that list for a reason.
>
> Keep in mind what I want and what "the community" wants has an air gap. I never
> wanted selinux support, but here we are. If you come to me with a compelling
> case to add dc, I can add dc. If it turns out
> https://landley.net/bin/mkroot/0.8.9/linux-patches/0004-Replace-timeconst.bc-with-mktimeconst.c.patch
> is not a sufficient strategy for handling bc (I.E. it actually has a nontrivial
> second user), I can clean it up and promote it. (Or write a new one if that's
> easier.)
>
>
> >> Do you have an actual user for dc?
> >
> > When I picked this command to write, I was going off the assumption that,
> > because it was in the roadmap, somebody cared enough about it to put it there,
> > and therefore would be the "user" who would benefit from it being implemented.
>
> Alas, no. The logic was "bc got inflicted on linux-kernel in 2013, bc got
> submitted to toybox and is in pending albeit promoting it would be nontrivial,
> the patch to remove bc from the kernel again is small and simple but political
> and removing the FIRST round of this nonsense took 5 years of persistence... if
> we DO replace bc, the package that provides it to LFS and friends also
> includes/provides dc, which is posix..."
>
> 95% likely yocto has dc because it installed the gnu bc package which also
> includes dc.
>
> Sigh. Busybox has a dc, doesn't it? Yup. Sigh. If it was in posix I'd probably
> just add it even if it's useless...
>
> > I could go on about how it has square roots when expr and shell $((math)) don't,
> > or how its RPN stack layout _could_ make it easier to parse the numerical output
> > of commands when combined with external scripts.
>
> $ echo $(($(seq 1 30 | xargs | tr ' ' +)))
> 465
> $ echo $(($(printf '(%s+%s-%s)/%s' $(seq 4 -1 1))))
> 5
>
> > But those are justifications,
> > not actual use cases. I'd be lying if I said it wasn't mainly a "It's in the roadmap and busybox" thing.
>
> I still don't understand why bash $((math)) doesn't do floating point. You'd
> think it would. Did I already ask Chet about this? (He'd almost certainly say
> "for historical reasons" and be right.) I also don't understand why expr doesn't
> do floating point, that would give it a reason to exist outside of integer
> $((math)), but again... historical.
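
A small sketch of the integer-only behaviour discussed above, assuming a POSIX sh and awk. Note that awk is a swapped-in stand-in for "some tool that does floating point", not something proposed in this thread, and the variable names are illustrative:

```sh
# Shell arithmetic is integer-only: the fractional part is discarded.
int_result=$((7 / 2))

# awk does the same division in floating point. LC_ALL=C pins the
# decimal separator to "." regardless of the caller's locale.
float_result=$(LC_ALL=C awk 'BEGIN { printf "%.2f", 7 / 2 }')

printf '%s %s\n' "$int_result" "$float_result"
```

This prints `3 3.50`, which is the gap a floating point $[math] or expr would close.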
>
> "This handles floating point" is a strong positive for dc. "This uses reverse
> polish notation" seems like it would eliminate it for anyone except electrical
> engineers of a certain age, or people who have implemented their own infix
> calculation engine, but maybe I'm wrong? (Do the younger generations have more
> enthusiasm for reverse polish notation than I've been led to believe?)
>
> Heck, a year and change ago I got an rpn calculator from a hardware engineer,
> which I shoved into a copy of hello.c so it's vaguely toybox command shaped.
> (Attached.) But I didn't go farther because... reverse polish notation. In 2023.
> He didn't try to make dc, he'd just whipped up a quick command line tool for his
> own use.
>
> I note that toybox still supports the legacy $[math] bash syntax that got
> replaced with $((math)) when posix caught up with the group, and I've been
> vaguely tempted to try to make $[math] use floating point. I should ask Chet how
> terrible an idea that is...
>
> Rob

