[Toybox] [PATCH] bc.c: uchar typedef is pointless, we -funsigned-char

Oliver Webb aquahobbyist at proton.me
Tue Feb 27 21:48:51 PST 2024


On Tuesday, February 27th, 2024 at 23:02, Rob Landley <rob at landley.net> wrote:
> On 2/27/24 21:46, Oliver Webb wrote:
> 
> > On Tuesday, February 27th, 2024 at 19:19, Rob Landley rob at landley.net wrote:
> > 
> > > On 2/27/24 13:46, Oliver Webb via Toybox wrote:
> > > 
> > > > Since we build toybox with -funsigned-char, there is no reason to have a type for unnsigned chars in bc.c,
> > > > since that is the default for all chars. And removing it gets rid of a typedef
> > > 
> > > Sigh, the bc in the tree is "fraught". The guy who submitted it said he was
> > > escaping from a cloud of drama at the time (I didn't care), and then was
> > > surrounded by a fresh cloud of drama after "escaping" and tried to suck me into
> > > it (I declined). He wandered off to busybox and also maintaining his command as
> > > a separate project.
> > > 
> > > According to "wc toys//.c | sort -n" it is the largest command in the entire
> > > toybox tree. Larger than the shell despite not having a dozen subcommands nor
> > > needing to manage terminal control or child processes. It's 5 times the size or
> > > sed or tar, and at least 3 times the size it NEEDS to be. I'd far rather clean
> > > up xzcat or implement awk (each of which is a similar effort ballpark) than
> > > spend months cleaning it up.
> > 
> > Just a idea, but could we clean up the lexxer for bc, rip some of it out and make it it's
> > own separate thing (lib/lex.c?), then base awk off of that?
> 
> 
> Toybox needs a "lex.c" that works like flex (it's a posix command) and a yacc
> that works like bison (because linux screwed up kconfig in 2018).
> 
> But I haven't seen anything salvageable in bc yet, and I've written my own
> parser multiple times:
> 
> https://github.com/landley/toybox/blob/master/toys/pending/sh.c#L2976
> 
> > I have barely looked through
> > the bc.c code and so far have only gotten the really obvious things cleaned up. But since
> > there is a lexxer somewhere in there it is theoretically possible.
> 
> 
> My understanding is lex is just parallel grep: search for these patterns in
> parallel and tell me which one occurred next.
> 
> I tend to use if/else staircases because it's easy to do, cache local, and
> doesn't leave you with the problem of marshalling data between two disconnected
> lists. (Here's a table of string to magic number value, then here's a case
> statement on the magic numbers. The table and the case statement have two halves
> of the same data separated from each other, as we were complaining about earlier
> today...)
> 
> Whereas:
> 
> if (strstart(&x, "potato")) {
> } else if (strstart(&x, "walrus)) {
> } else if (strstart(&x, "nictitating") {
> } ...
> 
> Is all together in one place, reasonably cache local, and mostly checking the
> first character of the string on failure anyway. (No point getting fancy to
> check ~30 strings, the overhead of one system call is probably more work.)
> 
> > I know the syntax differs in
> > a lot of places, but they both share C style functions, control flow statements, logical operators,
> > etc. So they could share at least some common base?
> 
> 
> The first user is not "reuse".
> 
> > When skimming through bc.c, I've found a lot of code for the -w (warn if GNU extension)
> > and -s (no GNU extensions) options. Since they only make sure the program does NOT provide
> > functionality, and enforce stupid things like single letter variable names (because bc was created
> > when someone in the 70s duct taped a lexxer to dc to give it C like syntax instead of rpn. and dc could
> > only store 255 variables as one character "registers")
> 
> 
> A whole lot of work went into this, I agree.
> 
> "Is that what we should be doing" does not appear to have been asked much.
> 
> > I have never seen -w or -s used by anything, and mktimeconst.bc (the bc code in the linux-kernel)
> > uses plently of GNU extensions in it's code.
> 
> 
> https://landley.net/bin/mkroot/0.8.10/linux-patches/0004-Replace-timeconst.bc-with-mktimeconst.c.patch
> 
> I note that my C replacement could be way smaller, I wrote it that way
> intentionally to mirror the (perl) code it was replacing.
> 
> And my FIRST attempt at replacing it was written in shell, it just required 64
> bit math which wasn't 100% guaranteed at the time. (I had to poke Denys to
> update busybox ash to always use long long for $((math)) even on 32 bit.)
> 
> > Would you mind if I stripped the logic of these options out of bc.c? Nothing USES them from my knowledge
> > and that would free up at least a few hundred lines of code (Probably more). And we would lose no
> > functionality from doing so as long as we keep the options there
> 
> 
> I have no issue,

Thanks, I usually wouldn't ask but I'd like to know if either of the patches
I've submitted for bc.c will be merged, before I start doing things like ripping out
typedefs and "is this strictly POSIX compliment?" code. So that nothing conflicts

> but I'm still not convinced bc is a command we should keep?

It is needed to build a (unpatched) linux kernel. Also android uses bc in a few places:
https://cs.android.com/search?q=file:sh$%20%5Cbbc%5Cb%20-file:external%2F(bc%7Ctoybox%7Cmksh)%2F%20case:yes
since floating point shell math or expr math isn't available. 

If any way of doing floating point math existed without bc or python or perl that was well standardized 
and the linux-kernel bureaucrats removed bc from the kernel bc could die and we could ignore it like 
"fort77" or "qdel". But the process of everyone agreeing on ONE solution and implementing it usually 
takes years. And bc has been in the kernel for over a decade at this point

> Rob

-   Oliver Webb <aquahobbyist at proton.me>


More information about the Toybox mailing list