[Toybox] [PATCH] bc.c: uchar typedef is pointless, we -funsigned-char

Rob Landley rob at landley.net
Tue Feb 27 21:10:19 PST 2024


On 2/27/24 21:46, Oliver Webb wrote:
> On Tuesday, February 27th, 2024 at 19:19, Rob Landley <rob at landley.net> wrote:
>> On 2/27/24 13:46, Oliver Webb via Toybox wrote:
>>
>> > Since we build toybox with -funsigned-char, there is no reason to have a type for unnsigned chars in bc.c,
>> > since that is the default for all chars. And removing it gets rid of a typedef
>>
>>
>> Sigh, the bc in the tree is "fraught". The guy who submitted it said he was
>> escaping from a cloud of drama at the time (I didn't care), and then was
>> surrounded by a fresh cloud of drama after "escaping" and tried to suck me into
>> it (I declined). He wandered off to busybox and also maintaining his command as
>> a separate project.
>>
>> According to "wc toys//.c | sort -n" it is the largest command in the entire
>> toybox tree. Larger than the shell despite not having a dozen subcommands nor
>> needing to manage terminal control or child processes. It's 5 times the size or
>> sed or tar, and at least 3 times the size it NEEDS to be. I'd far rather clean
>> up xzcat or implement awk (each of which is a similar effort ballpark) than
>> spend months cleaning it up.
> 
> Just a idea, but could we clean up the lexxer for bc, rip some of it out and make it it's
> own separate thing (lib/lex.c?), then base awk off of that?

Toybox needs a "lex.c" that works like flex (it's a posix command) and a yacc
that works like bison (because linux screwed up kconfig in 2018).

But I haven't seen anything salvageable in bc yet, and I've written my own
parser multiple times:

https://github.com/landley/toybox/blob/master/toys/pending/sh.c#L2976

> I have barely looked through
> the bc.c code and so far have only gotten the _really_ obvious things cleaned up. But since
> there is a lexxer somewhere in there it is theoretically possible.

My understanding is lex is just parallel grep: search for these patterns in
parallel and tell me which one occurred next.

I tend to use if/else staircases because it's easy to do, cache local, and
doesn't leave you with the problem of marshalling data between two disconnected
lists. (Here's a table of string to magic number value, then here's a case
statement on the magic numbers. The table and the case statement have two halves
of the same data separated from each other, as we were complaining about earlier
today...)

Whereas:

  if (strstart(&x, "potato")) {
  } else if (strstart(&x, "walrus)) {
  } else if (strstart(&x, "nictitating") {
  } ...

Is all together in one place, reasonably cache local, and mostly checking the
first character of the string on failure anyway. (No point getting fancy to
check ~30 strings, the overhead of one system call is probably more work.)

> I know the syntax differs in
> a _lot_ of places, but they both share C style functions, control flow statements, logical operators,
> etc. So they could share at least some common base?

The first user is not "reuse".

> When skimming through bc.c, I've found a lot of code for the -w (warn if GNU extension)
> and -s (no GNU extensions) options. Since they only make sure the program does NOT provide
> functionality, and enforce stupid things like single letter variable names (because bc was created
> when someone in the 70s duct taped a lexxer to dc to give it C like syntax instead of rpn. and dc could
> only store 255 variables as one character "registers")

A whole lot of work went into this, I agree.

"Is that what we should be doing" does not appear to have been asked much.

> I have never seen -w or -s used by anything, and mktimeconst.bc (the bc code in the linux-kernel)
> uses plently of GNU extensions in it's code.

https://landley.net/bin/mkroot/0.8.10/linux-patches/0004-Replace-timeconst.bc-with-mktimeconst.c.patch

I note that my C replacement could be way smaller, I wrote it that way
intentionally to mirror the (perl) code it was replacing.

And my FIRST attempt at replacing it was written in shell, it just required 64
bit math which wasn't 100% guaranteed at the time. (I had to poke Denys to
update busybox ash to always use long long for $((math)) even on 32 bit.)

> Would you mind if I stripped the logic of these options out of bc.c? Nothing USES them from my knowledge
> and that would free up at least a few hundred lines of code (Probably more). And we would lose no
> functionality from doing so as long as we keep the options there 

I have no issue, but I'm still not convinced bc is a command we should keep?

Rob


More information about the Toybox mailing list