[Toybox] [PATCH] bc.c: deobfuscation of bc_lib

Oliver Webb aquahobbyist at proton.me
Tue Feb 27 14:18:17 PST 2024


On Tuesday, February 27th, 2024 at 15:22, Mouse <mouse at Rodents-Montreal.ORG> wrote:
> > > > And also, bc_program_stdin_name never changes from "<stdin>" so
> > > > there's no reason to make it a variable
> > > > Well, I [...] would say that there is at least potentially reason to
> > > > leave it as a variable even if it's never changed: namely,
> > > > future-proofing. If someone wants to change it in the future, it's
> > > > significantly easier to change one variable initialization than to
> > > > chase down everywhere that string is hardwired and check that each
> > > > one actually is referring to the same thing.
> > > > While code flexibility is a good thing. I have to wonder what would
> > > > prompt the change of bc_program_stdin_name.
> 
> 
> There may not be any. I haven't dug enough to understand what the
> variable actually means in context, which is one reason I didn't put it
> any more strongly than "at least potentially reason to".
> 
> I certainly have written plenty of programs that read from stdin and
> then, later, had occasion to change them to read from other files. The
> variable name looks as though it might (emphais: might) be a case of
> something like that. Based on what you've said, I consider that less
> likely now.

I've just started looking at this, and right now I'm stripping out the really
obvious stuff (i.e. massive block of extremely obfuscated code represented by array of numbers).
Looking at the code, the name "<stdin>" is used in error printing, TT.name is set to it sometimes, which also is used 
in error printing. We already have a variable (Seemingly several) that holds the filename bc is reading.
And bc_program_stdin_name is just holding a placeholder if that filename is stdin

> > The only reason I could think of was internationalization, we already
> > have a ton of English text in bc.c in the form of help text, keyword
> > names, error messages, etc.
> 
> 
> Which isn't necessarily a reason to add one more - though I suspect
> <stdin> is less likely to be changed for i18n than more verbose things
>
> like help text.
> 
> > > (It's easy to find all strings that say <stdin>. It's less easy to
> > > verify that each of those is actually talking about the semantic
> > > bc_program_stdin_name refers to.)
> > > The only time the string "<stdin>" appears in bc.c is the declaration
> > > for bc_program_stdin_name.
> 
> 
> At the moment, sure.
> 
> Will that still be true in a month? In a year? In a decade? I doubt
> anyone has more than speculation on those questions.

This code has been largely unchanged since 2018, (When it was first added)
It is fully GNU compatible. Right now I'm seeing what I can do to strip 
code out while keeping features, Not adding more stuff, unless the error handling or file processing
gets a rewrite, the string "<stdin>" will keep being used in the same spots and nowhere
else.

> > > If it were performance-critical enough that the cost of
> > > dereferencing a variable were significant (and, for that small a
> > > difference, if it were under my umbrella I'd want to see
> > > measurements), then it might be worth doing.
> 
> > The compiler does liveness analysis and puts the variable in section
> > .rodata, Although I can't confirm that since I haven't looked at the
> > symbol table.
> 
> 
> Then, I'm curious: what is your basis for saying it does so if you
> haven't found it doing so?

A variable that is never modified should be picked up by the compilers liveness analysis
and made a constant for safety reasons, I'm not guaranteeing you that is what it does, but
it is definitely what _should_ happen in a reasonable compiler. Unfortunately, the C compilers
written in C++ (gcc, clang) stopped being reasonable a while ago, to the point where things
defined with const aren't put in .rodata sometimes.

> > ""Hiding numbers that are used just once into defines to put them out
> > of sight does not really help readability." - Pavel Machek"
> 
> 
> Sort of.
> 
> I would say there is significant value in
> 
> #define MIN_INTEREST 10
> ...
> if (nfound < MIN_INTEREST) return;
> 
> over
> 
> if (nfound < 10) return;
> 
> even if that is the only place MIN_INTEREST is used. (At least,
> assuming the Eliza-effect meanings of "MIN_INTEREST" and "nfound" are
> actually appopriate to the problem domain.)
> 
> Of course, the difference is far less if the code is
> 
> // 10 = min interesting count
> if (nfound < 10) return;
> 
> but, even then, there is significantly more risk for the code and the
> comment to get out of sync than with the #define.
> 
> There is also value in having the #define of MIN_INTEREST grouped
> together with a bunch of other relevant #defines instead of them being
> magic numbers scattered all over, even if each one is used only once
> and the uses have explanatory comments.

Like I've said, if there is a need for it to be a preprocessor macro I'm not against
the idea of making it one, But commands like file don't make small bits of text preprocessor
macros, and the values in file.c are _much_ more likely to be changed for compatibility then
things like the error prompt that only is shown in a interactive session of bc

> /~\ The ASCII Mouse
> \ / Ribbon Campaign
> X Against HTML mouse at rodents-montreal.org
> / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

-   Oliver Webb <aquahobbyist at proton.me>


More information about the Toybox mailing list