[Toybox] [PATCH] toysh: fix -Wuse-after-free

Oliver Webb aquahobbyist at proton.me
Wed Mar 13 16:19:28 PDT 2024


> On 3/8/24 21:22, Oliver Webb via Toybox wrote:

> > Reading through sh.c, most of the variable names are 2 letters long (repeating the same letter),
> 
> I switched from single character local variables to double character ones
> because they're easier to search for without an elaborate IDE trying to parse
> the code.

/\<word\> searches for specific words in vim

> If you want to complain, I do tend to have "s", "ss", and "sss" as an

I did complain about that in the email. 

> would otherwise be just "string". 

That be like naming _every_ integer "integer", (No wait, strings are _worse_ because they can hold
a arbitrary string of anything)

> > When a name
> > like "in_source", "inputsrc", or just "inname" would tell you that without having to look
> > over the code and make sure that's the case.
> 
> No, you still have to look at the code. Especially when the code is not
> finished, as evidenced by it still being in pending/ and not passing half its
> test suite. (Let alone my giant pile of "turn these into proper shell tests",
> attached...)

Not if you are skimming over a section, that is mostly unimportant but interacts with a certain thing
you are looking for bugs in.

> And "c" holding the argument of "-c" is still probably a bad example to base
> your rant on.

I couldn't get far enough down from sh_main() to see any better examples, sorry.

> And a command that's flagrantly not finished yet may also be a bad example...

This is the only one I've seen which does stuff like this. No other command in pending
does stuff like this (At least not one I've looked at).

> > "ff" is a common name for file descriptors.
> 
> In this idiom it's "f, but more searchable".
> 
> You're basically complaining there was an "i" variable that wasn't a loop index.

That IS a problem, and when breaking/mixing the few conventions there are happens _constantly_,
with local variables that interact with each other. That becomes a much larger problem.
"i" is usually a counter used for at most a few dozen lines, "ss" and friends span regularly for
hundreds of lines.

> In multiple places, f stands for function, and there's more than one aspect to
> functions.

I'm sure to someone who wrote 99% of it everything makes so much more sense. Just like
the people who maintain GCC and the coreutils and IOCCC entries. Unfortunately, I never
got the chance to look at it when it was somewhat auditable. So when I try to remove warnings/
fix test cases

> > The problem isn't the length as I said, the problem is that there is no convention for the naming
> > of these.
> 
> Maybe I should move the file to "pending"?

This is the most important command in the entire project (you can't _do anything_ on a machine without
a shell), No other pending command I've ever seen is this in-auditable, even ones written by you.
 
> And as I said: if I were to I apply an aesthetic patch which does nothing but
> make the code smell like you, toysh would be all yours and I would never touch
> it again.

Make it look like whatever you want, I honestly couldn't care. I am someone who's
looked over the infrastructure and cares about toybox enough to work on and learn about
it in my free time. And even binary searched sh.c for a segfault once. And I can barely
understand it without taking hours upon hours to extensively look over every part because variable
names don't _tell you anything_. The person who is trying to build a minimal system for
whatever reason is not gonna sit down for 12 hours to debug-printf their way into understanding
it so they can get whatever the bug of the day is fixed.

If you wanna do it all yourself and don't care how obfuscated it is to an outside perspective without
the testing and development knowledge gained along the way and internal thought process at the time.
I honestly won't complain, anything that gets test cases fixed and missing builtins included. I can
treat it like a black box until a changing number of bash features (what _are_ we implementing?)
from a changing specification of a 84,000 line shell is added, then refactored into a state where it's readable.
And I'll keep my hands off your territory until then.

> Clearly I have been too slow to finish this, and it's time for you to take over.
> Your aesthetic sense is empirically superior to mine,

(Because I want variable names that say something, and you want ones that say nothing because
"less bytes" and "sh.c is my territory and I don't expect anyone else to even look at it until it's done
because clearly I don't need any help"?)

> I stand chastized.

I'm sorry I offended you, but this code is easily the hardest code in the project to audit.
I've never had issues figuring out how the other code works, this is the exception, _not_ the norm.
Part of this is behemoth functions that set out to implement entire subsystems, which allows for
spaghetti code, but a lot of this is also variable names saying _nothing_.

You want auditable code, and when someone puts this to the test by trying
to audit your auditable-optimized code to fix the...

$ make test_sh VERBOSE=allnopass | grep -c FAIL
78

78 failed test cases, and get runtest.sh to work on mkroot so the new release doesn't get held back
another 4 months. And goes "Hey, there are 1,000,000 local variables that are just called "ss" or something
that all do different things, maybe we should do _something_ about that"...

> Clearly I have been too slow to finish this, and it's time for you to take over.
> Your aesthetic sense is empirically superior to mine, I stand chastized.

> Would you like me to stay out of your way while you take over toysh and refactor
> it to your heart's content? I note that I will never touch it again if I hand it
> over.

You can make up 10,000 theories as to why a bicycle can never keep balance, and no
matter how logically sound they seem millions of people will still continue riding bicycles.
You can make up 10,000 reasons why double (in the spirit of single) letter variable names
inside 600 line functions that interact with each other in 10,000 different ways and with _no_
comments as to how any of it works whatsoever is actually more readable, and the auditors will love
it. But that will not change the fact I (and presumably many other people) still can't get
halfway down do_source without completely losing track of what the code is doing.

You have been working on this longer then I have been _alive_, which means everything makes
sense to you because you've spent years debugging it. But from someone on the outside this is...

And I KNOW this isn't just me:
$ git log toys/pending/sh.c | grep Author | uniq -c | sort -h
      1 Author: Koudai Iwahori <koudai at google.com>
      1 Author: Oliver Webb <aquahobbyist at proton.me>
      1 Author: Rich Felker <dalias at aerifal.cx>
      2 Author: Alexander Holler <holler at ahsoftware.de>
      2 Author: Elliott Hughes <enh at google.com>
      2 Author: James Farrell <jamesfarrell at google.com>
      6 Author: Rob Landley <rob at landley.net>
      7 Author: Rob Landley <rob at landley.net>
      8 Author: Rob Landley <rob at landley.net>
     11 Author: Rob Landley <rob at landley.net>
     15 Author: Rob Landley <rob at landley.net>
     24 Author: Rob Landley <rob at landley.net>
    104 Author: Rob Landley <rob at landley.net>

The most important command has only been touched by a handful of people.
I don't know what that tells you but I know what that tells me.

But no, I just want the entire codebase to "smell like me".

- Oliver Webb aquahobbyist at proton.me


More information about the Toybox mailing list