[Toybox] gcc warnings; order of evaluation

Rob Landley rob at landley.net
Sat Oct 8 23:43:35 PDT 2022


On 10/8/22 22:08, Ray Gardner wrote:
>  Rob, let me know if it's inappropriate to discuss your programming-related blog
> posts in the toybox list.

Eh, it's not so much "inappropriate" as I don't... have the energy? See the point?

I was expressing annoyance, not a call to action. I don't have influence over
the standards committee or the compiler developers. Long ago I used to
maintain a tinycc fork and was planning out qcc, but I haven't had "skin in
the game" in that area for years.

I USED to be up to date on all this stuff. Way back in the dark ages I read
Herbert Schildt's annotated C90 spec cover to cover and had it more or less
memorized. (And yes, I'm aware the cost difference between the official ISO copy
of the spec and Schildt's annotated version was said to reflect the value of
Schildt's annotations.) Back in college I was trying to track down the "simple C
compiler" from Dr. Dobbs Journal as a basis to write my own (for that platform
independent binary executable code project thing; it meant I'd have two kinds of
function pointers but DOS had "near" and "far" pointers already so... eh. Gave
me a weird perspective when I was introduced to java a few years later.)

But that was a very long time ago, and my imperfect memory of that is something
like 3 specs behind the times (2 of which I actually care about), and predates
the move to 64 bit (and thus the LP64 standard... heck, there were still plenty
of 16 bit systems back then) and the rewrite of C compilers in C++ allowing
the C++ developers to actively try to expand "undefined" behavior in C.

I'm unlikely to engage with gcc: any expectations I had for sane behavior from
them went out the window somewhere between
https://lwn.net/Articles/259157/#:~:text=Stallman and
https://lwn.net/Articles/390016/ . The PCC project seems to have rolled to a
stop again because Apple poured money into LLVM followed by Google, and it
became the Designated Alternative; I note I pushed Qualcomm in that direction
when I did a 6 month Hexagon bringup contract for them a decade ago, and 6
months later my old boss gave this talk:
https://www.youtube.com/watch?v=nfyuFPc5Iow . I've never gotten any traction
with the LLVM developers, no email I sent them has ever been replied to. (But
then I've only ever sent like 3? And am not subscribed to their list. I'm still
subscribed to the pcc and tinycc lists, but not that one.)

I only finally acknowledged that C99 wasn't enough because the
(typecast){constant} thingy was useful, and because we needed to work around
that noreturn bug. I mostly treat C changing the same way I treat driving rules
changing. I don't want to have to _study_ for my license renewal, I'm just
trying to get from point A to point B. The actual driving part is a cost center.

Undefined behavior is a cop-out, and there's always a reason for it if you dig
deep enough. The mainframes and minicomputers it was glossing over differences
in are all long-gone, these days it's because optimizer writers introduce bugs
and call it a feature. You can argue minutiae but I reject the category
conceptually: "then define it already". Be consistent and have regression tests.

These days mostly I just test what the compilers I use accept (currently that
means gcc and llvm across 16 architectures and 3 C libraries), and react when
something breaks. Which is how scripting languages work anyway. If this doesn't
provide enough coverage, it means I don't have enough testing. I'm slowly
incorporating ASAN into my workflow since at least so far it's NOT a significant
false positive generator, which is refreshing.

I'm neither enough of a language expert to dictate this stuff at anyone
(experienced sure, up to date not really, authoritative definitely not), nor do
I have the energy to traverse the political quagmire.

If I cleared my plate enough to make a serious go at qcc, then I'd have to start
caring again. But if nobody else seems to understand _why_ I considered that
important, let alone grabbed a shovel, presumably it won't be missed.

> I asked the folks at StackOverflow about this, and I think the consensus
> (and my own understanding) is that the warning is bogus,

Bogus warnings are normal. GCC _still_ has the "may be used uninitialized but
provably deterministically isn't" warning, and LLVM still needs
-Wno-string-plus-int which gcc doesn't recognize. (Which is why I moved it to
scripts/portability.sh .) Yes "string"+4 is a valid thing to do in C, yes LLVM
warns about it anyway, because it thinks its users don't know what they're
doing. That is why that warning exists.

Similarly, if (a=b) isn't wrong, but somebody decided "add extra parentheses to
show you MEANT to use an assignment instead of a comparison", and me going
"runtime testing finds the bugs, doo dah, doo dah..." doesn't change what the
compiler authors decided to do.

(Seriously, bash doesn't warn about "if ((x=4)); then echo hello; fi" being an
assignment instead of a comparison, because it doesn't NEED to. Note that the
double parentheses there aren't "warning suppression", it's how the shell does
math. It's basically $(()) except the result becomes the return code instead of
resolving to a decimal string.)

(A --lint mode that produced the "this could be a false positive" warnings
wouldn't be so bad... but then you get people forcing that on in their builds
and requiring it by policy as part of "fortify" and so on...)

> But I think some of your other assertions are wrong.

Entirely possible. I'm neither positioning myself as an expert nor trying to
keep all that current.

> That's also undefined in C, as far back as 1974 (Ritchie).

And yet it doesn't produce a _warning_ for that. Presumably because it would
produce all the false positives in the world, unless they special cased printf
or something.

And it works. It's possible there's a compiler out there it doesn't work on, but
I haven't encountered one yet, going back through slowaris and aix to at least
OS/2. The bigger issue in _this_ area is usually that printing to stderr and
stdout gets reordered relative to each other without extra fflush() calls. Which
again: no warning about that, you find it in testing.

I don't want the compiler to babysit me. I need to test stuff. Upgrading
compilers introduces regressions all the time, just like upgrading libc,
upgrading kernels... Debian's pretty good about apt-get update not breaking much
but Red Hat was a _minefield_ and I lost a gentoo system that stopped being able
to build ANY packages after an upgrade (thus being unable to install anything
new after that due to its build-from-source design).

And distro major version upgrades derail build environments all the time, which is
why AOSP specified exact Ubuntu and Debian versions to build on for all those years.

By the way, the glibc people who are theoretically most scrupulous about this crap?

  https://landley.net/notes-2022.html#28-08-2022

They lock their projects to only build with specific versions of THEIR OWN
COMPILER. Which they produce. It is very difficult to take ANY of this seriously
when the people who push it out onto the world are doing that at home.

> I've read somewhere (can't find it now) that some standard committee members
> aren't happy with the way the anti-aliasing rules (C standard 6.5 par. 7) are
> stated.

The language's creator objected to stuff the standards committee kept adding:

  https://www.lysator.liu.se/c/dmr-on-noalias.html

Alas, he died in 2011. And mostly stayed out of committee politics long before then.

> But the order of evaluation issues were pretty well settled in C89 with the
> "sequence point" concept that mostly just clarified what Ritchie had stated
> for 15 years already.

"Isn't currently standardized" doesn't mean "will never be". ANSI C was very
clearly that char was NOT guaranteed to be an 8 bit byte. (Technically unix
started on a pdp-17 with an 18 bit word size that was 3 6-bit bytes.) Eventually
the systems that applied to all died. LP64 not being part of the C spec today is
because Microsoft lobbied against it, everything else I'm aware of is LP64*. We
no longer have 6 bit bytes, binary coded decimal, drum memory...

C is a tool at the level of a hammer and chisel. Defined behavior is good.
Adding bells and whistles, not so much.

But again: opinionated != gearing up to leverage political change in this area.
I'm mostly happy when they don't make it _worse_. As David Graeber said in his
famous 2013 essay, I have other fish to fry...

Rob

* Despite the name, it essentially incorporates LP32 by reference: Long and
pointer are the same size so pointer fits in a long, the other 4 base integer
types have explicitly defined sizes. That's it. Supporting 16 bit processors
under that regime doesn't come up much anymore: atmel avr was 8 bit and so's
6502, beyond that I've mostly seen a jump up to 32 bit this century, ala avr32
or cortex-m. But technically a 16 bit processor could have 32 bit "int" the same
way a 32 bit processor can have 64 bit "long long", it's an array of the type
you can handle and libgcc.a calls a function to do math on it. Whether such a
theoretical 16 bit system would have long be 2 bytes... again, really hasn't
come up that I've noticed. The 8086 was a horrible hybrid with 20 bit pointers
needing 2 registers to access memory, that's where near/far came from and it's
part of that legacy we've left behind with 6 bit bytes...

