[Toybox] [CLEANUP] yank bool from xzcat

Rob Landley rob at landley.net
Thu Apr 11 10:32:39 PDT 2013


Smallish one this time, http://landley.net/hg/toybox/rev/851

In theory toys.h does all the #includes we need. In practice this isn't  
always the case, but when you see stdlib, stdint, string.h... they can  
probably be yanked.

The one that broke the build was stdbool.h, and that's because I
intentionally left it out of toys.h: I'm not a fan. It was added in
C99 because C++ guys were used to it, and it raises some
problems.

According to C, zero is false and nonzero is true. The logical  
comparison operators all return "0" or "1", with type int. (NOT type  
bool.) Wrapping this truth with "true" and "false" in a funky typedef  
doesn't actually improve matters; it's not what the language is  
actually doing. (It's like Pascal programmers who did "#define BEGIN {"
and "#define END }" so they felt at home rather than adapting to the  
new environment.)
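
A minimal sketch of the point, with function names made up for
illustration. It only relies on what the C standard says about
comparison results; nothing here is taken from xzcat:

```c
#include <stddef.h>

// The result of a comparison has type int, so it is int-sized.
size_t comparison_size(void)
{
  return sizeof(1 < 2);
}

// Any nonzero value counts as true; !! squashes it down to 0 or 1.
int is_true(int value)
{
  return !!value;
}
```

So comparison_size() equals sizeof(int) when built as C, and
is_true(42) gives 1 with no bool type anywhere in sight.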

In _theory_ what bool allows the compiler to do is use an optimally  
sized type to store a single significant bit of data. In practice,  
"int" turned out to be that optimally sized type on every interesting  
processor, because smaller types have to be expanded to the processor's  
register size.

64-bit processors are usually good about handling 32-bit data because
they evolved from 32-bit processors. (The DEC Alpha didn't, and has
issues here. But x86-64 did and ARMv8 did... So you can generally get
away with using "int" for bool instead of "long" and the processor has
good support for dealing with that register size and alignment.) The
x86 line is also good about 16 bit types because the 8086 had 16-bit  
registers and the hardware has legacy support modes to run the old  
code, but not _as_ good.

(Fun story: the "Pentium Pro" didn't include legacy optimizations for
16-bit data access, because they believed Microsoft's claims that
Windows 95 would be a 32-bit OS. But Microsoft kept the scheduler as
16-bit 8086 mode code because more of it fit in L1/L2 cache that way,
which ran great on 486 and Pentium chips and caused pipeline stalls on
the Pentium Pro as it triggered some legacy microcode something to
emulate the 8086 instructions. This caused the first major falling out
between Intel and Microsoft.)

But ARM is an optimized low-transistor-count architecture that can't do  
unaligned access and can't natively handle types below 32 bits. So if  
you use a "char" as a loop index on ARM the compiler will crap out code
to mask and shift behind the scenes, and the result's big and slow. (On  
modern processors you only generally care about optimizing tight loops  
where something's repeated a lot, but that includes the infrastructure  
_of_ the loop.)
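
To make the loop index case concrete, here is a rough sketch with two
hypothetical functions that do the same work. Whether the char version
actually compiles to extra masking instructions depends on the target
and the optimization level, so treat it as something to check in the
compiler's assembly output (for example with -S), not as a guarantee:

```c
// Index kept in a type smaller than a register: the compiler has to
// preserve 8-bit wraparound, which on some targets costs extra code.
int sum_char_index(const int *data, unsigned char len)
{
  unsigned char i;
  int total = 0;

  for (i = 0; i < len; i++) total += data[i];

  return total;
}

// Index kept in an int: matches the register size, nothing to fix up.
int sum_int_index(const int *data, int len)
{
  int i, total = 0;

  for (i = 0; i < len; i++) total += data[i];

  return total;
}
```

Both return the same total for the same input; the only difference is
what the loop bookkeeping costs.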

Keeping a true/false value in an int is a waste of memory, but it's  
fast and produces small code. Saving 3 bytes of storage in exchange for  
16 bytes of binary size is not a win. If you really want to make a  
bitfield out of it you can do so, but keep in mind that _no_ processor  
has decent support for bitfields. They're all going to mask and shift  
behind the scenes. And the compiler's built-in code to do this doesn't  
get much exercise so is generally worse than what you'd write to do it  
yourself. (And let's not get into endianness of bitfields...)
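
For reference, doing the mask and shift yourself can look something
like the sketch below. The flag names and helper functions are
invented for the example, not anything from toybox:

```c
// Hypothetical flags, one bit each, packed into a single unsigned int.
#define FLAG_VERBOSE (1u<<0)
#define FLAG_FORCE   (1u<<1)
#define FLAG_QUIET   (1u<<2)

// Return flags with the given bit switched on.
unsigned set_flag(unsigned flags, unsigned bit)
{
  return flags | bit;
}

// Return flags with the given bit switched off.
unsigned clear_flag(unsigned flags, unsigned bit)
{
  return flags & ~bit;
}

// Return 1 if the bit is set, 0 otherwise.
int has_flag(unsigned flags, unsigned bit)
{
  return !!(flags & bit);
}
```

The layout is whatever the #defines say it is, so there's no guessing
about which end of the word the compiler started packing from.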

Using stdbool.h means "I don't want to know what the compiler is  
doing", and "I want to let it do wildly different things on different  
platforms for truly minor gains". When you press them, the proponents  
of it generally break down into "but it's conceptually cleaner not  
having to care how this is implemented" (Python exists, go use it), and  
if you press them for an actual example they go "it prevents you from  
storing a 3 in your variable so 'var == TRUE' returns false". Yeah,  
that's a bad test, don't do that. As with const: if you understand what  
you're doing it doesn't help. If you DON'T understand what you're  
doing, you need to learn.
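
The 'var == TRUE' case, spelled out as a sketch. The TRUE macro and
both function names are stand-ins for illustration:

```c
#define TRUE 1  // hypothetical macro, as in the 'var == TRUE' example

// The bad test: only matches exactly 1, so a 3 comes out "false".
int bad_test(int var)
{
  return var == TRUE;
}

// The normal C test: any nonzero value is true.
int good_test(int var)
{
  return var != 0;
}
```

With var set to 3, bad_test() returns 0 and good_test() returns 1. A
plain "if (var)" behaves like good_test().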

Anyway, long-winded way of saying I cleaned the "bool", "true" and  
"false" out of xzcat. Doing this tends to open up opportunities for  
further optimization later on, since we now have more visibility into  
what the code is actually doing.

Rob

