[Toybox] Toybox Static Analysis
Rob Landley
rob at landley.net
Sat Feb 15 14:52:33 PST 2014
On 02/06/14 04:55, Tom Sparrow wrote:
> Hi Rob,
>
> I am very excited to see your great work. Thanks a lot for creating LGPL
> equivalent for busybox.
>
> I am also interested to contribute in this project. Here are my few
> cents to start.
>
> I have executed static analysis over hg revision no. 1192.
>
> Please find the attached static analysis results with following tools.
>
> - LLVM Scan-build
> - cppcheck
> - RATS
>
> (please check index.html for summary.)
>
> Hope this will be useful. This analysis process can be automated and
> executed over the hg revisions on regular intervals.
>
> Thanks and Best Regards
> Tom
I started with cppcheck more or less at random (alphabetically first in the
directory), and I've attached that tarball so the list at large can see what
I'm replying to. (I'll have to go into the admin thing to pass it through,
but oh well.)
I don't tend to use static analysis tools because of the flood of false
positives, but as long as somebody's sent me a drop...
TOYBOX_CPPCHECK
posix/expand.c:
36 - variable scope.
It's saying I could put "len" inside the for loop (I.E. move it down two
lines), but since I put a blank line after declarations doing so would
add an extra line. Right now there's one block of local variable
declarations, it wants there to be two. The fact one of the variables
is reset each iteration through the loop and the others are retained
isn't hugely relevant.
104 - sscanf
The line is "if (sscanf(s, "%u%n", t, &count) != 1) break;" and the complaint
is "scanf without field width limits can crash with huge input data on some
versions of libc." This is a %u scan which is parsing a null terminated
string to write an unsigned integer, and a %n which records the number of
bytes consumed out of the string into a second integer. The only way that
can go nuts is if I pass it an unterminated input string, or if libc is
broken (not my bug).
posix/date.c:
63 - scanf
Once again, it hates "scanf("%u");" which is not a bug. If your libc
can't deal with "999999999999999999999999999999999999999999..." as an input,
your libc is broken. I'm not going to work around it: fix your libc.
pending/ftpget.c
132 - unused variable
There's a reason it's in pending. This is not that reason. (Doesn't gcc
catch that? Yes, it does, plus 4 more warnings about dereferincing
type-punned pointers...)
pending/fdisk.c
Again, pending. Zillions of hits from uninspected code, many of which are
current gcc warnings.
I'm going to skip "pending" entries from here on. They have not undergone
code inspection or cleanup yet (because the level of it I do requires a lot
of time and focus).
other/pmap.c
37 - scope of count
Yeah, ok. I can move one of the 18 variables declared at the start of the
outer for loop into an inner one. It doesn't actually matter (maximum
stack usage remains the same), but sure.
64 - another scanf
This one is doing a %s without a field limit, but the data comes from
/proc. If proc is feeding you bad data, something is pretty deeply wrong.
78 We hates scanf. Hates it! Foreeeeevoooorrrrr!!!
It's scanning for %lld embedded in a specific string. This is _ok_.
If your scanf can't handle it, your libc is _broken_.
posix/wc.c
50 - scope of "len"
Once again, a variable can move down 4 lines and add a blank line. No.
posix/len.c
443 - scope of "dt"
Yes, I could move dt into the for loop that's not currently declaring
any variables, but I don't care. (Initially the assignment was in
the if statement, it got moved out as it got bigger, but it's not worth
moving the declaration at this point.)
I'm going to stop looking at the 'varible scope' ones because it's not a bug,
it pretty much aesthetic unless we're trying to minimize stack usage (and
nothing so far has been in a recursive call or even close to cache line size).
other/ifconfig.c
63 - unused struct member
Hey, it found its first actual worthwhile improvement! It mis-identified
it, of course. This struct got moved inline during cleanup, and this is
a leftover version. No instance of this struct is ever created, so it's
not just one field unused, it's all of them unused.
So again, not an actual bug, but it did find debris worth cleaning up.
other/acpi.c
35 - scanf! [clutch pearls] scanf! [shriek!]
It's a fscanf %d. It's fine. It's basically atoi() without needing to
explicitly read into a temporary buffer first. (And scanf lets us
check the number of fields converted or return the offset, so we get
better error reporting than atoi, although we don't bother to check
here because it's from sysfs and if the kernel is passing us bad data
the system is already hosed beyond recovery. If I moved it into lib
I'd probably make it an xfunc aborting if it didn't get a value.)
Although looking at the variable scope here, I've decided to do the
_opposite_ of what it's recommending and collate fd, len, and on
up with dfd at the top. (Saves 4 lines.) And then give one of them
back as the blank line before the return...
kconfig/*
Not my code. I'd vaguely like to replace it someday, but it's not a
priority. So the 9 complaints about kernel code I can skip.
scripts/config2help.c
This is a build-time tool that doesn't ship. It's infrastructure code that
generates a header file, so I only really care that it generates the right
output and doesn't segfault, consume insane amounts of memory, or have any
external dependencies like the python it replaced. (Running reasonably
fast is nice but it's C: the I/O dwarfs the processing.)
That said, I'm curious:
347 - duplicateBreak
The line it's complaining about is "throw = catch;" which means your static
analysis tool is broken because C is not C++. I use C++ keywords as C
variables to amuse myself and to annoy people who _do_ think that C is
somehow a subset of C++ and thus all you ever need is C++. Toybox won't
compile as c++.
This code is perfectly fine, the static analysis tool is broken, because
C is not C++.
232, 234 - unreachable code
More confusion based on thinking a C++ keyword applies to C.
195 - unused variable
Yeah, that one's real. I didn't feed -wall to the compiler. Why didn't
I do that... because it complains that some functions in lib index
an array with type "char" (which is a valid thing to do so the compiler
is being weird here complaining about it). And that notes that I never
actually did a "return 0" from int main(), which the static checking
didn't., 234 unreachable code
More confusion based on thinking a C++ keyword applies to C.
79 - scope of "s"
Again, the for loop isn't declaring any variables so making it declare
one all by itself isn't a huge improvement. (I could, I just don't see it
as a big deal. The extra blank line is generally the tiebreaker, and this
would require one. It's only 4 bytes of stack and the compiler must be
smart enough to do lifetime analysis if it's doing _any_ optimization,
because it has to figure out what goes in registers instead of actually
living on the stack.)
This one's _almost_ worth doing, but... not quite.
(Note, if I was willing to rename the project it would be durodango
because that perfectly expresses the philosophy of what I'm going for
here, this project is only worth doing if I polish it until it shines.
So I'm happy to look at all this stuff again, although I have a shortage
of hobby time these days and I'm doing this instead of doing something
else. But I'm not making the actual change here becuase what counts as
"better" isn't always obvious, and I don't agree with the static analysis
tool here.)
195 - unused variable
Yeah, that one's real. I didn't feed -wall to the compiler. Why didn't
I do that... because it complains that some functions in lib index
an array with type "char" (which is a valid thing to do so the compiler
is being weird here complaining about it). And that notes that I never
actually did a "return 0" from int main(), which the static checking
didn't.
(Footnote: the reason I didn't feed -Os or -O2 to the compiler is the
saved runtime would probably be smaller than the extra compile time,
this binary doesn't ship and only gets run once per build.)
posix/cat.c
24 - variable scope "len"
Again, moving one of two variables down 3 lines to live in a for loop that's
not currently declaring any variables (so old declaration statement doesn't
go away), declaring variables in places requires a new blank line. Not
worth doing.
posix/paste.c
Two variable scope warnings, for "int j" and FILE *f. Both were intentional:
I declared j next to where I declared i, and I declared both FILE variables
at the start of the relevant block. Making the if and its else mirror
each other seemed easier to read than being slightly more precise about
variable scope.
lib/lib.c
84 - using void * in calculation
Ah, good catch. That's a gcc-ism. (I think llvm supports it, but it's
untidy. Will typecast.)
690, 691 - variable scope
Nope. Small function, all declarations at the start, I'm good with this.
190, 358 - variable scope
Ditto
110 - unreadVariable
or = readall(fd, libbuf, try);<--- Variable 'or' assigned value never used.
if (or < 0) perror_exit("lskip to %lld", (long long)offset);
else offset -= try;
if (or < try) break;
It's used on the NEXT LINE. And two lines later. I'm guessing it's confused
because "or" is yet another C++ keyword that's not a keyword in C, although
this is a strange expression of that confusion.
515, 517 - scanf still exists
It's scanf "%u". There is no reason this shouldn't work.
lsb/passwd.c
101, 102 - variable scope of "newp" and "salt".
All the variables of main() are declared at the start. Moving two of them
down isn't a huge improvement. (Possibly it would make sense to break up
the one big function into a couple smaller functions, but that's not the
objection here.)
kconfig/mconf.c
What order is it scanning these in? kconfig is still kernel code, not
even under the same license as the rest but it's ok because it doesn't
ship. (We use a single output file it produces, and can edit that file
by hand if we like.)
lib/bunzip.c
Bunch of variable scope stuff. I wrote this code back in the busybox
days with the code style that all variables are declared at the start
of the function. It's doing so consistently, and I'm happy to leave it.
(Note: I've since then moved this code into bzcat.c.)
posix/tail.c
135 - variable scope of "linepop" can be reduced.
Um, this one's subtle: it can be moved into the if(), but it can't
be moved into the _for_ because I want to force the value the first
time through the loop and then calculate it the later times. Initializing
the variable in the for loop would force it to 1 each time. (All the uses
are in the for loop, but the state is retained between loop iterations,
so the initialization has to be at least one level up.)
I actually moved it well away from the for loop precisely so the
separation was more explicit. It's part of the initial state of the
function, so having it be at the start of the function works for me.
(I could also have the declaration and initialization on separate lines
to be even more explicit, but that eats a line plus maybe padding.)
other/count.c
21 - variable scope
The entire function is a dozen lines, with a single variable declaration
block at the start of the function, then a for loop. Moving some of the
variable declarations into the block would require adding an extra blank
line, to make a distinction I honestly don't care about in this instance.
scripts/install.c
Another comple-time binary that doesn't ship, so I only really care if it
works, but let's see...
28 - buffer access out of bounds
How. How? Here's the code:
struct toy_list toy_list[] = {
#include "generated/newtoys.h"
};
#define TOY_LIST_LEN (sizeof(toy_list)/sizeof(struct toy_list))
int main(int argc, char *argv[])
{
static char *toy_paths[]={"usr/","bin/","sbin/",0};
int i, len = 0;
// Output list of applets.
for (i=1; i<TOY_LIST_LEN; i++) {
int fl = toy_list[i].flags;<--- Buffer accessed out of bounds: toy_list
The #include macros into an array initializer for the static array, one
entry for each of the currently enabled commands. Since this is an array of
a defined size, TOY_LIST_LEN is calculating the size of the whole list
(it's an array, not a pointer), divided by the size of an element of the
list, yielding hte number of entries in the list. We then loop through
the entries in said list, skipping the first entry (which is the "toybox"
multiplexer; that's the binary we're creating symlinks _to_ and we don't
want to "ln -sf toybox toybox" during the install).
So how is this an out of bounds access?
35 - len assigned value that's never used.
It's right, that isn't used. Looks like I was tracking output so I could
catch output discarded due to disk full or something, but the actual user is
a for loop in a shell script and if you do "for i in $(./blah)" and blah
is experiencing short writes, Bad Things are going to happen anyway...
(I note that "xprintf()" will error_exit() on a short write, but the only
user of this code isn't checking the error return code because $() in the
shell doesn't really give you that option, hence the "return 0". The
importance of this particular bit of error checking can be ascertained from
the fact that printf() buffers its output in FILE * but libc doesn't bother
to retransmit short writes automatically. In toybox, I'm paranoid. In the
build infrastructure, not so much.)
I don't care enough about this one to tweak it myself.
other/lsusb.c
Five instances of "scanf is against my religion". Let's see... It's mad
that I scanf "%u" and then feed it an int. The data comes from the kernel
and according to the field widths I'm padding the output to, busnum and
devnum are actually unsigned char, and pid and vid are unsigned short.
Using int is just easier on a 32 bit processor (prevents arm having to mask
and shift to handle smaller data sizes).
There is the possibility that carefully crafted inputs bypassing the kernel
could make the tool print negative numbers. If I changed it to "%d" then
it's possible that carefully crafted inputs bypassing the kernel could make
the tool print negative numbers, but the static analyzer wouldn't complain.
I'm ok with this.
other/vmstat.c
46 - MelonMelonMelonOutOfCheeseError
Ok, what I'm doing here assigning stuff to itself in its own declaration
is disabling a spurious warning from gcc about "value may be used
uninitialized". The gcc variable lifetime analysis can't always track when
multiple variables are used together, so X only gets set when Y = 7 and
we tested for that last loop before incrementing Z, so we can only ever
get into this state with known values.
Unfortunately, there's no way to switch off gcc's broken "variable may be
used uninitialized" warning (which produces tons of false positives) without
disabling "variable is used uninitialized" which doesn't.
Here's my blog entry ranting about it back in 2006:
http://landley.net/notes-2006.html#31-12-2006
So we force it to shut up with an impossible assignment that produces no
code. The "rudundant assignemnt of X to itself", in the declaration of X,
is a trick I picked up from the linux kernel. It marks the variable as
used in gcc's lifetime analysis, but produces no output in the binary.
It's an out the gcc developers gave people to force it to shut up,
which is close as the gcc development team could come to admitting it
made a mistake spamming us with a warning it can't comptently produce.
main.c
144 - buffer access out of bounds
This is the same hiccup that install.c had, in analogous code. If you
can explain to me how an out of bounds access actually happens here,
I'd appreciate it. I'm not seeing one.
27 - variable scope "middle"
I declare top, bottom, and middle together because they're conceptually
grouped. It's a toss up putting it with "result" instead, but I'm pretty
happy having it where it is.
Not quite to the end of this yet, but that was pretty much my free time
for 2 days, so I'm going to stop here and post what I've got.
Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TOYBOX_CPPCHECK_Results.tar.bz2
Type: application/x-bzip
Size: 403219 bytes
Desc: not available
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20140215/75c629ef/attachment-0005.bin>
More information about the Toybox
mailing list