[Toybox] Toybox Static Analysis

Rob Landley rob at landley.net
Sat Feb 15 14:52:33 PST 2014


On 02/06/14 04:55, Tom Sparrow wrote:
> Hi Rob,
> 
> I am very excited to see your great work. Thanks a lot for creating LGPL
> equivalent for busybox.
> 
> I am also interested to contribute in this project. Here are my few
> cents to start.
> 
> I have executed static analysis over hg revision no. 1192.
> 
> Please find the attached static analysis results with following tools.
> 
> - LLVM Scan-build
> - cppcheck
> - RATS
> 
> (please check index.html for summary.)
> 
> Hope this will be useful. This analysis process can be automated and
> executed over the hg revisions on regular intervals.
> 
> Thanks and Best Regards
> Tom

I started with cppcheck more or less at random (alphabetically first in the
directory), and I've attached that tarball so the list at large can see what
I'm replying to. (I'll have to go into the admin thing to pass it through,
but oh well.)

I don't tend to use static analysis tools because of the flood of false
positives, but as long as somebody's sent me a drop...

TOYBOX_CPPCHECK
posix/expand.c:
  36 - variable scope.

  It's saying I could put "len" inside the for loop (I.E. move it down two
  lines), but since I put a blank line after declarations doing so would
  add an extra line. Right now there's one block of local variable
  declarations, it wants there to be two. The fact one of the variables
  is reset each iteration through the loop and the others are retained
  isn't hugely relevant.

  104 - sscanf

  The line is "if (sscanf(s, "%u%n", t, &count) != 1) break;" and the complaint
  is "scanf without field width limits can crash with huge input data on some
  versions of libc." This is a %u scan which is parsing a null terminated
  string to write an unsigned integer, and a %n which records the number of
  bytes consumed out of the string into a second integer. The only way that
  can go nuts is if I pass it an unterminated input string, or if libc is
  broken (not my bug).

posix/date.c:

  63 - scanf

  Once again, it hates "scanf("%u");" which is not a bug. If your libc
  can't deal with "999999999999999999999999999999999999999999..." as an input,
  your libc is broken. I'm not going to work around it: fix your libc.

pending/ftpget.c

  132 - unused variable

  There's a reason it's in pending. This is not that reason. (Doesn't gcc
  catch that? Yes, it does, plus 4 more warnings about dereferincing
  type-punned pointers...)

pending/fdisk.c

  Again, pending. Zillions of hits from uninspected code, many of which are
  current gcc warnings.

I'm going to skip "pending" entries from here on. They have not undergone
code inspection or cleanup yet (because the level of it I do requires a lot
of time and focus).

other/pmap.c

  37 - scope of count

  Yeah, ok. I can move one of the 18 variables declared at the start of the
  outer for loop into an inner one. It doesn't actually matter (maximum
  stack usage remains the same), but sure.

  64 - another scanf

  This one is doing a %s without a field limit, but the data comes from
  /proc. If proc is feeding you bad data, something is pretty deeply wrong.

  78 We hates scanf. Hates it! Foreeeeevoooorrrrr!!!

  It's scanning for %lld embedded in a specific string. This is _ok_.
  If your scanf can't handle it, your libc is _broken_.

posix/wc.c

  50 - scope of "len"

  Once again, a variable can move down 4 lines and add a blank line. No.

posix/len.c

  443 - scope of "dt"

  Yes, I could move dt into the for loop that's not currently declaring
  any variables, but I don't care. (Initially the assignment was in
  the if statement, it got moved out as it got bigger, but it's not worth
  moving the declaration at this point.)

I'm going to stop looking at the 'varible scope' ones because it's not a bug,
it pretty much aesthetic unless we're trying to minimize stack usage (and
nothing so far has been in a recursive call or even close to cache line size).

other/ifconfig.c

  63 - unused struct member

  Hey, it found its first actual worthwhile improvement! It mis-identified
  it, of course. This struct got moved inline during cleanup, and this is
  a leftover version. No instance of this struct is ever created, so it's
  not just one field unused, it's all of them unused.

  So again, not an actual bug, but it did find debris worth cleaning up.

other/acpi.c

  35 - scanf! [clutch pearls] scanf! [shriek!]

  It's a fscanf %d. It's fine. It's basically atoi() without needing to
  explicitly read into a temporary buffer first. (And scanf lets us
  check the number of fields converted or return the offset, so we get
  better error reporting than atoi, although we don't bother to check
  here because it's from sysfs and if the kernel is passing us bad data
  the system is already hosed beyond recovery. If I moved it into lib
  I'd probably make it an xfunc aborting if it didn't get a value.)

  Although looking at the variable scope here, I've decided to do the
  _opposite_ of what it's recommending and collate fd, len, and on
  up with dfd at the top. (Saves 4 lines.) And then give one of them
  back as the blank line before the return...

kconfig/*

  Not my code. I'd vaguely like to replace it someday, but it's not a
  priority. So the 9 complaints about kernel code I can skip.

scripts/config2help.c

  This is a build-time tool that doesn't ship. It's infrastructure code that
  generates a header file, so I only really care that it generates the right
  output and doesn't segfault, consume insane amounts of memory, or have any
  external dependencies like the python it replaced. (Running reasonably
  fast is nice but it's C: the I/O dwarfs the processing.)

  That said, I'm curious:

  347 - duplicateBreak

  The line it's complaining about is "throw = catch;" which means your static
  analysis tool is broken because C is not C++. I use C++ keywords as C
  variables to amuse myself and to annoy people who _do_ think that C is
  somehow a subset of C++ and thus all you ever need is C++. Toybox won't
  compile as c++.

  This code is perfectly fine, the static analysis tool is broken, because
  C is not C++.

  232, 234 - unreachable code

  More confusion based on thinking a C++ keyword applies to C.

  195 - unused variable

  Yeah, that one's real. I didn't feed -wall to the compiler. Why didn't
  I do that... because it complains that some functions in lib index
  an array with type "char" (which is a valid thing to do so the compiler
  is being weird here complaining about it). And that notes that I never
  actually did a "return 0" from int main(), which the static checking
  didn't., 234 unreachable code

  More confusion based on thinking a C++ keyword applies to C.

  79 - scope of "s"

  Again, the for loop isn't declaring any variables so making it declare
  one all by itself isn't a huge improvement. (I could, I just don't see it
  as a big deal. The extra blank line is generally the tiebreaker, and this
  would require one. It's only 4 bytes of stack and the compiler must be
  smart enough to do lifetime analysis if it's doing _any_ optimization,
  because it has to figure out what goes in registers instead of actually
  living on the stack.)

  This one's _almost_ worth doing, but... not quite.

  (Note, if I was willing to rename the project it would be durodango
  because that perfectly expresses the philosophy of what I'm going for
  here, this project is only worth doing if I polish it until it shines.
  So I'm happy to look at all this stuff again, although I have a shortage
  of hobby time these days and I'm doing this instead of doing something
  else. But I'm not making the actual change here becuase what counts as
  "better" isn't always obvious, and I don't agree with the static analysis
  tool here.)

  195 - unused variable

  Yeah, that one's real. I didn't feed -wall to the compiler. Why didn't
  I do that... because it complains that some functions in lib index
  an array with type "char" (which is a valid thing to do so the compiler
  is being weird here complaining about it). And that notes that I never
  actually did a "return 0" from int main(), which the static checking
  didn't.

  (Footnote: the reason I didn't feed -Os or -O2 to the compiler is the
  saved runtime would probably be smaller than the extra compile time,
  this binary doesn't ship and only gets run once per build.)

posix/cat.c

  24 - variable scope "len"

  Again, moving one of two variables down 3 lines to live in a for loop that's
  not currently declaring any variables (so old declaration statement doesn't
  go away), declaring variables in places requires a new blank line. Not
  worth doing.

posix/paste.c

  Two variable scope warnings, for "int j" and FILE *f. Both were intentional:
  I declared j next to where I declared i, and I declared both FILE variables
  at the start of the relevant block. Making the if and its else mirror
  each other seemed easier to read than being slightly more precise about
  variable scope.

lib/lib.c

  84 - using void * in calculation

  Ah, good catch. That's a gcc-ism. (I think llvm supports it, but it's
  untidy. Will typecast.)

  690, 691 - variable scope

  Nope. Small function, all declarations at the start, I'm good with this.

  190, 358 - variable scope

  Ditto

  110 - unreadVariable

    or = readall(fd, libbuf, try);<--- Variable 'or' assigned value never used.
    if (or < 0) perror_exit("lskip to %lld", (long long)offset);
    else offset -= try;
    if (or < try) break;

  It's used on the NEXT LINE. And two lines later. I'm guessing it's confused
  because "or" is yet another C++ keyword that's not a keyword in C, although
  this is a strange expression of that confusion.

  515, 517 - scanf still exists

  It's scanf "%u". There is no reason this shouldn't work.

lsb/passwd.c

  101, 102 - variable scope of "newp" and "salt".

  All the variables of main() are declared at the start. Moving two of them
  down isn't a huge improvement. (Possibly it would make sense to break up
  the one big function into a couple smaller functions, but that's not the
  objection here.)

kconfig/mconf.c

  What order is it scanning these in? kconfig is still kernel code, not
  even under the same license as the rest but it's ok because it doesn't
  ship. (We use a single output file it produces, and can edit that file
  by hand if we like.)

lib/bunzip.c

  Bunch of variable scope stuff. I wrote this code back in the busybox
  days with the code style that all variables are declared at the start
  of the function. It's doing so consistently, and I'm happy to leave it.

  (Note: I've since then moved this code into bzcat.c.)

posix/tail.c

  135 - variable scope of "linepop" can be reduced.

  Um, this one's subtle: it can be moved into the if(), but it can't
  be moved into the _for_ because I want to force the value the first
  time through the loop and then calculate it the later times. Initializing
  the variable in the for loop would force it to 1 each time. (All the uses
  are in the for loop, but the state is retained between loop iterations,
  so the initialization has to be at least one level up.)

  I actually moved it well away from the for loop precisely so the
  separation was more explicit. It's part of the initial state of the
  function, so having it be at the start of the function works for me.
  (I could also have the declaration and initialization on separate lines
  to be even more explicit, but that eats a line plus maybe padding.)

other/count.c

  21 - variable scope

  The entire function is a dozen lines, with a single variable declaration
  block at the start of the function, then a for loop. Moving some of the
  variable declarations into the block would require adding an extra blank
  line, to make a distinction I honestly don't care about in this instance.

scripts/install.c

  Another comple-time binary that doesn't ship, so I only really care if it
  works, but let's see...

  28 - buffer access out of bounds

  How. How? Here's the code:

  struct toy_list toy_list[] = {
  #include "generated/newtoys.h"
  };

  #define TOY_LIST_LEN (sizeof(toy_list)/sizeof(struct toy_list))

  int main(int argc, char *argv[])
  {
    static char *toy_paths[]={"usr/","bin/","sbin/",0};
    int i, len = 0;

    // Output list of applets.
    for (i=1; i<TOY_LIST_LEN; i++) {
      int fl = toy_list[i].flags;<--- Buffer accessed out of bounds: toy_list

  The #include macros into an array initializer for the static array, one
  entry for each of the currently enabled commands. Since this is an array of
  a defined size, TOY_LIST_LEN is calculating the size of the whole list
  (it's an array, not a pointer), divided by the size of an element of the
  list, yielding hte number of entries in the list. We then loop through
  the entries in said list, skipping the first entry (which is the "toybox"
  multiplexer; that's the binary we're creating symlinks _to_ and we don't
  want to "ln -sf toybox toybox" during the install).

  So how is this an out of bounds access?

  35 - len assigned value that's never used.

  It's right, that isn't used. Looks like I was tracking output so I could
  catch output discarded due to disk full or something, but the actual user is
  a for loop in a shell script and if you do "for i in $(./blah)" and blah
  is experiencing short writes, Bad Things are going to happen anyway...

  (I note that "xprintf()" will error_exit() on a short write, but the only
  user of this code isn't checking the error return code because $() in the
  shell doesn't really give you that option, hence the "return 0". The
  importance of this particular bit of error checking can be ascertained from
  the fact that printf() buffers its output in FILE * but libc doesn't bother
  to retransmit short writes automatically. In toybox, I'm paranoid. In the
  build infrastructure, not so much.)

  I don't care enough about this one to tweak it myself.

other/lsusb.c

  Five instances of "scanf is against my religion". Let's see... It's mad
  that I scanf "%u" and then feed it an int. The data comes from the kernel
  and according to the field widths I'm padding the output to, busnum and
  devnum are actually unsigned char, and pid and vid are unsigned short.
  Using int is just easier on a 32 bit processor (prevents arm having to mask
  and shift to handle smaller data sizes).

  There is the possibility that carefully crafted inputs bypassing the kernel
  could make the tool print negative numbers. If I changed it to "%d" then
  it's possible that carefully crafted inputs bypassing the kernel could make
  the tool print negative numbers, but the static analyzer wouldn't complain.

  I'm ok with this.

other/vmstat.c

  46 - MelonMelonMelonOutOfCheeseError

  Ok, what I'm doing here assigning stuff to itself in its own declaration
  is disabling a spurious warning from gcc about "value may be used
  uninitialized". The gcc variable lifetime analysis can't always track when
  multiple variables are used together, so X only gets set when Y = 7 and
  we tested for that last loop before incrementing Z, so we can only ever
  get into this state with known values.

  Unfortunately, there's no way to switch off gcc's broken "variable may be
  used uninitialized" warning (which produces tons of false positives) without
  disabling "variable is used uninitialized" which doesn't.


  Here's my blog entry ranting about it back in 2006:

    http://landley.net/notes-2006.html#31-12-2006

  So we force it to shut up with an impossible assignment that produces no
  code. The "rudundant assignemnt of X to itself", in the declaration of X,
  is a trick I picked up from the linux kernel. It marks the variable as
  used in gcc's lifetime analysis, but produces no output in the binary.
  It's an out the gcc developers gave people to force it to shut up,
  which is close as the gcc development team could come to admitting it
  made a mistake spamming us with a warning it can't comptently produce.

main.c

  144 - buffer access out of bounds

  This is the same hiccup that install.c had, in analogous code. If you
  can explain to me how an out of bounds access actually happens here,
  I'd appreciate it. I'm not seeing one.

  27 - variable scope "middle"

  I declare top, bottom, and middle together because they're conceptually
  grouped. It's a toss up putting it with "result" instead, but I'm pretty
  happy having it where it is.

Not quite to the end of this yet, but that was pretty much my free time
for 2 days, so I'm going to stop here and post what I've got.

Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TOYBOX_CPPCHECK_Results.tar.bz2
Type: application/x-bzip
Size: 403219 bytes
Desc: not available
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20140215/75c629ef/attachment-0002.bin>


More information about the Toybox mailing list