[Toybox] [CLEANUP] du.c

Rob Landley rob at landley.net
Sun Aug 18 02:21:06 PDT 2013


Long overdue, cleaning up du.c (which predates the pending directory).

http://landley.net/hg/toybox/rev/1010 (and  
http://127.0.0.1/hg/toybox/rev/1011).

Added [-HL] and such to the end of the option string so last option of  
a given type wins, and fiddled with the help text.

The usual whitespace cleanup: end of line comments (especially that go  
way
past 80 chars) moved to the line before. /* multiline comments */ that  
are
actually only one line converted to // comments. Space after flow  
control
statements that look like functions but aren't, ala if() -> if (). Blank
line between declarations and the rest of the code block. Moving else  
after
the curly bracket when there is one. In GLOBALS have the variables the
option parser fills out one per line, with a blank line between them and
other globals (and those others collated by type since order isn't  
important
there the way it is with the option variables).

The linux kernel guys have a rule against typedefs, which I find  
convincing.

(Also, "char* blah;" is wrong, it's "char *blah;" because "char* one,  
two;"
doesn't make "two" a pointer. Putting that space in the wrong place
implies they don't understand how declarations work in C. Think about
"char* one,* two;" vs "int x, *y;"...)

print:

   Every user of the print function except one is doing  
dirtree_path(node, NULL)
   which returns a malloc() that's never freed. Given that we're  
iterating
   over an arbitrary filesystem (potentially including things like  
/proc and
   /sys or network mounts not locally stored), this is leaking an  
unbounded
   amount of memory. So I swapped it to take node as an argument, do  
its own
   dirtree_path() and free the result, and print "total" if fed NULL.

   Um, and in make_human_readable... 1<<64 is 18 exabytes. This has  
yotta and
   zeta but they can't be represented in a long long? (Each 1024 is 10  
bits, we
   have 64 bits, it has 9 characters of unit string...) Right, du is the
   only user of that function so inline _it_ too... (This turned into a  
more
   or less rewrite of the print() function.)

Ok, let's look at functions only ever used once (make_pathproper,
llist_add_inode, is_node_present) and see if they can be inlined...

make_pathproper:

   This function just appends / to "." and ".." entries. Adding
   if (node->parent) before the dirtree_notdotdot() test in do_du()  
removes
   the need for it.

The other two are only ever used in do_du(), so let's clean that up...

do_du:

   Don't recurse if we crossed device boundary on -x, none of the things
   _under_ it should be on the old device. So return 0 in that case.

   Both llist_add_node and is_node_present are only used from here. I  
combined
   them into seen_inode() which takes a stat pointer and returns whether
   or not it had already seen the inode. (If it hadn't, it adds it to  
the
   list. If the stat pointer is NULL it flushes its cache so the old  
free_inode
   can go away too.) The list is using two allocations to store 3  
fields: a
   next pointer, an inode, and a dev_t. That can combined into one  
allocation.

   You can't hardlink directories, so I made it check S_ISDIR(). This  
isn't
   QUITE right because you can bind mount directories, and the  
gnu/dammit
   version of du skipse bind mounted directories. But A) that adds more  
inodes
   to remember, since directory link count is usually >1, B) other  
users of
   this logic (tar, cipo, genext2fs) couldn't restore directory  
hardlinks so
   detecting them wouldn't help. (It's a judgement call...)

   No need to use ->extra to flag the second call from comeagain, that's
   already indicated by ->data being -1. Freeing up that field means we
   don't need struct node_size to glue another ->extra onto the end of  
struct
   dirtree, although on 32 bit systems we can't calculate a du total  
longer
   than 2 gigs so possibly we should be doing that _anyway_, but with a  
size_t?

   I rewrote the whole S_ISDIR() block, taking into account that ->extra
   can now just store the size (in blocks, so we overflow less on 32  
bits,
   although that's still only a max size of 1 terabyte because 32 bit  
signed
   long is 2 billion, * 512...) Got rid of TT.dirsum and just had nodes  
update
   their ->parent->extra to pass data back up the tree.

in du_main:

   if toys.optc == 0 it assigns toys.optargs[0] = "./", which can now  
be "."
   since make_pathproper is gone, but there's a problem: if argc was  
zero
   get_optflags() in lib/args.c will allocate an array of length 1 for  
this.
   We then assign to that one and only allocated entry, and then do a
   while (*toys.optargs++) and fall off the end into unallocated space
   which _may_ contain a null terminator, but only by accident.

   So do char **noargs = {".", 0}; and then swap to that for default  
args.

   No need to set TT.maxdepth to INT_MAX, it defaults to 0, just have  
print()
   check for 0. Similarly, no need to initialize TT.total and TT.inodes  
to 0.
   And if TT.depth isn't 0 after a previous call to dirtree_add_node we  
screwed
   something up. Moved symfollow inline, and broken the "declaration =  
value"
   to two lines. (Same net number of lines, but easier to follow this  
way.)

Still one todo item: feed print() blocks instead of bytes. Also, du  
could use a test suite. (I did a lot of testing against the toybox  
source itself, but that's not stable enough to use in the test suite.  
I'd have to create a directory of files for it to du over, probably  
with "truncate".)

Also: the gnu/dammit du uses 1024 byte blocks by default unless you  
export POSIXLY_CORRECT. Busybox defconfig also does 1024, you can swich  
it off at config time but not runtime. My compromise was to default to  
1024 (and accept but ignore -k), but add a -K flag to use 512 byte  
blocks so you can get the posix behavior with an alias.

Oh, and our rounding for -h differs in places from the gnu/dammit du.  
(We sometimes say 1.9 when they say 2.0, etc). Meanwhile busybox is all  
over the place, doesn't even match with -m...

Rob


More information about the Toybox mailing list