[Toybox] [CLEANUP] du.c
Rob Landley
rob at landley.net
Sun Aug 18 02:21:06 PDT 2013
Long overdue, cleaning up du.c (which predates the pending directory).
http://landley.net/hg/toybox/rev/1010 (and
http://127.0.0.1/hg/toybox/rev/1011).
Added [-HL] and such to the end of the option string so last option of
a given type wins, and fiddled with the help text.
The usual whitespace cleanup: end of line comments (especially that go
way
past 80 chars) moved to the line before. /* multiline comments */ that
are
actually only one line converted to // comments. Space after flow
control
statements that look like functions but aren't, ala if() -> if (). Blank
line between declarations and the rest of the code block. Moving else
after
the curly bracket when there is one. In GLOBALS have the variables the
option parser fills out one per line, with a blank line between them and
other globals (and those others collated by type since order isn't
important
there the way it is with the option variables).
The linux kernel guys have a rule against typedefs, which I find
convincing.
(Also, "char* blah;" is wrong, it's "char *blah;" because "char* one,
two;"
doesn't make "two" a pointer. Putting that space in the wrong place
implies they don't understand how declarations work in C. Think about
"char* one,* two;" vs "int x, *y;"...)
print:
Every user of the print function except one is doing
dirtree_path(node, NULL)
which returns a malloc() that's never freed. Given that we're
iterating
over an arbitrary filesystem (potentially including things like
/proc and
/sys or network mounts not locally stored), this is leaking an
unbounded
amount of memory. So I swapped it to take node as an argument, do
its own
dirtree_path() and free the result, and print "total" if fed NULL.
Um, and in make_human_readable... 1<<64 is 18 exabytes. This has
yotta and
zeta but they can't be represented in a long long? (Each 1024 is 10
bits, we
have 64 bits, it has 9 characters of unit string...) Right, du is the
only user of that function so inline _it_ too... (This turned into a
more
or less rewrite of the print() function.)
Ok, let's look at functions only ever used once (make_pathproper,
llist_add_inode, is_node_present) and see if they can be inlined...
make_pathproper:
This function just appends / to "." and ".." entries. Adding
if (node->parent) before the dirtree_notdotdot() test in do_du()
removes
the need for it.
The other two are only ever used in do_du(), so let's clean that up...
do_du:
Don't recurse if we crossed device boundary on -x, none of the things
_under_ it should be on the old device. So return 0 in that case.
Both llist_add_node and is_node_present are only used from here. I
combined
them into seen_inode() which takes a stat pointer and returns whether
or not it had already seen the inode. (If it hadn't, it adds it to
the
list. If the stat pointer is NULL it flushes its cache so the old
free_inode
can go away too.) The list is using two allocations to store 3
fields: a
next pointer, an inode, and a dev_t. That can combined into one
allocation.
You can't hardlink directories, so I made it check S_ISDIR(). This
isn't
QUITE right because you can bind mount directories, and the
gnu/dammit
version of du skipse bind mounted directories. But A) that adds more
inodes
to remember, since directory link count is usually >1, B) other
users of
this logic (tar, cipo, genext2fs) couldn't restore directory
hardlinks so
detecting them wouldn't help. (It's a judgement call...)
No need to use ->extra to flag the second call from comeagain, that's
already indicated by ->data being -1. Freeing up that field means we
don't need struct node_size to glue another ->extra onto the end of
struct
dirtree, although on 32 bit systems we can't calculate a du total
longer
than 2 gigs so possibly we should be doing that _anyway_, but with a
size_t?
I rewrote the whole S_ISDIR() block, taking into account that ->extra
can now just store the size (in blocks, so we overflow less on 32
bits,
although that's still only a max size of 1 terabyte because 32 bit
signed
long is 2 billion, * 512...) Got rid of TT.dirsum and just had nodes
update
their ->parent->extra to pass data back up the tree.
in du_main:
if toys.optc == 0 it assigns toys.optargs[0] = "./", which can now
be "."
since make_pathproper is gone, but there's a problem: if argc was
zero
get_optflags() in lib/args.c will allocate an array of length 1 for
this.
We then assign to that one and only allocated entry, and then do a
while (*toys.optargs++) and fall off the end into unallocated space
which _may_ contain a null terminator, but only by accident.
So do char **noargs = {".", 0}; and then swap to that for default
args.
No need to set TT.maxdepth to INT_MAX, it defaults to 0, just have
print()
check for 0. Similarly, no need to initialize TT.total and TT.inodes
to 0.
And if TT.depth isn't 0 after a previous call to dirtree_add_node we
screwed
something up. Moved symfollow inline, and broken the "declaration =
value"
to two lines. (Same net number of lines, but easier to follow this
way.)
Still one todo item: feed print() blocks instead of bytes. Also, du
could use a test suite. (I did a lot of testing against the toybox
source itself, but that's not stable enough to use in the test suite.
I'd have to create a directory of files for it to du over, probably
with "truncate".)
Also: the gnu/dammit du uses 1024 byte blocks by default unless you
export POSIXLY_CORRECT. Busybox defconfig also does 1024, you can swich
it off at config time but not runtime. My compromise was to default to
1024 (and accept but ignore -k), but add a -K flag to use 512 byte
blocks so you can get the posix behavior with an alias.
Oh, and our rounding for -h differs in places from the gnu/dammit du.
(We sometimes say 1.9 when they say 2.0, etc). Meanwhile busybox is all
over the place, doesn't even match with -m...
Rob
More information about the Toybox
mailing list