[Toybox] Toybox Digest, Vol 7, Issue 5

Rob Landley rob at landley.net
Thu May 10 09:32:52 PDT 2012


On 05/08/2012 11:37 PM, Ashwini Sharma wrote:
> I am working on mv command... so was interested in dirtree_read.

In theory it should be a good tool for that, it was one of the use cases
I had in mind for the design.

> it doesn't give me the expected result, like a hit on every entry in
> source dir.

The callback should get called for every entry as it recurses, or you
can feed it NULL for the callback and let it assemble a directory tree
of nodes and parse that tree.

For mv, you probably want to do the first to keep memory usage down.
For archivers (tar, zip, gene2fs, mkisofs) you probably want the second.

The callback returns a bitfield telling the dirtree logic what to do
next, currently these flags are defined:

// Do not add this node to the tree
#define DIRTREE_NOSAVE       1
// Do not recurse into children
#define DIRTREE_NORECURSE    2
// Call again after handling all children
// (Directories only. Sets linklen = -1 on second call.)
#define DIRTREE_COMEAGAIN    4
// Follow symlinks to directories
#define DIRTREE_SYMFOLLOW    8
// Abort recursive dirtree.
// (Forces NOSAVE and NORECURSE on this entry, otherwise it'd leak.)
#define DIRTREE_ABORT      (256|DIRTREE_NOSAVE|DIRTREE_NORECURSE)

I should probably reverse the DIRTREE_NOSAVE logic. Before the rewrite,
the default behavior when the callback was NULL was to allocate a tree
of nodes. During the rewrite I added a default callback
(dirtree_isdotdot() in lib/dirtree.c) which filters out the "." and ".."
entries from said directory tree.

I _also_ added a granular set of flags the callback could or together to
control the behavior of the directory traversal, but I left the default
behavior alone and made the flags change it.  Since the default behavior
was "save this node unless told not to", that means that _most_
callbacks (which handle each entry as they're informed of it) don't need
the node structure to stick around after they return, so all the ones
I've written so far return DIRTREE_NOSAVE.

What I should do is swap that flag to DIRTREE_SAVE and have the default
callback return that.

(Note that DIRTREE_NOSAVE and DIRTREE_NORECURSE are orthogonal: it'll
keep the node around until after it's looked at all the children.  If
you return nosave on all the nodes but let it recurse, at any given time
you just have a linked list of directory nodes back up to the root,
which are freed as you leave each directory.)

Oh, and your own callback should probably do some variant of:

  int rc = dirtree_isdotdot(node);
  if (rc) return rc;

At the start, unless it's actually interested in the "." and ".."
entries in each directory.

> The source path i get is complete as from start, and not the intended
> directory for move.
> 
> like "mv /temp3/temp1 ./" I get source as temp3/temp1 instead of
> temp1... which should be the intention in case of dirtree_read().
> 
> --ashwini

This weekend I'd like to bring cp up to date with the new dirtree stuff,
so hopefully that'll be a good model for you once I've got that ready.

Here's the dirtree structure it stores for each node:

struct dirtree {
    struct dirtree *next, *parent, *child;
    long extra; // place for user to store their stuff (can be pointer)
    long data;  // dirfd for directory, linklen for symlink
    struct stat st;
    char *symlink;
    char name[];
};

Each node->name is the name of the current file within its directory,
except that the name of the root node of the tree is whatever path you
gave it to the first directory to scan.

Note a C trick I'm using here: if your struct has a single string in it,
if you put an array of unknown size at the end of a struct C99 treats it
as an array of size zero, so you can malloc(sizeof(struct) +
strlen(string) + 1 for the null terminator) and have a variable length
string as part of the structure's single block of memory, and it goes
away when you free the struct without having to separately
free(node->name).  Yes, C99 explicitly allows this, but only at the
_end_ of a structure, and sizeof() treats that last entry as size 0.

So each node has a name and a ->parent to the node above it, and to
assemble a path from the root call dirtree_path(node, 0) which returns a
malloc()ed chunk of memory memory (you need to free() it when you're
done with it) containing all the node names strung together with /
separators.  So if you start with:

  dirtree "./this/../that"

then dirtree_path on a node under there could return something like:

  "./this/../that/one/two/three"

Each directory also has a "data" entry which (for directories) contains
a filehandle open to that directory.  This lets you use the new openat()
and friends that perform operations relative to a directory filehandle
rather than relative to cwd.  (This means that you don't have to worry
about PATH_MAX anymore.)

I added another field in the struct, "extra", which the callbacks can
set to anything they want. One potential use is that mv can keep a
filehandle to the _target_ directory there the new ones it's created),
so you can openat(node_data) to get the old files and
openat(node->extra) to get the new files.  Or you can:

  renameat(node->data, node->name, node->extra, node->name);

And fall back to copying and unlinkat() if that fails. Note that if you
need a filehandle that points to the current directory (to use these new
functions like the old ones), it's AT_FDCWD.

Again, I need to redo the current cp implemenation to use this instead
of gratuitously doing a lot of absolute path stuff (which is slow and
racy and memory intensive and generally obsolete).

The ls implementation actually starts the tree by hand, calling
dirtree_add_node() to get the first one and then calling
dirtree_recurse() on it.  The mv code could use the same trick to
populate the "extra" field with the initial target directory, and then
the callback can use

  node->extra = mkdirat(parent->extra, node->name, mode);

at appropriate places.  And mv wants the COMEAGAIN behavor so the
initial mode can be 0700 and then it can fchmod() it to the final
permissions once it's done.  (That way it can create files in read-only
directories, _and_ use permissions to filter out the more obvious race
conditions with other users trying to screw you up and get you to stomp
files with symlink timing attacks or something.)

As I said, I haven't documented the dirtree infrastructure yet because
it's not _quite_ cooked yet. But I suppose things like switching
DIRTREE_NOSAVE to DIRTREE_SAVE aren't major enough that I can't just
amend the documentation after writing it.  Maybe this weekend I'll have
time...

Rob
-- 
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation.  Pick one.

 1336667572.0


More information about the Toybox mailing list