[Toybox] dirtree help

Rob Landley rob at landley.net
Tue Apr 24 14:03:12 PDT 2012


On 04/24/2012 05:21 AM, Georgi Chorbadzhiyski wrote:
> I'm porting chXXX toys to the new dirtree code but I'm running into a problem.

The dirtree stuff is in flux right now. That's why I haven't documented
it yet, because it may still change.

Originally I wrote "what cp needed" with an eye towards things like tar
and mke2fs.  Then I found out that "what ls needed" was totally
different. :)

I've now written what I _think_ ls, cp, and rm can all use.  But until
I've ported cp back to use it, there's probably stuff that needs to be
debugged and tweaked. As it is, I had to split them up so ls could call
the components with very granular control over what happens when.  To do
ls right needs separate "populate", "sort", and "display" passes, but
_still_ needs the callback to determine which files to include. (I could
have filtered that out at a later stage, but it would use memory to
store them until then. The high water mark goes up unnecessarily.)

> Apply the attached patch and see the difference in outputs from coreutils
> chgrp and toybox chgrp toy.
> 
> gf at gf:~/git/toybox$ chgrp -R root test

Reasonably straightforward depth-first traversal, triggering on last
sight of each element.

> gf at gf:~/git/toybox$ ./toybox chgrp -R root test

Traversal with no filtering, hitting every time the callback can
potentially be called, both on the way down and on the way up.

> I see DIRTREE_COMEAGAIN and DIRTREE_NOSAVE flags but I have no idea how to use
> them to achieve the effect that coreutils directory recursion is having.
> 
> Help, please.

I haven't finished this code yet, which is why I haven't documented it
yet, but looking at this chunk of handle_callback() in lib/dirtree.c,
right now it says:

  flags = callback(new);
  if (S_ISDIR(new->st.st_mode)) {
    if (!(flags & DIRTREE_NORECURSE)) {
      new->data = openat (new->parent ? new->parent->data : AT_FDCWD,
          new->name, 0);
      dirtree_recurse(new, callback);
    }
    new->data = -1;
    if (flags & DIRTREE_COMEAGAIN) flags = callback(new);
  }

I.E. it does the callback on every node, and then figures out if it
needs to recurse (only on directory nodes).  That first callback can
tell it not to look at any children of this directory, by returning the
DIRTREE_NORECURSE bit set in its flags.
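
To make the pruning concrete, here's a rough standalone sketch. None of
it is toybox code: struct sketch_node, SKETCH_NORECURSE (with a made-up
value), and sketch_walk() are stand-ins invented for illustration, and
the walker only mimics the "callback first, then decide whether to
recurse" order described above.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_NORECURSE 1    /* placeholder value, not toybox's */

/* Hypothetical stand-in for a tree node. */
struct sketch_node {
  const char *name;
  int is_dir;                 /* stands in for S_ISDIR(st.st_mode) */
  struct sketch_node **kids;  /* NULL-terminated list, NULL for files */
};

static int sketch_seen;

/* Count every node, but refuse to descend into ".git". */
static int sketch_prune(struct sketch_node *node)
{
  sketch_seen++;
  if (node->is_dir && !strcmp(node->name, ".git")) return SKETCH_NORECURSE;
  return 0;
}

/* Callback on every node, then recurse only into unpruned directories. */
static void sketch_walk(struct sketch_node *node,
                        int (*cb)(struct sketch_node *))
{
  struct sketch_node **k;
  int flags;

  assert(node && cb);
  flags = cb(node);
  if (!node->is_dir || (flags & SKETCH_NORECURSE) || !node->kids) return;
  for (k = node->kids; *k; k++) sketch_walk(*k, cb);
}

/* Walk test/{.git/{objects}, README} and return how many nodes were seen. */
int sketch_count(void)
{
  static struct sketch_node obj = {"objects", 0, NULL};
  static struct sketch_node file = {"README", 0, NULL};
  static struct sketch_node *git_kids[] = {&obj, NULL};
  static struct sketch_node git = {".git", 1, git_kids};
  static struct sketch_node *top_kids[] = {&git, &file, NULL};
  static struct sketch_node top = {"test", 1, top_kids};

  sketch_seen = 0;
  sketch_walk(&top, sketch_prune);
  return sketch_seen;
}
```

The callback still sees ".git" itself, it just never sees anything
underneath it.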

On the way out of the directory it sets node->data to a nonzero value
(the node structure is zeroed when allocated, so on the first callback
before we did any recursing data was 0), and if the initial callback
returned DIRTREE_COMEAGAIN it calls the callback again.

So in the callback you can do this as the first line:

  if (S_ISDIR(node->st.st_mode) && !node->data) return DIRTREE_COMEAGAIN;

And then the body of the callback only gets reached for directory nodes
_after_ the children have all been handled. However, if you need to do
something funky like cp's "chmod(700)" on the way down to ensure that
the newly created directories are writeable, you can do that in the
first callback before returning DIRTREE_COMEAGAIN, and then set the
final values (may be chmod(000)) in the comeagain callback.
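
Here's a toy model of that ordering, as a sketch only. The demo_node
struct, the DEMO_* flag values, and demo_handle() are all made up for
this example (the real ones live in toybox and may change); the walker
copies the shape of the handle_callback() excerpt minus the openat().
It assumes the early return fires on the first visit, while data is
still 0.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real toybox definitions. */
#define DEMO_NORECURSE 1
#define DEMO_COMEAGAIN 2

struct demo_node {
  const char *name;
  int is_dir;                /* stands in for S_ISDIR(st.st_mode) */
  long data;                 /* 0 on first visit, -1 on the way back up */
  struct demo_node **child;  /* NULL-terminated list, NULL for files */
};

static char demo_log[256];

/* Append a name (plus a space) to the visit log. */
static void demo_visit(const char *name)
{
  assert(strlen(demo_log) + strlen(name) + 2 < sizeof(demo_log));
  strcat(demo_log, name);
  strcat(demo_log, " ");
}

/* Callback: directories ask to be called back, and only act then. */
static int demo_callback(struct demo_node *node)
{
  if (node->is_dir && !node->data) return DEMO_COMEAGAIN;
  demo_visit(node->name);
  return 0;
}

/* Callback, maybe recurse, mark the node, maybe callback again. */
static void demo_handle(struct demo_node *node,
                        int (*cb)(struct demo_node *))
{
  int flags = cb(node);

  if (node->is_dir) {
    if (!(flags & DEMO_NORECURSE) && node->child) {
      struct demo_node **c;

      for (c = node->child; *c; c++) demo_handle(*c, cb);
    }
    node->data = -1;
    if (flags & DEMO_COMEAGAIN) cb(node);
  }
}

/* Walk test/{a, sub/{b}} and return the order the callback acted in. */
const char *demo_run(void)
{
  static struct demo_node a = {"a", 0, 0, NULL}, b = {"b", 0, 0, NULL};
  static struct demo_node *sub_kids[] = {&b, NULL};
  static struct demo_node sub = {"sub", 1, 0, sub_kids};
  static struct demo_node *top_kids[] = {&a, &sub, NULL};
  static struct demo_node top = {"test", 1, 0, top_kids};

  demo_log[0] = 0;
  a.data = b.data = sub.data = top.data = 0;
  demo_handle(&top, demo_callback);
  return demo_log;
}
```

With that tree, demo_run() reports "a b sub test ": each directory shows
up after everything inside it, which is the order chgrp -R wants.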

Both NORECURSE and COMEAGAIN are only evaluated for directory nodes,
returning them from files and such is a NOP. But you can check S_ISDIR()
to detect that in the first callback.

That's the idea, anyway.  There are legitimate cases where the callback
needs to happen _both_ before and after traversal.

Also, both "mv" and "cp" want to traverse _two_ directory hierarchies in
parallel. This is why I added the "extra" field, so the openat()
filehandle of the second directory has a place to live during the
traversal, and thus we can descend and ascend without all that tedious
mucking about <strike>in hyperspace</strike> with absolute paths.
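
Roughly, the two-handle trick looks like the sketch below. This is not
the dirtree code: struct fd_pair and both functions are hypothetical
names made up for the example, with src and dst playing the roles of
the node's "data" and "extra" fields. The only real API used is POSIX
openat() and close().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical pair of directory handles, one per hierarchy. */
struct fd_pair {
  int src;  /* directory being read */
  int dst;  /* matching directory on the other side */
};

/* Open "name" under both parents, relative to their filehandles.
   Returns 0 on success, -1 on failure with nothing left open. */
int fd_pair_descend(const struct fd_pair *parent, const char *name,
                    struct fd_pair *child)
{
  child->src = openat(parent->src, name, O_RDONLY | O_DIRECTORY);
  if (child->src < 0) return -1;

  child->dst = openat(parent->dst, name, O_RDONLY | O_DIRECTORY);
  if (child->dst < 0) {
    close(child->src);
    child->src = -1;
    return -1;
  }
  return 0;
}

/* Going back up is just closing the pair; the parent's handles are
   untouched, so no path ever has to be rebuilt. */
void fd_pair_ascend(struct fd_pair *child)
{
  assert(child->src >= 0 && child->dst >= 0);
  close(child->src);
  close(child->dst);
  child->src = child->dst = -1;
}
```

Start the top level with AT_FDCWD in both slots and every step below
that only ever needs the one path component.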

The reason for DIRTREE_NOSAVE is that sometimes you want to populate a
tree of data structures (for mke2fs and zip and so on), and sometimes
you want to keep memory usage down (so rm -rf doesn't eat insane amounts
of memory when deleting big directories).

Rob
-- 
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation.  Pick one.
