[Toybox] Poking at dirtree.

Rob Landley rob at landley.net
Tue Mar 27 13:59:21 PDT 2012


On 03/27/2012 08:37 AM, Georgi Chorbadzhiyski wrote:
> Around 03/26/2012 06:02 AM, Rob Landley scribbled:
>> I spent a longish time this weekend redoing the lib/dirtree.c stuff, and
>> now I need to test it a _lot_ more and redo cp.c to use it properly.  (I
>> don't want to check it in while it still breaks stuff, but if you're
>> curious I attached the "hey, it compiled!" version.)
>>
>> The point of this exercise is to unify the fts, scandir(), and readdir()
>> stuff into a single set of code everything can use.
>>
>> I was tempted to just use the fts stuff since it's closest to what I
>> want and already there in the library, but if you google for it they
>> have problems scaling to large directories.  One problem is they put an
>> absolute path from root in every node, which can add up quickly if you
>> have large directories with deep paths.  (Instead I made a function that
>> creates the directory path when you ask it to, and then you can feed
>> that to realpath() yourself if you need it canonicalized.)
>>
>> I'm going the other way and using the openat() stuff, which is in posix
>> 2008 now.  This eliminates the old PATH_MAX dependency, and weans dirtree
>> off of toybuf, which was a layering violation anyway: commands should
>> always have toybuf free for their own use, meaning the library should
>> NOT use it behind a command's back.
>>
>> I know the fts stuff is calling directory callbacks when both entering
>> and leaving a directory, which might be a good idea, but I purposely
>> _didn't_ implement that yet because it's infrastructure in search of a
>> user. When I have an actual use case for that, it's easy enough to add
>> in later.
>>
>> Sorry I've been distracted recently: I've been doing package upgrades in
>> aboriginal linux to get a release out there.  (I'm mostly caught up on
>> the new kernel, but still need to upgrade uClibc and fix the regressions
>> that's bound to cause...)
> 
> What is mostly needed is that the dirtree stuff executes callbacks from the
> bottom of the tree up. For example if we have "a/b/c/{d,e}/file{1,2,3}.txt" and dirtree_recurse
> is called on "a" for cp/rm/rmdir/chmod/chown/etc it'll need to get

Except that things like "cpio" need to create the directory before they
create the file in the directory, so those have to be top down.

And "cp -ap" needs both, because it has to create the directory on the
way down (so it can put contents in it), but that directory has to be
chmod 700 so it can write into it, and then on the way _back_ it has to
chmod it to the real permissions (which may not allow it to write).
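
To make the "both directions" case concrete, here's a rough sketch of a
walker that calls back on the way down and again on the way back up,
using openat()/fdopendir() so no path strings get built.  This is NOT the
dirtree code: walk(), walk_phase, and record() are names made up for
illustration.  A "cp -ap" style user would mkdir the 0700 directory at
WALK_PRE and chmod it to the real permissions at WALK_POST.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical phases: entering a directory, leaving it, or a non-directory.
enum walk_phase { WALK_PRE, WALK_POST, WALK_FILE };

typedef void (*walk_fn)(int dirfd, const char *name, const struct stat *st,
                        enum walk_phase phase, void *arg);

// Visit "name" relative to dirfd.  Directories get WALK_PRE before their
// contents and WALK_POST after.  Symlinks are not followed.  Returns 0 on
// success, -1 if "name" itself can't be examined or opened.
int walk(int dirfd, const char *name, walk_fn fn, void *arg)
{
  struct stat st;
  struct dirent *de;
  DIR *dir;
  int fd;

  assert(name && fn);
  if (fstatat(dirfd, name, &st, AT_SYMLINK_NOFOLLOW)) return -1;
  if (!S_ISDIR(st.st_mode)) {
    fn(dirfd, name, &st, WALK_FILE, arg);
    return 0;
  }

  fn(dirfd, name, &st, WALK_PRE, arg);      // top down: cpio, mkdir side of cp

  fd = openat(dirfd, name, O_RDONLY|O_DIRECTORY|O_NOFOLLOW);
  if (fd < 0) return -1;
  dir = fdopendir(fd);
  if (!dir) {
    close(fd);
    return -1;
  }
  while ((de = readdir(dir))) {
    if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, "..")) continue;
    walk(fd, de->d_name, fn, arg);          // child errors are skipped here
  }
  closedir(dir);                            // also closes fd

  fn(dirfd, name, &st, WALK_POST, arg);     // bottom up: rm, final chmod
  return 0;
}

// Example callback: append "+name", "-name", or "name" to a 256 byte log
// buffer passed in through arg.
void record(int dirfd, const char *name, const struct stat *st,
            enum walk_phase phase, void *arg)
{
  char *log = arg;
  const char *mark = phase == WALK_PRE ? "+" : phase == WALK_POST ? "-" : "";
  size_t used = strlen(log);

  (void)dirfd;
  (void)st;
  snprintf(log + used, 256 - used, "%s%s ", mark, name);
}
```

Note this keeps one directory filehandle open per level of depth, which
is the price of not carrying paths around.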

The other fun little corner case is whether "." and ".." should show up
in the list.  ("ls -a" cares, and you can't just synthesize entries
because they need the right date so you have to call stat(), which
dirtree is already doing so I'd rather not duplicate it.)

Sigh. It looks like dirtree needs flags.  I'm trying to figure out if
passing them down recursively (eating stack space) is worse than adding
an entry to the global toys structure.  Both are sort of ugly...

> The other thing with fts is the ability to request FTS_LOGICAL or FTS_PHYSICAL
> (possibly FTS_XDEV will be needed for find -xdev). The difference is that LOGICAL
> follows symlinks and PHYSICAL does not. These options allow us to implement the -H/-L
> options of chown - http://pubs.opengroup.org/onlinepubs/009695399/utilities/chown.html

I already dug into that in some context, trying to remember where.  (I
thought it was for cp, but I guess not...)  If you follow symlinks while
recursing you can get into endless loops really easily, but I guess
that's what cp -L is _supposed_ to do, so...

In theory, the callback can do that.  I'm giving the callback the
ability to say "don't descend into this directory" with its return code.
 In practice, the default is to not descend into symlinked directories,
and you can't veto a negative...

I suspect dirtree needs a "follow symlinks" flag, but probably doesn't
need the other two since one's the default and the other can be
trivially vetoed via callback.  (I vaguely recall that I've got code in
df that detects device transitions already...)
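
For what it's worth, the "veto via callback" version of -xdev could be
about this small.  This is only a sketch: the SKIP/DESCEND return codes
and the function name are invented here, and it assumes the callback gets
handed the stat info for both the parent directory and the entry, filled
in without following symlinks (lstat() or fstatat() with
AT_SYMLINK_NOFOLLOW).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>

// Hypothetical callback return codes, not taken from lib/dirtree.c.
enum { SKIP = 0, DESCEND = 1 };

// Say whether the walker should recurse into "child".  Both structs are
// assumed to come from a stat call that does not follow symlinks, so a
// symlink pointing at a directory fails S_ISDIR() and is never entered.
int same_device_only(const struct stat *parent, const struct stat *child)
{
  assert(parent && child);
  if (!S_ISDIR(child->st_mode)) return SKIP;          // files, symlinks, etc.
  if (child->st_dev != parent->st_dev) return SKIP;   // crossed a mount point
  return DESCEND;
}
```

Going the other way (following symlinks) can't be done from here, since
the callback can only say no, which is why that one wants a flag.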

Rob
-- 
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation.  Pick one.
