[Toybox] [PATCH] Fix mv on overwrite.

Rob Landley rob at landley.net
Mon Aug 31 19:41:51 PDT 2015



On 08/31/2015 02:16 PM, enh wrote:
> On Sun, Aug 30, 2015 at 4:39 AM, Rob Landley <rob at landley.net> wrote:
>> On 08/28/2015 08:02 PM, enh wrote:
>>> two patches here. the first patch is clearly a desirable bug fix but
>>
>> Er, yes. Sorry about that.
>>
>>> there are some trickier compatibility choices lurking in the same
>>> area. there's a bigger patch below, but i've provided both in case the
>>> latter requires too much thought/experimentation :-)
>>
>> I read through so many weird rm corner cases making that work. And yes,
>> posix and gnu disagreed on some of them:
>>
>> http://landley.net/notes-2012.html#06-12-2012
>>
>> Oh, and the fix to the infinite directory traversal thing is to check
>> the stat info in the parent dirtree node when you traverse .. to go back
>> up and if it doesn't match traverse back down from the top as far as you
>> can confirm the matches for.
>>
>> What you do about there being a significant gap between the two,
>> however, I'm honestly not sure about. (I need test cases. If somebody
>> just did a mv of your parent directory three levels up entirely out of
>> the tree you're considering, the behavior should still be correct,
>> essentially deferring the mv until after the traversal completes. If
>> somebody does a mv elsewhere within the same tree... that's pilot error,
>> and we avoid a rm or cp wandering outside the tree.)
>>
>> I lean towards "if you have a significant gap, abort the darn directory
>> tree traversal because somebody did something stupid". Except rsync
>> should ignore it, because that's a best-effort thing on potentially
>> active filesystems. Maybe I need a DIRTREE_PERSEVERE flag? (And possibly
>> DIRTREE_INFINITE flag to not cache the filehandles, although that could
>> also be a dynamic "once you hit ~50 levels deep, switch modes" thing...
>> except we don't currently have global dirtree traversal state...)
>>
>> My todo list has archaeological _layers_.

Ok, not _quite_ that simple because symlinks.

When you follow a symlink (not the common case but we have a
DIRTREE_SYMFOLLOW knob), we need to cache the filehandle of the
symlink-used-as-directory and we can't discard that without traversing
all the way back down from the top because ".." doesn't return to where
we came from. Not the common case, but when I sat down to see if I could
easily implement this (and thus make rm -rf match this corner case of
the spec, and yes I'm waiting until I have the qemu based test
environment to _TEST_ that) I caught the hiccup.

The right thing to do would be to add a global maxfiles to struct
toy_context, have the call to dirtree_start() initialize it via
getrlimit(RLIMIT_NOFILE), and then have the DIRTREE_RECURSE path check
the returned filehandle and if it's greater than toys.maxfiles/2 close
parent->parent filehandles as necessary (not the one dirtree_parentfd()
would return but the one _above_ that) and then have the return path (or
more likely dirtree_parentfd()) reopen them via if (parent->data == -1)
openat(parent->data, ".."); and compare stat info and do the
retraverse-from-top dance as necessary. Module the cached symlink
handles thing to jump back there. (Actually I think their parent's fd is
what I need to save, the DIRTREE_SYMFOLLOW logic is doing
fstatat(AT_SYMLINK_FOLLOW) so their stat logic is directory match not
symlink so we can .. _to_ them, just not up past them. If we set closed
dir->data filehandles to -1 then the check is only on the descent side,
not on the return side. And since there's _already_ a symlink traversal
limit in the kernel for any given path (90 entries, I think?), running
out of filehandles due to symlink traversal with DIRTREE_SYMFOLLOW
enabled and erroring out because of it is find with me.)

Really the _hard_ part is updating the relevant section of code.html to
clearly explain how the new stuff would _work_. But I'd also want to
audit the users to make sure no code is using ->parent->data directly if
I've changed its semantics, although I think I already did that when
putting dirtree_parentfd() in?

Right now we use one filehandle per depth, not per node in the tree (we
open the directory filehandle when we recurse and close it again when we
return from it), and given the default 1024 rlimit this isn't a _common_
problem ("find . | sed 's/[^/]*//g' | sort -u" says the maximum depth in
the kernel source is 10 slashes, for example). But for the 1.0 release I
wanna do it right.

Meanwhile, sticking it back on the todo heap and poking at the
moundshalf-finished stuff instead of opening this new can of worms right
now. I need to _close_ tabs...

Rob

P.S. Yes I normally blog this nonsense instead of writing it here but
I'm a month and change behind on my blog updates. I have them
half-written up but not formatted properly and the links I want to
reference are mostly [TODO] notes, and so on.


More information about the Toybox mailing list