[Toybox] readdir() and deleted files

Rob Landley rob at landley.net
Fri Nov 8 14:44:19 PST 2024


On 11/8/24 07:45, enh wrote:
> certainly my knee-jerk reaction to the bug was "duh, don't modify the
> directory you're traversing!" but allegedly they can't reproduce with
> glibc/gnu find, which seemed weird.

Alas, I don't get to say that, and out in the wild modifying traversed 
directories happens ALL THE TIME.

In _really_ bad cases it's a security exploit, because the naive way to 
handle infinite depth rm -rf (required by posix, and by reality with 
"while mkdir a; do mv b a; mv a b; done" doing an infinite depth inode 
forkbomb without ever TRAVERSING more than one level) is to cd subdir 
and then cd .. on the way out counting your depth, and if root's doing 
that to a world writeable directory which somebody else does a "mv" of a 
directory partway down but above where it's currently travrsing, and 
they cd .. out into somewhere unexpected and keep going...)

I haven't implemented it yet (dirtree is still keeping a filehandle open 
at each level so you're generally limited to 1000 directories deep 
before openat() fails), but my PLAN for handling infinite recursion 
depth is to openat("..") and compare the dev/ino saved in the parent 
dirtree node's stat field, and if they don't match either abort or drill 
down from the top again as far as we can back towards where we were, and 
continue from there.

Both are nontrivial, in part because I haven't got a way of signalling 
preference (WHEN should it abort vs continue?), and resync could screw 
up callbacks that don't get COMEAGAIN and similar they were expecting, 
and because if it's something like "find" traversing then I want to 
avoid duplicates when I re-traverse the parent directory, so I have to 
scan and record all entries in each parent directory before descending 
into any of its subdirectories (that way I avoid doing readdir() again 
at all, just stat and confirm it's the same dev/ino), which can get 
memory expensive...

And then, of course, I've long wanted to do "works like an archiver" 
versions of mke2fs/mkfatfs/mkisofs/mksquashfs where I can go something 
like "mkfatfs subdir | gzip" which means I can neither mmap nor seek on 
the output but have to produce it all in sequence, which requires 
reading ahead all the data before writing any of it. (Which dirtree 
should now have all the plumbing for with the breadth first stuff.) That 
sort of thing is inherently two passes: collecting metadata in pass 1, 
and reading/writing data in pass 2, and there's no guarantee nothing's 
changed between them so... cope? Pad with zeroes I guess? (And you have 
to scan file contents for sparseness in the "metadata" pass because that 
can affect later size offsets...)

> i'm happy to keep calling this user error, though, and your example
> certainly convinces me not to touch libc!

That's why I was checking BOTH dev/ino for in-band failure detection on 
the stat structure. Which I _think_ is reliable. (And if there's a 
system out there where it's not true: A) at least it's way more of a 
lottery ticket, B) pilot error on their part, it works for Linux.)

Rob


More information about the Toybox mailing list