[Toybox] [PATCH] Support diff --no-dereference

enh enh at google.com
Tue Sep 3 05:02:09 PDT 2024


ping again on this patch... it's wanted for kernel builds.

On Mon, Aug 26, 2024 at 5:30 PM enh <enh at google.com> wrote:
>
> ...but ignoring the bikeshed of "should we do something about all the
> _other_ readlink callers" --- ping on this specific patch :-)
>
> On Fri, Aug 23, 2024 at 3:15 PM Rob Landley <rob at landley.net> wrote:
> >
> >
> >
> > On 8/23/24 06:33, enh wrote:
> > > On Fri, Aug 23, 2024 at 3:32 AM Rob Landley <rob at landley.net> wrote:
> > >>
> > >> On 8/22/24 08:23, enh via Toybox wrote:
> > >> > +  int s = sizeof(toybuf)/2;
> > >> > +
> > >> > +  TT.is_symlink = 1;
> > >> > +  for (i = 0; i < 2; i++) {
> > >> > +    TT.link[i].name = toybuf + i * s;
> > >> > +    TT.link[i].len = readlink0(files[i], TT.link[i].name, s);
> > >> >
> > >> > should probably use xreadlink() and free() instead? toybuf is big
> > >> > enough to be safe for _one_ path on linux, but not two.
> > >>
> > >> Eh, no arbitrary buffer is "enough". PATH_MAX is slowly getting phased out, and
> > >> even posix-2008 requires arbitrary depth for "rm" because:
> > >>
> > >> $ for ((i=0;i<10000;i++)); do mv a b; mkdir a; mv b a/a; done
> > >
> > > i thought in practice though linux couldn't cope with longer paths?
> >
> > It has to, because you can retroactively create them with "mv" and "mount", and
> > if it hasn't loaded in the directory contents to populate the dentry cache of
> > all descendants before doing the mv, it doesn't know how deep the rabbit hole goes.
> >
> > Now _symlinks_ are janky. The kernel has a silly "this path contained X many
> > symlinks going down it, and even though this is a finite non-recursive process
> > we error out at an arbitrary limit anyway", but that's not related to directory
> > depth.
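
The arbitrary cap described above is easy to demonstrate. The sketch below is a hypothetical standalone helper (not toybox code); it builds a finite, non-recursive chain of symlinks and shows the kernel still refusing to follow it past its depth limit (ELOOP), which on Linux is 40 traversals. Assumes a writable current directory:

```c
// Sketch: build a chain l1->l0, l2->l1, ... lN->l(N-1) where l0 is a
// regular file, then open lN through the whole chain. Even though
// resolving the chain is a finite, non-recursive process, the kernel
// errors out with ELOOP past an arbitrary depth.
// open_through_chain() is a hypothetical helper, not toybox code.
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Returns 0 if lN opened successfully, otherwise the failing errno.
int open_through_chain(int depth)
{
  char prev[32], cur[32];
  int i, fd = open("l0", O_CREAT|O_WRONLY, 0644);

  if (fd < 0) return errno;
  close(fd);
  for (i = 1; i <= depth; i++) {
    snprintf(prev, sizeof(prev), "l%d", i-1);
    snprintf(cur, sizeof(cur), "l%d", i);
    unlink(cur);  // tolerate reruns
    if (symlink(prev, cur)) return errno;
  }
  fd = open(cur, O_RDONLY);
  if (fd < 0) return errno;
  close(fd);

  return 0;
}
```

A shallow chain opens fine; a 50-deep chain fails with ELOOP on Linux even though every link in it is valid.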
> >
> > There's a VFS limit of 255 bytes per path component (NAME_MAX), but I don't think there's a limit
> > on how many directories deep you can go, barring inode exhaustion, or maybe the
> > dentry cache running out of memory might behave badly...
> >
> > > if not, every call to readlink0()/readlinkat0() looks problematic.
> >
> > They're not perfect, no. It's a bit like the ps.c code that truncates how long a
> > command line it'll show: nobody's complained. :)
> >
> > I'm usually more concerned with properly supporting spaces and newlines in path
> > names than length limits, because pathological filesystem depth really doesn't
> > come up much. If somebody figures out how to make it an attack vector, there's a
> > couple other places that bump up the todo list then. (I don't think I've
> > actually IMPLEMENTED the dirtree traversal that avoids filehandle exhaustion
> > yet. Talked about it here several times, but haven't DONE it yet...)
> >
> > > plus there's one lone call to readlink() in main.c that's using libbuf
> > > rather than toybuf (although "you put your toybox binary symlinks more
> > > than 4096 bytes deep" sounds like "please stop shooting at your feet"
> > > to me!).
> >
> > libbuf and toybuf are the same size. The rule is toybuf is available for
> > commands and libbuf is for library code, main.c is library code. (Library code
> > modifying toybuf behind a command's back is verboten.)
> >
> > >> Anyway, a 256 byte buffer would handle 99% of the cases, and this is likely to
> > >> handle all the ones we'd ever actually hit. If we want a proper fix what we need
> > >> is to teach readlink0() to dynamically allocate a buffer, and THAT has the
> > >> problem that the kernel API returns "error" for buffer not big enough without
> > >> telling us how big a buffer we NEED, so you have to iteratively guess...
> > >
> > > that's what xreadlink() already does :-)
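
The iterative-guess loop being discussed can be sketched like this. This is a hypothetical standalone illustration using plain POSIX readlink(), not toybox's actual xreadlink(): since the kernel only signals "buffer too small" by filling the buffer completely (it never tells you the size you need), the loop keeps doubling and retrying:

```c
// Sketch of a self-sizing readlink: returns a malloc()ed, NUL-terminated
// link target (caller frees), or NULL on any failure (allocation failure,
// file not found, not a symlink, ...). Hypothetical helper, not the
// toybox implementation.
#include <stdlib.h>
#include <unistd.h>

char *readlink_alloc(const char *path)
{
  size_t len = 64;

  for (;;) {
    char *buf = malloc(len);
    ssize_t n;

    if (!buf) return NULL;
    n = readlink(path, buf, len);
    if (n < 0) {             // ENOENT, EINVAL (not a symlink), etc.
      free(buf);
      return NULL;
    }
    if ((size_t)n < len) {   // fit, with room left for the terminator
      buf[n] = 0;
      return buf;
    }
    free(buf);
    len *= 2;                // kernel won't say how big: guess again
  }
}
```

Note this sketch returns NULL for both "couldn't readlink" and "couldn't allocate", which is exactly the conflation Rob complains about below.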
> >
> > Which is _conceptually_ funky because xmalloc() calls xexit() when the
> > allocation fails, but readlink() can fail due to file not found and friends, and
> > the LOGICAL thing to do is have xreadlink() return NULL when it couldn't
> > readlink but fail in the "should never happen and we can't recover if it does"
> > malloc() failure case, and a single "x" prefix doesn't distinguish between
> > different failure cases...
> >
> > But complicating the naming convention doesn't make stuff immediately obvious to
> > the reader because they won't remember what the complicated convention means,
> > so... lump it? I forget what I did, which is the problem in a nutshell.
> >
> > Rob
