[Toybox] faccessat and AT_SYM_NOFOLLOW

Sun Sep 28 14:20:25 PDT 2014

On Sun, Sep 28, 2014 at 12:05:33PM -0500, Rob Landley wrote:
> On 09/25/14 11:01, Rich Felker wrote:
> > This message is a follow-up to a discussion Rob Landley and I had on
> > #musl regarding musl's returning an error of EINVAL when the
> > AT_SYM_NOFOLLOW flag is passed to faccessat (a nonstandard usage),
> > which is affecting toybox.
> 
> Nonstandard, but documented in the Linux man page, and accepted by
> glibc, uclibc, and even klibc.
> 
> The flag is currently ignored by existing libcs because the system call

No, it's not. As I described in the previous email, glibc implements
it in userspace, and does so incorrectly. I have no idea what uclibc
does. klibc probably does ignore it, but klibc is anything but a
guideline for correctness.

> In the meantime, I can either pass in the ignored flag requesting the
> behavior I _want_ (even if I can't currently get it), and then start
> getting the right behavior when libc upgrades, or I can add a "when this
> happens, do this" item to my todo list and never remember to go back and
> do it.

I'm still not sure what it is you want to get from this flag.

> > Issue 1: Is the inclusion of AT_SYM_NOFOLLOW in the man page a total
> > documentation error (not actually supported by glibc at all) or just a
> > failure to mark it as a glibc extension?
> >
> > Here's the relevant glibc file:
> > 
> > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/faccessat.c;h=4a6048ec7930c8fc249ee629b1d2618cd81084b0;hb=HEAD
> > 
> > As I read it:
> > 
> > 1. It sets EINVAL for flags that are invalid, but considers
> >    AT_SYM_NOFOLLOW valid. So far so good...
> > 
> > 2. It flag is 0 or the only flag set is AT_EACCESS and the binary is
> >    not suid, it just makes the syscall directly. OOPS, big bug -- it
> >    does not honor changes to the effective uid made by programs
> >    initially started as root!
> 
> My lack of caring about AT_EACCESS is deep and profound, and it seems
> like setfsuid() would be the way to handle this if you did care? Rather
> than the weird child spawning thing musl is doing? But that's the "quick
> glance" reply...

access (always) and faccessat (by default) use the _real_ uid/gid to
check whether the file would be accessible as that user. setfsuid
would not do anything, because unlike functions that actually access
the filesystem, access/faccessat answer the question "what would
happen IF I were actually running as the real uid/gid rather than the
current effective/fs uid/gid?" This is (one reason) why they're
utterly the wrong thing to use.

As for why we can't use setfsuid, it's because it's not reversible.
Linux allocates kernelspace priv objects whenever you change any of
the uids, and any such operation can fail. This means after setfsuid,
it can be impossible to get back your original fsuid, from which there
is no way to recover. The only way to avoid this is to change the
fsuid in a newly-forked process. This is exactly what musl does,
except for real uid/gid rather than fsuid/fsgid, because
access/faccessat uses real uid/gid.

> > 3. Otherwise, it uses fstatat, possibly with the AT_SYM_NOFOLLOW flag,
> >    to get the file ownership and mode and performs its own access
> >    permissions check in userspace. This is imprecise and does not
> >    respect ACLs or any other advanced permission models provided by
> >    LSMs etc.
> 
> OS/2 extended attributes (and NT's copy of them) aren't unix
> permissions. (I'd be more concerned about checking for exec when the
> filesystem is mounted noexec or a file where the dynamic linker doesn't
> exist or the file is arm and this is x86... and if you're not going
> there, why do you care what SELinux has to say about it?)

There are various valid uses for access/faccessat (as opposed to the
invalid uses affected by races) where you want to ensure that a
particular file is _not_ accessible by a particular user (e.g.
checking that the authorized_keys file and .ssh dir are not writable
by the user themselves for certain types of restricted accounts). It
would be bad if access/faccessat falsely reported inaccessible (safe)
when the file was actually accessible due to an ACL or other extended
means of granting access.

> > I was not even aware that this imprecise emulation was used in the
> > AT_SYM_NOFOLLOW case; I figured they would do it in an exact way,
> > which is possible with some /proc/self/fd tricks.
> 
> There are times when proc isn't mounted. While the system is booting can
> be one of those times.

Indeed, but Linux has lots of stuff that can't be made to work without
/proc mounted.. :(

> > So my conclusion? There are some moderate-level documentation errors.
> > glibc implements the flag, but not correctly. The changes I would
> > recommend to the documentation:
> 
> So it's still not your bug, it's the rest of the world's bug. I'll
> continue to code to the rest of the world then, and locally patch in a
> workaround which will require an #ifdef __MUSL__

....

> > 1. Document that AT_SYM_NOFOLLOW is not standard for this function,
> >    and is a glibc extension. (uclibc is just a copy of glibc code)
> >
> > 2. Document that AT_SYM_NOFOLLOW and AT_EACCESS are emulated and
> >    unreliable on glibc.
> 
> Define "unreliable".

See above (reporting a file as inaccessible when it's accessible, or
vice versa).

> > 3. Document that the man page is covering the POSIX/glibc function
> >    details, and the kernel syscall does not support flags at all.
> 
> There are posix-2013 man pages. The Linux man pages are not the
> posix-2013 man pages. Presumably people can look at the posix man page
> if they want posix rather than linux semantics.

These are not Linux semantics. They're GNU/Linux semantics. :-P

> > Issue 2: Should musl support or ignore the AT_SYM_NOFOLLOW with
> > faccessat?
> > 
> > If anyone would like to see support, I can consider it for a future
> > agenda item (and of course: patches welcome), but I'm not going to add
> > an implementation that reports incorrect information.
> 
> I already have a local #ifdef __MUSL__ entry in portability.h. I'll
> patch in the local #ifdef in my build of musl can call it good.
> 
> > Just ignoring
> > the flag would give the wrong results (possibly dangerous)
> 
> Yes, how could we possibly behave the same way as glibc and uclibc and
> klibc, we must add an explicit test to be gratuitously inconstent with
> them in the name of the greater good.

It's not the same behavior. glibc actually answers the question you
asked (refusing to follow symlinks), but checks permissions
incorrectly when it does so.

> > and would
> > leave the application no way to know that the flag was ignored
> > (whereas failing with EINVAL makes it clear, and is explicitly
> > documented as an optional error condition for invalid flags).
> 
> Makes it clear. Returning -EINVAL is the same as "ignored" for you?

I don't understand what you're asking.

> > However:
> > 
> > Issue 3: Does the AT_SYM_NOFOLLOW flag even make sense with faccessat?
> > 
> > The whole point of AT_SYM_NOFOLLOW is to avoid TOCTOU races checking
> > for symlinks. However, any use of faccessat is fundamentally a TOCTOU
> > race -- the information it obtained is not necessarily correct by the
> > time it returns. Reducing two TOCTOU issues in succession to one
> > TOCTOU issue does not seem useful for most purposes.
> 
> I'm attempting a read operation to avoid scheduling a write (dentry
> chmod) for a file we're about to delete because ssid has limited write
> cycles and I'd like to avoid gratuitously wearing the flash out faster.
> The read should be more or less a cache local operation (unlink has to
> fault the dentry in anyway, doing it here vs doing it there is a wash).
> I don't care if it's entirely accurate (rm -rf on chmod 000 directories
> is sort of an edge case at the best of times), I just don't want to do a
> chmod before every unlink as a regular thing.

The solution to this is simple: you try to open the directory you want
to traverse into (fopenat+fdopendir), and only fchmodat if the open
operation failed due to permissions. This eliminates a syscall (the
useless faccessat) and does not do any chmod unless it's necessary.

> > Issue 4: Does faccessat, with or without AT_SYM_NOFOLLOW, make sense
> > for implementing rm -rf?
> > 
> > No. At best it's a wasted syscall that slows you down, and at worst it
> > gives you wrong information.
> 
> Because you're making a judgement call about what I'm doing without
> knowing why I'm doing it.
> 
> That is not libc's job.

This is orthogonal to the issue about whether AT_SYM_NOFOLLOW is
supported. (Note: I am willing to support it if you or anyone else
wants to contribute/help with a _WORKING_ implementation but I will
not add a hack to just ignore the application's request not to follow
symlinks -- that's a serious security bug.) However, with that said,
the pattern of:

1. Use access/faccessat at to see if an operation is permitted.
2. Do the operation, if step 1 succeeded.

is ALWAYS a bug, because:

1. Using access/faccessat uses real uid/gid rather than fsuid/fsgid to
   check permissions.
2. The two-step process has an inherent TOCTOU race.

The reason I bring this up is because I'm trying not to just reject a
request for a hack in musl, but also help solve the issue in toybox
that served as the motivation for that request to begin with.

> > The efficient and correct implementation
> > is to simply _try_ the operation which might fail (openat?) and only
> > change file permissions and retry if it failed with EACCESS. This
> > requires zero extra syscalls in the success case (versus one extra
> > with faccessat).
> 
> Cache local syscalls are cheap. Writes to physical disk, not so much.

The correct way to do it does not involve extra writes to the physical
disk.

> > As for AT_SYM_NOFOLLOW, I'm not clear why it was being used. Even if
> > the inaccessible file is a symlink target, you'd need to use
> > AT_SYM_NOFOLLOW or similar when doing the fchmodat,
> 
> Because unlinking a symlink always works even when it's chmod 000, but
> unlinking a directory requires descending into the directory which will
> fail if it's unreadable, so we have to chmod it to make it readable so
> we can delete its contents with rm -f.

There are several ways to do this, the most obvious of which is:

1. Try unlinkat. If it fails with EPERM or EISDIR, you have a
   directory; otherwise, you're done.
2. Try openat with O_DIRECTORY to open the directory to traverse into
   it. If this fails with ENOTDIR, you raced with changes to the fs,
   so go back to step 1. If it failed with EACCESS, use fchmodat to
   set mode u+rwx and retry step 2.
3. Use fdopendir and process the directory you opened.
4. Try unlinkat with AT_REMOVEDIR to remove the directory. If it fails
   with ENOTDIR or EEXIST or ENOTEMPTY, go back to step 1.

> This can NEVER be perfectly reliable. Ext2 attributes "chattr i
> filename" sets the filesystem-level immutable bit which I _could_ add
> special case code to deal with but that's pilot error and I'm not gonna
> and it's not my job.

I think users would be pretty damn angry if you did that. The whole
point of chattr +i is to prevent accidental (or likewise, malicious)
modification/removal attempts.

> > and that one's
> > what actually protects you from races used to trick rm into changing
> > permissions on files it shouldn't. Use of the result from faccessat to
> > make the decision is a TOCTOU race.
> 
> fchmodat() already has its own AT_SYMLINK_NOFOLLOW.  Whether or not it's
> _implemented_ is again, something I can wait for libc to catch up on. :)

If fchmodat's AT_SYMLINK_NOFOLLOW flag does not work, I don't think
there's any way to make rm -rf safe against certain race-based attacks
where a malicious user tricks another user (who is trying to delete
the malicious user's file tree) into performing a chmod operation
which ends up acting on a symlink targetted at one of the victim's own
files.

Unfortunately, glibc is actually broken here; see:

https://sourceware.org/bugzilla/show_bug.cgi?id=14578

On musl it works correctly, but it does depend on /proc being mounted.

Rich