[Toybox] faccessat and AT_SYM_NOFOLLOW
rob at landley.net
Sun Sep 28 10:05:33 PDT 2014
On 09/25/14 11:01, Rich Felker wrote:
> This message is a follow-up to a discussion Rob Landley and I had on
> #musl regarding musl's returning an error of EINVAL when the
> AT_SYM_NOFOLLOW flag is passed to faccessat (a nonstandard usage),
> which is affecting toybox.
Nonstandard, but documented in the Linux man page, and accepted by
glibc, uclibc, and even klibc.
The flag is currently ignored by existing libcs because the system call
does not have a flags parameter. That said, the linux-kernel guys are
going back and _adding_ flags paramers to system calls on a regular basis:
(And have been doing so for a while, which is how we got umount2() and
friends.) So waiting for the system call to show up and various C
libraries to switch to using it doesn't seem like an unreasonable thing
In the meantime, I can either pass in the ignored flag requesting the
behavior I _want_ (even if I can't currently get it), and then start
getting the right behavior when libc upgrades, or I can add a "when this
happens, do this" item to my todo list and never remember to go back and
> Rob's idea for using it came from the Linux
> man pages, which document this flag as supported and do not make it
> clear that it's glibc, not Linux, providing the support.
When I first looked uclibc was completely ignoring it. I haven't looked
at glibc because I try not to get any FSF code on me.
> Issue 1: Is the inclusion of AT_SYM_NOFOLLOW in the man page a total
> documentation error (not actually supported by glibc at all) or just a
> failure to mark it as a glibc extension?
> Here's the relevant glibc file:
> As I read it:
> 1. It sets EINVAL for flags that are invalid, but considers
> AT_SYM_NOFOLLOW valid. So far so good...
> 2. It flag is 0 or the only flag set is AT_EACCESS and the binary is
> not suid, it just makes the syscall directly. OOPS, big bug -- it
> does not honor changes to the effective uid made by programs
> initially started as root!
My lack of caring about AT_EACCESS is deep and profound, and it seems
like setfsuid() would be the way to handle this if you did care? Rather
than the weird child spawning thing musl is doing? But that's the "quick
> 3. Otherwise, it uses fstatat, possibly with the AT_SYM_NOFOLLOW flag,
> to get the file ownership and mode and performs its own access
> permissions check in userspace. This is imprecise and does not
> respect ACLs or any other advanced permission models provided by
> LSMs etc.
OS/2 extended attributes (and NT's copy of them) aren't unix
permissions. (I'd be more concerned about checking for exec when the
filesystem is mounted noexec or a file where the dynamic linker doesn't
exist or the file is arm and this is x86... and if you're not going
there, why do you care what SELinux has to say about it?)
> I was not even aware that this imprecise emulation was used in the
> AT_SYM_NOFOLLOW case; I figured they would do it in an exact way,
> which is possible with some /proc/self/fd tricks.
There are times when proc isn't mounted. While the system is booting can
be one of those times.
> So my conclusion? There are some moderate-level documentation errors.
> glibc implements the flag, but not correctly. The changes I would
> recommend to the documentation:
So it's still not your bug, it's the rest of the world's bug. I'll
continue to code to the rest of the world then, and locally patch in a
workaround which will require an #ifdef __MUSL__
> 1. Document that AT_SYM_NOFOLLOW is not standard for this function,
> and is a glibc extension. (uclibc is just a copy of glibc code)
> 2. Document that AT_SYM_NOFOLLOW and AT_EACCESS are emulated and
> unreliable on glibc.
> 3. Document that the man page is covering the POSIX/glibc function
> details, and the kernel syscall does not support flags at all.
There are posix-2013 man pages. The Linux man pages are not the
posix-2013 man pages. Presumably people can look at the posix man page
if they want posix rather than linux semantics.
> (This might aid in getting the kernel folks to add a new faccessat4
> syscall that would do flags at the kernel level.)
No, it really wouldn't. (Probably not counterproductive, merely
orthogonal. They're already talking about this sort of thing on a
semi-regular basis, but if you don't email linux-kernel they'll never
notice you. That said, michael kerrisk does email proposed man page
updates to linux-kernel from time to time. The man 7 containers page has
done several rounds, for example...)
> Do these sound reasonable?
They sound irrelevant.
> Issue 2: Should musl support or ignore the AT_SYM_NOFOLLOW with
> If anyone would like to see support, I can consider it for a future
> agenda item (and of course: patches welcome), but I'm not going to add
> an implementation that reports incorrect information.
I already have a local #ifdef __MUSL__ entry in portability.h. I'll
patch in the local #ifdef in my build of musl can call it good.
> Just ignoring
> the flag would give the wrong results (possibly dangerous)
Yes, how could we possibly behave the same way as glibc and uclibc and
klibc, we must add an explicit test to be gratuitously inconstent with
them in the name of the greater good.
> and would
> leave the application no way to know that the flag was ignored
> (whereas failing with EINVAL makes it clear, and is explicitly
> documented as an optional error condition for invalid flags).
Makes it clear. Returning -EINVAL is the same as "ignored" for you?
> Issue 3: Does the AT_SYM_NOFOLLOW flag even make sense with faccessat?
> The whole point of AT_SYM_NOFOLLOW is to avoid TOCTOU races checking
> for symlinks. However, any use of faccessat is fundamentally a TOCTOU
> race -- the information it obtained is not necessarily correct by the
> time it returns. Reducing two TOCTOU issues in succession to one
> TOCTOU issue does not seem useful for most purposes.
I'm attempting a read operation to avoid scheduling a write (dentry
chmod) for a file we're about to delete because ssid has limited write
cycles and I'd like to avoid gratuitously wearing the flash out faster.
The read should be more or less a cache local operation (unlink has to
fault the dentry in anyway, doing it here vs doing it there is a wash).
I don't care if it's entirely accurate (rm -rf on chmod 000 directories
is sort of an edge case at the best of times), I just don't want to do a
chmod before every unlink as a regular thing.
> Issue 4: Does faccessat, with or without AT_SYM_NOFOLLOW, make sense
> for implementing rm -rf?
> No. At best it's a wasted syscall that slows you down, and at worst it
> gives you wrong information.
Because you're making a judgement call about what I'm doing without
knowing why I'm doing it.
That is not libc's job.
> The efficient and correct implementation
> is to simply _try_ the operation which might fail (openat?) and only
> change file permissions and retry if it failed with EACCESS. This
> requires zero extra syscalls in the success case (versus one extra
> with faccessat).
Cache local syscalls are cheap. Writes to physical disk, not so much.
(I really hope we go to MRAM or something that doesn't wear out, but
asking vendors to switch from products that wear out to products that
don't wear out is asking capitalism not to caveat all the emptors to the
hilt and then taunt them bond villain style afterwards.)
> As for AT_SYM_NOFOLLOW, I'm not clear why it was being used. Even if
> the inaccessible file is a symlink target, you'd need to use
> AT_SYM_NOFOLLOW or similar when doing the fchmodat,
Because unlinking a symlink always works even when it's chmod 000, but
unlinking a directory requires descending into the directory which will
fail if it's unreadable, so we have to chmod it to make it readable so
we can delete its contents with rm -f.
This can NEVER be perfectly reliable. Ext2 attributes "chattr i
filename" sets the filesystem-level immutable bit which I _could_ add
special case code to deal with but that's pilot error and I'm not gonna
and it's not my job.
But "I extracted a tarball, now rm -rf won't delete it"... that's
something rm should get right in at least the common case.
> and that one's
> what actually protects you from races used to trick rm into changing
> permissions on files it shouldn't. Use of the result from faccessat to
> make the decision is a TOCTOU race.
fchmodat() already has its own AT_SYMLINK_NOFOLLOW. Whether or not it's
_implemented_ is again, something I can wait for libc to catch up on. :)
More information about the Toybox