[Toybox] [PATCH] cpio: Implement -u (copy unconditionally)

Sun Feb 28 02:55:17 PST 2021

On 2/27/21 10:18 AM, Yi-yo Chiang wrote:
> Using different command to create the archive yields different result, I used $
> find dir | cpio -o ... to create the archive for testing:
> 
> (testing the behavior of GNU cpio)
> $ mkdir a
> $ echo blah > a/bb
> $ find a | cpio -H newc -o >a.cpio
> $ # Archive a.cpio creates a (dir) and a/bb (file)

It triggers on an exact match but not a mkpath. So the "create leading
directory" plumbing doesn't do it, only if the directory is in the archive
before the file contained in it.

Were the gnu devs lazy and never bothered to properly implement -d combined with
-u, or is this an intentional "design" difference?

> $ # Test dir from archive replaces existing file
> $ rm -rf a
> $ touch a
> $ cpio -i <a.cpio
> cpio: a not created: newer or same age version exists

That's not something tar or zip do. Why does cpio do it? Does anybody depend on
cpio doing it?

My original idea of ignoring -u was to just do the -u behavior all the time,
because that's what tar does. You're instead making it so not supplying -u never
replaces files, and... I'm not sure why?

According to
https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt
(which I wrote so long ago the behavior was actually different at the time and I
had to update it, but presumably if it was no longer accurate somebody would
have committed a patch?) says that the kernel initramfs plumbing will always
overwrite. (The original behavior is that it would append the new file's
contents to the old file, which they decided they didn't like. :)

> cpio: a/bb: Cannot open: Not a directory
> 1 block
> $ # Failed to create a (dir)
> $ touch -d yesterday a
> $ cpio -i <a.cpio
> 1 block
> $ # Interesting, if the timestamp of a (file) is older than the archive's a
> (dir), then replacement happens
> $ touch a
> $ cpio -iu <a.cpio
> 1 block

Would always replacing and not doing this weird timestamp comparison break
anything you know of?

> $ # -u unlinks a (file) then mkdir(a)
> 
> $ # Test file from archive replaces existing empty dir
> $ rm a/bb
> $ mkdir a/bb
> $ cpio -i <a.cpio
> cpio: a/bb not created: newer or same age version exists
> 1 block
> $ touch -d yesterday a/bb
> $ cpio -i <a.cpio
> 1 block
> $ rm a/bb
> $ mkdir a/bb
> $ cpio -iu <a.cpio
> 1 block
> $ # Overall similar result when replacing existing file.

What portion of this is new? What are you trying to prove?

> $ # Test when a/bb is not empty
> $ rm -rf a
> $ mkdir -p a/bb/ccc
> $ cpio -i <a.cpio
> cpio: a/bb not created: newer or same age version exists
> 1 block
> $ touch -d yesterday a/bb
> $ cpio -i <a.cpio
> cpio: cannot remove current a/bb: Directory not empty
> 1 block
> $ cpio -iu <a.cpio
> cpio: cannot remove current a/bb: Directory not empty
> 1 block
> $ # Never replaces an existing dir

Does having a directory where you want to put a file come up often?

What exactly is the advantage of the special case of removing an empty directory
where you want to put a file? Is this a common case? Is there a use case for it?
Is there any design idea here other than "match whatever gnu does"?

Is this something tar does? Sigh. Debian's tar -x is replacing a directory with
a file, but toybox's isn't. And my tar.c is dirty because of pending tar
--transform stuff... right, one more todo item.

> On Sat, Feb 27, 2021 at 3:03 PM Rob Landley <rob at landley.net
> <mailto:rob at landley.net>> wrote:
> 
>     On 2/20/21 8:34 AM, Yi-yo Chiang via Toybox wrote:
>     > *ping* (in case this got categorized as spam by gmail)
> 
>     It did, and so did this reply. :)
> 
>     > On Sun, Feb 14, 2021 at 9:13 PM Yi-Yo Chiang <yochiang at google.com
>     <mailto:yochiang at google.com>
>     > <mailto:yochiang at google.com <mailto:yochiang at google.com>>> wrote:
>     >
>     >     If -u is specified, then replace any existing files or directories.
> 
>     When does it replace directories? Do we replace a directory with a file, or a
>     file with a directory? (What if the directory to be replaced with a file isn't
>     empty, ala rm -r vs rmdir?)
> 
> 
> Honestly I'm not sure, and IIUC the spec doesn't say what to do either
> (https://pubs.opengroup.org/onlinepubs/7908799/xcu/cpio.html), so I assumed it
> is implementation specific?
> In this case I choose the behavior that is simplest to implement (IMO).

Replacing an existing file with another file makes sense. Replacing a file with
a directory makes a certain amount of sense because there's a common failure
mode: "cp file /usr/bin" will make a file if bin doesn't exist.

It's replacing a directory with a file that makes me go "huh"? For one thing
"rmdir" is not nearly as commonly used as "rm -r", and mkdir -p path/to/file
does not let you rmdir "path/to" (it's not empty, the single command is a
multi-command cleanup, unless you use rm -r)...

*shrug* If it's what the other one does we can do the same thing in the absence
of a spec.

> * If the *path* to be extracted already existed, then unlink/rmdir the path. *

That's the problem: you can't rmdir a path you can only rmdir a single empty
directory. and supporting an empty directory is like supporting a zero length
file but failing if the file has any contents.

> So yes we replace a file with a directory, and replace a directory (so long as
> rmdir() success, which means directory must be empty) with a file.

It's the "must be empty" part that strikes me as weird. Of course if we let it
rm -r stuff the command immediately becomes more _dangerous_, which is probably
why they don't do it. But extracting over something that disagrees whether files
or directories should be there is also kinda weird...

In the absence of a spec or a consistent design idea, I tend to look at what tar
and zip do and see if there's consistency there, and there sort of is but I need
to go fix tar, and I never got to implement zip because stunlocked with other
todo items, and I was trying to do sh.c but have been too busy to commit
anything to that in a month, and I'm so very tired...

>     Let's see...
> 
>       $ mkdir walrus
>       $ echo hello > walrus/file
>       $ echo walrus/file | cpio -H newc -o > blah.cpio
>       $ cpio -H newc -u -i < blah.cpio
>       1 block
>       $ mv walrus walrus2
>       $ touch walrus
>       $ cpio -H newc -i < blah.cpio
>       cpio: walrus/file: Cannot open: Not a directory
>       1 block
>       $ cpio -H newc -u -i < blah.cpio
>       cpio: walrus/file: Cannot open: Not a directory
>       1 block
>       $ cpio -H newc -dui < blah.cpio
>       cpio: `walrus' exists but is not a directory
>       cpio: walrus/file: Cannot open: Not a directory
>       1 block
> 
>     Nope, doesn't look like it. Let's see, will it replace a /dev node...? Yes it
>     does (rather than writing the contents to it. What's the default tar behavior
>     here... replaces the dev node. Which is not what solaris did back in college, I
>     note. Why do we want cpio's default behavior to differ from tar's here?)
> 
> I didn't follow.. Are you saying GNU tar replaces a device node, and Solaris tar
> writes to a device node?

Circa 1994 it would open whatever was there at that path, truncate it, and write
to it. Truncate was ignored on a block device, so... (It was used as a way to
image floppies.)

The current one doesn't do that, so I guess that's the expected behavior now,
which raises the question of why cpio needs -u when -u is the default for tar
(and doing -u on "older" files is the default for cpio, which is another head
scratcher. Seems kinda fancy given the userbase of this command...)

This is similar to how the initial initramfs cpio extract code would open the
existing node (in O_APPEND mode instead of O_TRUNC mode) and write data to it.
Not what the current one does, but I'm trying to work out the design reasons for
what it's doing. (Alas posix deleted the cpio spec, presumably because RPM and
initramfs are based on it and therefore Jorg Schilling felt it needed to die
because Linux cooties.)

> Do we want toybox cpio/tar to be feature-identical to GNU or Solaris'
> implementation? (I hope not, cause that feels like a bloat to code size)

I'm trying to figure out what the right thing to do is here. My first instinct
is "if tar and zip behave the same way, cpio probably should too", but I don't
want to break somebody's script.

What's the use case that motivated these changes again?

>     In any case, it looks like the expected behavior is to open with O_EXCL and
>     unlink if it failed. Under what circumstances DOES it replace an existing file
>     with a directory, or an existing directory with a file? (The tests you submitted
>     only replace files with other files.)
> 
> See my example at the start.
> Your commands didn't replace "walrus" because your archive extracts
> "walrus/file" but not "walrus".

I know why it didn't, yes.

> If you use "find ... | cpio -o" to create the test archive, then the existing
> "walrus" would be replaced by cpio when cpio is trying to mkdir(walrus).

Because there would then be _two_ entries in the cpio archive, and the directory
entry is guaranteed to come before the directory contents by the definition of
the find -depth option, which is a different test than the one I was running.

But if you tell cpio -d it creates leading directories where needed, and in
doing so it did _not_ remove conflicting files. And I don't understand why -d
ignores -u, other than it being an oversight. (That's why the last test I ran
combined -d and -u. Exact matches are treated differently than path components.)

By the way, I suspect that zapping directories was a side effect of zapping
symlinks, as an attack mitigation strategy. I had to care about this when
implementing tar, but can't find the test case or blog entry. It was while I was
working at Johnson Controls, I want to say early 2019? Hmmm, it wasn't
https://nvd.nist.gov/vuln/detail/CVE-2016-6321 and I remember it was something
busybox got wrong at the time...

Oh great, half of tests/tar.test is just notes-to-self not actual tests. So much
half-finished stuff I've never had time to finish. Well, can't do it right now...

>     I'd like to avoid the separate "stat" if necessary not just because it's extra
>     code but because it's a race condition between two non-atomic operations waiting
>     to happen. (I have no idea if the gap is exploitable, but I don't want to worry
>     about it if just not doing that is an option? And not doing that would be open
>     O_EXCL and on failure unlink and O_EXCL a second time.)
> 
> Yeah I totally agree. But I'd like to point out that in my first iteration of
> this code, I do guard the stat() with various conditions and nested if-else-ifs,

If all we care about is a single attempt to recover an occluded exact match the
logic sounds like:

  i = 0;
  do if (-1 != (fd = open(O_CREAT|O_EXCL))) break;
    while (!i++ && (!unlink(name) || !rmdir(name));
  if (fd == -1) {
    perror_msg_raw(name);
    continue;
  }

Where does the stat come in?

(Ok, if you chmod 000 a normal file the error message printed would be ENOTDIR
from the failing rmdir(), technically it should fall back to the rmdir only if
the unlink returns EISDIR, which the linux man page says is a non-posix value
returned by Linux but the BSD man page _also_ says is what unlink(2) returns so
I'm assuming mac does too.)

> hoping to only call stat() if EEXIST error happened.
> However the code quickly grew too complex and unreadable, with too many
> combinations to consider (file replace dir, dir replace file, dir replace dir,
> what if dir not empty? what about symlink and special files...), so I changed
> the implementation to the one you're seeing now.
> This is also the reason why I add the "if dir already exist, then just do chmod"
> shortcut, because race could happen between rmdir(a) and mkdir(a). 

A race resulting in a failure, you mean?

  $ ln -s broken nowhere
  $ ls -l nowhere
  lrwxrwxrwx 1 landley landley 6 Feb 28 03:09 nowhere -> broken
  $ mkdir nowhere
  mkdir: cannot create directory ‘nowhere’: File exists

How is this exploitable?

The reason I'm spending time on this thread instead of staring at the cpio code
is that implementation is easy, figuring out what it SHOULD do is hard. I still
don't know what use case motivated your changes. Your patch description says
what but not why, I have to reverse engineer an understanding of what you're
trying to accomplish from the conclusion you came to about the best way to
implement it.

> Another point is that when replacing an existing file, having a separate stat()
> would mean:
> stat(a) & unlink(a) & open(a, O_EXCL)
> but not stat()-ing beforehand would mean:
> open(a, O_EXCL) [fail] & stat(a) & ("a" is a directory ? rmdir : unlink)(a)
> & open(a, O_EXCL)

Isn't the open(a, O_EXCL) succeeding the common case though? And if that's the
first thing it tries and it succeeds...

> So having a separate "stat" doesn't necessarily mean an extra system call.
> If path doesn't exist then the stat() is an extra syscall, otherwise the stat()
> saves an extra syscall. 

Mostly it's just the possibility that the thing I statted is not the thing I
acted upon a moment later making me shy away from it as a first choice for
implementation. I prefer not to have to reason about what _could_ happen in that
gap if there's another way (such as the open saying it didn't work and then
unlink saying it didn't work so you have to rmdir; the first common case didn't
succeed and the second common case didn't succeed so then try the third. Does a
failed unlink attempt take longer than a stat?)

I note that the obvious approach is to wrap the whole if/else staircase with a
retry loop (otherwise there's a goto), which makes the patch look big because of
the extra indentation layer (which is why git diff supports -b, and yes I've
done that on patches to see what people actually did...)

>     (You'll notice the existing stat was an fstat() on an already open file
>     descriptor, and you're adding an lstat on a path which could get swapped out
>     afterwards. In general I try not to do that because it seems racy with programs
>     dropping symlinks based on inotify triggers. Yeah you've sprayed the system down
>     with selinux but I don't want to worry about it at a design level...)
> 
>     Unfortunately I haven't had the spare focus this week to properly wrap my brain
>     around what I want to happen here. I finally got a chance to look at it the end
>     of the day friday, and just wanted to say why I haven't applied it as-is yet.
>     I'll try to redo it this weekend, but first I need to confirm that open(O_EXCL)
>     is sufficient error detection and what we do about conflicting directory
>     creation (is there a case where -u DOES apply to -d?) 
> 
> I hope not, let's keep it simple...

It looks like only exact matches of empty directories get rmdir()ed (and really
it's unlinkat(AT_REMOVEDIR) and why can unlink remove files, dev nodes,
symlinks, sockets, and fifos, but directories have their own remove call (even
though that call fails if the directory isn't empty)? Why is that again? I'm too
tired for this to make sense right now...

>     and adding tests for the
>     corner cases limning in the design decisions.
>  
> I can add more tests once we decide the desired behavior.
Tests are fiddly. I really HATE when TEST_HOST gives spurious irreproducible
failures:

FAIL: tar sparse without overflow
echo -ne '' | tar c --owner root --group root --mtime @1234567890 --sparse fweep
| SUM 3

--- expected	2021-02-28 09:28:39.518369796 +0000
+++ actual	2021-02-28 09:28:39.526369797 +0000
@@ -1 +1 @@
-e1560110293247934493626d564c8f03c357cec5
+ce3f8e6b10723ce4d85f06c606bc634a9db23be0
.singlemake:1781: recipe for target 'test_tar' failed
make: *** [test_tar] Error 1

Smells like maybe clock edge overflow of a second ticking over and winding up
altering a header field? But I didn't grab the artifacts out of generated/ to
see what it was and it refuses to do it again...

Anyway, added a new test to tar and made it unlink an empty directory where it
wants to put a file. If the result of all this is "cpio should work like tar
when files/directories conflict with archive contents" then that's easy to
implement. If not, I'd like to understand _why_ not.

>     I need to properly wrap my head around this, but haven't been at my best the
>     past couple weeks. Slightly overstretched. Sorry 'bout that.
> 
> Nah it's cool. Do take care and take your time and don't feel burned out. 

Too late, but it's not because of this.

I _should_ just sit down and work out what to do and implement it instead of
having email threads that boil down to "I haven't scraped up enough brain to
work through all these corner cases myself yet", but I've been tired and
headachey for a few days now. I'd blame cedar pollen but my wife (in
minneapolis) has been nauseous for 2 days and apparently it's going around?

  https://twitter.com/catacalypto/status/1365538065184251905

To be honest I expect I'm still limp from the blizzard last week, which I
_shouldn't_ be but 2020 kinda burned through my reserves. Just because the
people who tried to violently overthrow the government last month have
deregulated infrastructure in my state to the point civilization literally
collapsed around here last week, and yesterday a jewish friend broke down
sobbing because the CPAC conference's stage is literally shaped like a nazi
symbol...

But there's good news: my house _didn't_ get flooded this time (after the first
4 times my wife has outright PTSD about that but she's been getting her
doctorate in another state for years and we haven't been able to see each other
in person since the pandemic started). The boil water advisory here was lifted
all the way back on wednesday, our power company isn't one of the ones charging
5 figure electricity bills (instead they say they'll spread it out over the next
10 years), and as of yesterday the local grocery store has restocked over half
its shelves. Plus the Johnson and Johnson one dose vaccine got approved, which
is the one I've been rooting for since brexit derailed availability of the
oxford one. If everything goes well I'm hoping I can leave the house and go
inside another building (other than the grocery store down the street) by maybe
september. (I still have the phobia of needles resulting from a nurse wheeling
in a tray of 24 needles for allergy testing when I was 7 and tying me up in a
sheet when I freaked out about it until I threw up on her. Ever since I've been
able to make my gums bleed by thinking to hard about needles, but I can usually
psych myself up to get an injection without losing consciousness given about
half an hour. So I have that to look forward to, but this is part of the reason
I really want the one dose version instead of the two dose version...)

Anyway, weekend! In theory this means spare cycles to throw at the toybox pull
request and bug report backlog (instead of working on the shell I've been trying
to finish, but nobody except me uses that and other people use the stuff they're
sending me patches for, so...)

Rob

P.S. Sorry I haven't been more focused. And Pascal's Apology about the length of
the email:
https://www.npr.org/sections/13.7/2014/02/03/270680304/this-could-have-been-shorter