[Toybox] [PATCH] cpio: Implement -u (copy unconditionally)

Sat Feb 27 08:18:54 PST 2021

Using different command to create the archive yields different result, I
used $ find dir | cpio -o ... to create the archive for testing:

(testing the behavior of GNU cpio)
$ mkdir a
$ echo blah > a/bb
$ find a | cpio -H newc -o >a.cpio
$ # Archive a.cpio creates a (dir) and a/bb (file)

$ # Test dir from archive replaces existing file
$ rm -rf a
$ touch a
$ cpio -i <a.cpio
cpio: a not created: newer or same age version exists
cpio: a/bb: Cannot open: Not a directory
1 block
$ # Failed to create a (dir)
$ touch -d yesterday a
$ cpio -i <a.cpio
1 block
$ # Interesting, if the timestamp of a (file) is older than the archive's a
(dir), then replacement happens
$ touch a
$ cpio -iu <a.cpio
1 block
$ # -u unlinks a (file) then mkdir(a)

$ # Test file from archive replaces existing empty dir
$ rm a/bb
$ mkdir a/bb
$ cpio -i <a.cpio
cpio: a/bb not created: newer or same age version exists
1 block
$ touch -d yesterday a/bb
$ cpio -i <a.cpio
1 block
$ rm a/bb
$ mkdir a/bb
$ cpio -iu <a.cpio
1 block
$ # Overall similar result when replacing existing file.

$ # Test when a/bb is not empty
$ rm -rf a
$ mkdir -p a/bb/ccc
$ cpio -i <a.cpio
cpio: a/bb not created: newer or same age version exists
1 block
$ touch -d yesterday a/bb
$ cpio -i <a.cpio
cpio: cannot remove current a/bb: Directory not empty
1 block
$ cpio -iu <a.cpio
cpio: cannot remove current a/bb: Directory not empty
1 block
$ # Never replaces an existing dir

On Sat, Feb 27, 2021 at 3:03 PM Rob Landley <rob at landley.net> wrote:

> On 2/20/21 8:34 AM, Yi-yo Chiang via Toybox wrote:
> > *ping* (in case this got categorized as spam by gmail)
>
> It did, and so did this reply. :)
>
> > On Sun, Feb 14, 2021 at 9:13 PM Yi-Yo Chiang <yochiang at google.com
> > <mailto:yochiang at google.com>> wrote:
> >
> >     If -u is specified, then replace any existing files or directories.
>
> When does it replace directories? Do we replace a directory with a file,
> or a
> file with a directory? (What if the directory to be replaced with a file
> isn't
> empty, ala rm -r vs rmdir?)
>

Honestly I'm not sure, and IIUC the spec doesn't say what to do either (
https://pubs.opengroup.org/onlinepubs/7908799/xcu/cpio.html), so I assumed
it is implementation specific?
In this case I choose the behavior that is simplest to implement (IMO).

* If the *path* to be extracted already existed, then unlink/rmdir the
path. *

So yes we replace a file with a directory, and replace a directory (so long
as rmdir() success, which means directory must be empty) with a file.

>
> Let's see...
>
>   $ mkdir walrus
>   $ echo hello > walrus/file
>   $ echo walrus/file | cpio -H newc -o > blah.cpio
>   $ cpio -H newc -u -i < blah.cpio
>   1 block
>   $ mv walrus walrus2
>   $ touch walrus
>   $ cpio -H newc -i < blah.cpio
>   cpio: walrus/file: Cannot open: Not a directory
>   1 block
>   $ cpio -H newc -u -i < blah.cpio
>   cpio: walrus/file: Cannot open: Not a directory
>   1 block
>   $ cpio -H newc -dui < blah.cpio
>   cpio: `walrus' exists but is not a directory
>   cpio: walrus/file: Cannot open: Not a directory
>   1 block
>
> Nope, doesn't look like it. Let's see, will it replace a /dev node...? Yes
> it
> does (rather than writing the contents to it. What's the default tar
> behavior
> here... replaces the dev node. Which is not what solaris did back in
> college, I
> note. Why do we want cpio's default behavior to differ from tar's here?)
>

I didn't follow.. Are you saying GNU tar replaces a device node, and
Solaris tar writes to a device node?
Do we want toybox cpio/tar to be feature-identical to GNU or Solaris'
implementation? (I hope not, cause that feels like a bloat to code size)

>
> In any case, it looks like the expected behavior is to open with O_EXCL and
> unlink if it failed. Under what circumstances DOES it replace an existing
> file
> with a directory, or an existing directory with a file? (The tests you
> submitted
> only replace files with other files.)
>

See my example at the start.
Your commands didn't replace "walrus" because your archive extracts
"walrus/file" but not "walrus".
If you use "find ... | cpio -o" to create the test archive, then the
existing "walrus" would be replaced by cpio when cpio is trying to
mkdir(walrus).

>
> I'd like to avoid the separate "stat" if necessary not just because it's
> extra
> code but because it's a race condition between two non-atomic operations
> waiting
> to happen. (I have no idea if the gap is exploitable, but I don't want to
> worry
> about it if just not doing that is an option? And not doing that would be
> open
> O_EXCL and on failure unlink and O_EXCL a second time.)
>

Yeah I totally agree. But I'd like to point out that in my first iteration
of this code, I do guard the stat() with various conditions and nested
if-else-ifs, hoping to only call stat() if EEXIST error happened.
However the code quickly grew too complex and unreadable, with too many
combinations to consider (file replace dir, dir replace file, dir replace
dir, what if dir not empty? what about symlink and special files...), so I
changed the implementation to the one you're seeing now.
This is also the reason why I add the "if dir already exist, then just do
chmod" shortcut, because race could happen between rmdir(a) and mkdir(a).

Another point is that when replacing an existing file, having a separate
stat() would mean:
stat(a) & unlink(a) & open(a, O_EXCL)
but not stat()-ing beforehand would mean:
open(a, O_EXCL) [fail] & stat(a) & ("a" is a directory ? rmdir : unlink)(a)
& open(a, O_EXCL)
So having a separate "stat" doesn't necessarily mean an extra system call.
If path doesn't exist then the stat() is an extra syscall, otherwise the
stat() saves an extra syscall.

> (You'll notice the existing stat was an fstat() on an already open file
> descriptor, and you're adding an lstat on a path which could get swapped
> out
> afterwards. In general I try not to do that because it seems racy with
> programs
> dropping symlinks based on inotify triggers. Yeah you've sprayed the
> system down
> with selinux but I don't want to worry about it at a design level...)
>
> Unfortunately I haven't had the spare focus this week to properly wrap my
> brain
> around what I want to happen here. I finally got a chance to look at it
> the end
> of the day friday, and just wanted to say why I haven't applied it as-is
> yet.
> I'll try to redo it this weekend, but first I need to confirm that
> open(O_EXCL)
> is sufficient error detection and what we do about conflicting directory
> creation (is there a case where -u DOES apply to -d?)

I hope not, let's keep it simple...

> and adding tests for the
> corner cases limning in the design decisions.
>

I can add more tests once we decide the desired behavior.

>
> I need to properly wrap my head around this, but haven't been at my best
> the
> past couple weeks. Slightly overstretched. Sorry 'bout that.
>

Nah it's cool. Do take care and take your time and don't feel burned out.

>
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox at lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
>

-- 

Yi-yo Chiang
Software Engineer
yochiang at google.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.landley.net/pipermail/toybox-landley.net/attachments/20210228/bc87fb43/attachment.html>