[Toybox] [PATCH] cpio: Implement -u (copy unconditionally)

Rob Landley rob at landley.net
Mon Mar 1 17:57:25 PST 2021


On 2/28/21 12:29 PM, Yi-yo Chiang wrote:
> (I'll try to keep it short)
> 
> My original motivation to send this patch is that my coworker found out (when he
> was working with initramfs) that "cpio -u" behaves differently on toybox
> compared to the GNU's implementation.

"What was the actual problem?" "My coworker noticed this existed."

That's... not a use case?

> (my original email, in case it was flagged as spam...:
> http://lists.landley.net/pipermail/toybox-landley.net/2021-February/012270.html)
> In short, I found that toybox cpio doesn't exactly behave like "-u" is always
> on.

Not behaving exactly like gnu is not in itself a bug. (An undocumented deviation
from posix is, but a lot of times the fix is documenting it in the "deviations
from posix" section of the command's source...)

> I'll write the repro steps here again, since I discovered another oddity
> when "-d" is specified:
> 
> (testing toybox cpio)
> $ mkdir a
> $ echo "a" | cpio -H newc -o >a.cpio
> $ toybox cpio -iu <a.cpio
> cpio: a: File exists
>
> (toybox shouldn't report EEXIST if the directory to be created already exist)

Since I can't get a clear explanation of what you want, I just implemented the
simple thing I wanted to do at the start. Commit f1be076b52ad.

By the way, when I trimmed down your patch file to add just the tests, I found
out that if you leave the trailing

  diff --git a/toys/posix/cpio.c b/toys/posix/cpio.c
  index 31c777c9..795f890c 100644

lines of a hunk but delete the rest debian's patch will announce that it's
patching that file, do nothing, and not have any sort of error about it. I don't
think mine does that. Is mine behaving differently here also a bug by definition
because it doesn't match the gnu behavior exactly?

Oh, and in your patch:

     } else if (S_ISLNK(mode)) {
       data = strpad(afd, size, 0);
-      if (!test) err = symlink(data, name);
+      if (!test) {
+        err = symlink(data, name);
+        // Can't get a filehandle to a symlink, so do special chown
+        if (!err && !geteuid() && !FLAG(no_preserve_owner))
+          err = lchown(name, uid, gid);
+      }
       free(data);
-      // Can't get a filehandle to a symlink, so do special chown
-      if (!err && !geteuid() && !FLAG(no_preserve_owner))
-        err = lchown(name, uid, gid);

err is initialized to zero each time through the loop, nothing sets it before
the call to symlink that is skipped by if (!test), so when !test the if (!err)
has to trigger, that's why it didn't have to be part of the earlier block. (Yes
I read through your patch to see if you found any actual bugs you didn't
mention, that's why this is so much work. I want to understand what's going on.)

> $ echo "a/" | cpio -H newc -o >a.cpio
> $ rm -rf a
> $ toybox cpio -iud <a.cpio
> cpio: a/: File exists
> 
> ("-d" creates the leading directory of "a/", which is "a", which causes cpio to
> complain that "a" already exists when it tries to mkdir("a/")...)

Same error with and without -d so the problem is -d?

Suppressing mkdir errors with -u should also coincidentally fix this.


> My goal was to make the two cases above to not report any error (the patch I
> sent only addressed the first case though),

Should your most recent case still set the existing directories to the
ownership, permissions, and timestamp from the archive? Or just silently ignore
the conflict?

You are not expressing a coherent design idea. What should it do and why?

The OTHER thing my commit doesn't handle is if an undeletable file exists where
a directory should go (owned by another user for example) it won't report error,
but the common use case of cpio is to mkdir and then put something in the dir
and the later one would still barf. (I check for EEXIST but even the man page
says "EEXIST pathname already exists (not necessarily as a directory)." so that
wouldn't help.)

FYI, I'm more comfortable with "optionally attempt to clear a path and report
failure only on later attempts to use it" than "rely on stat's contents in a
stat-wait-open usage pattern to mean anything about what got opened" because the
second is a common security pattern that there are static analyzers searching
for to exploit, and the first isn't that I'm aware of. The failure case when -u
doesn't work is "act as if they didn't supply -u", which seems reasonable to me.

> and a follow-up question is that do we implement different behaviors with and
> without "-u"?

I wasn't doing so, I was ignoring -u.

If the complaint is "tar and zip behave one way but cpio doesn't" I find that a
compelling argument. If "gnu cpio needs -H newc to produce usable archives but
toybox does so by default, so change toybox to require -H newc even though it
can't produce any other archive format because otherwise their behavior
differs", I do not find that a compelling argument. A difference is not
inherently a bug.

A difference that breaks existing code that works in one context but does not
work in another is a bug, even what that difference is clearly stupid and we
need "bug compatibility" to get there.

My go-to example is commit 32b3587af261 where perl had a miswritten NOP sed
expression that gnu silently ignored but mine reported as an error. What the
perl devs wrote clearly wasn't what they MEANT, and if there had been an error
they would have spotted this bug and fixed the regex before they shipped. But
they didn't, so I changed mine to accept the bug so it could run the perl build,
even though the resulting behavior was "wrong" something depended on it.

> I'm not trying to make the behavior of toybox cpio match other commands or
> implementations.

Then what ARE you trying to do? (Do you have a use case?)

> Though it's a welcome change IMO if it makes subcommands behave more
> consistently. For example making toybox cpio behave similar to toybox tar.

That was my original idea, but tar has -k to switch it _off_ which cpio does not
have, and inventing a new option means that a toybox script using it would break
with "illegal option" when run with busybox's version...

(If you want "portable", don't have conflicting files where you're extracting
to. There's already version skew in the kernel's initramfs extractor about that...)

Anyway, patch checked in. I'm gonna stop the thread here unless you want to send
me more test cases.

Rob



More information about the Toybox mailing list