<div dir="ltr"><div>(I kind of dozed off... so break off some questions / answers to another mail)</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Feb 28, 2021 at 6:41 PM Rob Landley <<a href="mailto:rob@landley.net">rob@landley.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 2/27/21 10:18 AM, Yi-yo Chiang wrote:<br>
> Do we want toybox cpio/tar to be feature-identical to GNU or Solaris'<br>
> implementation? (I hope not, cause that feels like a bloat to code size)<br>
<br>
I'm trying to figure out what the right thing to do is here. My first instinct<br>
is "if tar and zip behave the same way, cpio probably should too", but I don't<br>
want to break somebody's script.<br>
<br>
What's the use case that motivated these changes again?<br></blockquote><div><br></div><div>Answered in the previous reply.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
> In any case, it looks like the expected behavior is to open with O_EXCL and<br>
> unlink if it failed. Under what circumstances DOES it replace an existing file<br>
> with a directory, or an existing directory with a file? (The tests you submitted<br>
> only replace files with other files.)<br>
> <br>
> See my example at the start.<br>
> Your commands didn't replace "walrus" because your archive extracts<br>
> "walrus/file" but not "walrus".<br>
<br>
I know why it didn't, yes.<br>
<br>
> If you use "find ... | cpio -o" to create the test archive, then the existing<br>
> "walrus" would be replaced by cpio when cpio is trying to mkdir(walrus).<br>
<br>
Because there would then be _two_ entries in the cpio archive, and the directory<br>
entry is guaranteed to come before the directory contents by the definition of<br>
the find -depth option, which is a different test than the one I was running.<br>
<br>
But if you tell cpio -d it creates leading directories where needed, and in<br>
doing so it did _not_ remove conflicting files. And I don't understand why -d<br>
ignores -u, other than it being an oversight. (That's why the last test I ran<br>
combined -d and -u. Exact matches are treated differently than path components.)<br></blockquote><div><br></div><div>I think tar doesn't remove conflicting leading directories either? Not that this means anything, but if we want to make cpio be similar to tar then we would want to keep this behavior.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
By the way, I suspect that zapping directories was a side effect of zapping<br>
symlinks, as an attack mitigation strategy. I had to care about this when<br>
implementing tar, but can't find the test case or blog entry. It was while I was<br>
working at Johnson Controls, I want to say early 2019? Hmmm, it wasn't<br>
<a href="https://nvd.nist.gov/vuln/detail/CVE-2016-6321" rel="noreferrer" target="_blank" class="cremed">https://nvd.nist.gov/vuln/detail/CVE-2016-6321</a> and I remember it was something<br>
busybox got wrong at the time...<br>
<br>
Oh great, half of tests/tar.test is just notes-to-self not actual tests. So much<br>
half-finished stuff I've never had time to finish. Well, can't do it right now...<br>
<br>
> I'd like to avoid the separate "stat" if necessary not just because it's extra<br>
> code but because it's a race condition between two non-atomic operations waiting<br>
> to happen. (I have no idea if the gap is exploitable, but I don't want to worry<br>
> about it if just not doing that is an option? And not doing that would be open<br>
> O_EXCL and on failure unlink and O_EXCL a second time.)<br>
> <br>
> Yeah I totally agree. But I'd like to point out that in my first iteration of<br>
> this code, I do guard the stat() with various conditions and nested if-else-ifs,<br>
<br>
If all we care about is a single attempt to recover an occluded exact match the<br>
logic sounds like:<br>
<br>
i = 0;<br>
do if (-1 != (fd = open(O_CREAT|O_EXCL))) break;<br>
while (!i++ && (!unlink(name) || !rmdir(name));<br>
if (fd == -1) {<br>
perror_msg_raw(name);<br>
continue;<br>
}<br>
<br>
Where does the stat come in?<br>
<br>
(Ok, if you chmod 000 a normal file the error message printed would be ENOTDIR<br>
from the failing rmdir(), technically it should fall back to the rmdir only if<br>
the unlink returns EISDIR, which the linux man page says is a non-posix value<br>
returned by Linux but the BSD man page _also_ says is what unlink(2) returns so<br>
I'm assuming mac does too.)<br></blockquote><div><br></div><div>In such case a separate "stat and unlink/rmdir" would produce a better error message?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
> hoping to only call stat() if EEXIST error happened.<br>
> However the code quickly grew too complex and unreadable, with too many<br>
> combinations to consider (file replace dir, dir replace file, dir replace dir,<br>
> what if dir not empty? what about symlink and special files...), so I changed<br>
> the implementation to the one you're seeing now.<br>
> This is also the reason why I add the "if dir already exist, then just do chmod"<br>
> shortcut, because race could happen between rmdir(a) and mkdir(a). <br>
<br>
A race resulting in a failure, you mean?<br>
<br>
$ ln -s broken nowhere<br>
$ ls -l nowhere<br>
lrwxrwxrwx 1 landley landley 6 Feb 28 03:09 nowhere -> broken<br>
$ mkdir nowhere<br>
mkdir: cannot create directory ‘nowhere’: File exists<br>
<br>
How is this exploitable?<br>
<br>
The reason I'm spending time on this thread instead of staring at the cpio code<br>
is that implementation is easy, figuring out what it SHOULD do is hard. I still<br>
don't know what use case motivated your changes. Your patch description says<br>
what but not why, I have to reverse engineer an understanding of what you're<br>
trying to accomplish from the conclusion you came to about the best way to<br>
implement it.<br>
<br>
> Another point is that when replacing an existing file, having a separate stat()<br>
> would mean:<br>
> stat(a) & unlink(a) & open(a, O_EXCL)<br>
> but not stat()-ing beforehand would mean:<br>
> open(a, O_EXCL) [fail] & stat(a) & ("a" is a directory ? rmdir : unlink)(a)<br>
> & open(a, O_EXCL)<br>
<br>
Isn't the open(a, O_EXCL) succeeding the common case though? And if that's the<br>
first thing it tries and it succeeds...<br>
<br>
> So having a separate "stat" doesn't necessarily mean an extra system call.<br>
> If path doesn't exist then the stat() is an extra syscall, otherwise the stat()<br>
> saves an extra syscall. <br>
<br>
Mostly it's just the possibility that the thing I statted is not the thing I<br>
acted upon a moment later making me shy away from it as a first choice for<br>
implementation. I prefer not to have to reason about what _could_ happen in that<br>
gap if there's another way (such as the open saying it didn't work and then<br>
unlink saying it didn't work so you have to rmdir; the first common case didn't<br>
succeed and the second common case didn't succeed so then try the third. Does a<br>
failed unlink attempt take longer than a stat?)<br>
<br>
I note that the obvious approach is to wrap the whole if/else staircase with a<br>
retry loop (otherwise there's a goto), which makes the patch look big because of<br>
the extra indentation layer (which is why git diff supports -b, and yes I've<br>
done that on patches to see what people actually did...)<br>
<br>
> (You'll notice the existing stat was an fstat() on an already open file<br>
> descriptor, and you're adding an lstat on a path which could get swapped out<br>
> afterwards. In general I try not to do that because it seems racy with programs<br>
> dropping symlinks based on inotify triggers. Yeah you've sprayed the system down<br>
> with selinux but I don't want to worry about it at a design level...)<br>
> <br>
> Unfortunately I haven't had the spare focus this week to properly wrap my brain<br>
> around what I want to happen here. I finally got a chance to look at it the end<br>
> of the day friday, and just wanted to say why I haven't applied it as-is yet.<br>
> I'll try to redo it this weekend, but first I need to confirm that open(O_EXCL)<br>
> is sufficient error detection and what we do about conflicting directory<br>
> creation (is there a case where -u DOES apply to -d?) <br>
> <br>
> I hope not, let's keep it simple...<br>
<br>
It looks like only exact matches of empty directories get rmdir()ed (and really<br>
it's unlinkat(AT_REMOVEDIR) and why can unlink remove files, dev nodes,<br>
symlinks, sockets, and fifos, but directories have their own remove call (even<br>
though that call fails if the directory isn't empty)? Why is that again? I'm too<br>
tired for this to make sense right now...<br></blockquote><div><br></div><div>Why do rmdir() and unlink() have to be two different calls?<br></div><div>I have the same question ever since college, and it's still puzzling me today. Over the years I learned to just live with it and stop bugging other people with this kind of question. I guess this is what socialization feels like.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
> and adding tests for the<br>
> corner cases limning in the design decisions.<br>
> <br>
> I can add more tests once we decide the desired behavior.<br>
Tests are fiddly. I really HATE when TEST_HOST gives spurious irreproducible<br>
failures:<br>
<br>
FAIL: tar sparse without overflow<br>
echo -ne '' | tar c --owner root --group root --mtime @1234567890 --sparse fweep<br>
| SUM 3<br>
--- expected 2021-02-28 09:28:39.518369796 +0000<br>
+++ actual 2021-02-28 09:28:39.526369797 +0000<br>
@@ -1 +1 @@<br>
-e1560110293247934493626d564c8f03c357cec5<br>
+ce3f8e6b10723ce4d85f06c606bc634a9db23be0<br>
.singlemake:1781: recipe for target 'test_tar' failed<br>
make: *** [test_tar] Error 1<br>
<br>
Smells like maybe clock edge overflow of a second ticking over and winding up<br>
altering a header field? But I didn't grab the artifacts out of generated/ to<br>
see what it was and it refuses to do it again...<br>
<br>
Anyway, added a new test to tar and made it unlink an empty directory where it<br>
wants to put a file. If the result of all this is "cpio should work like tar<br>
when files/directories conflict with archive contents" then that's easy to<br>
implement. If not, I'd like to understand _why_ not.<br></blockquote><div><br></div><div>Mail is long, and I kind of lose track of what the topic is again? Anyway if this (should cpio be similar to tar in resolving name conflicts?) is ever a question, then my answer would be "yes, why not?".</div><div>Ah right, the problem is that cpio (toybox) right now only replaces a conflicting name when both the name to be extracted and the name to be replaced are regular files.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
> I need to properly wrap my head around this, but haven't been at my best the<br>
> past couple weeks. Slightly overstretched. Sorry 'bout that.<br>
> <br>
> Nah it's cool. Do take care and take your time and don't feel burned out. <br>
<br>
Too late, but it's not because of this.<br>
<br>
I _should_ just sit down and work out what to do and implement it instead of<br>
having email threads that boil down to "I haven't scraped up enough brain to<br>
work through all these corner cases myself yet", but I've been tired and<br>
headachey for a few days now. I'd blame cedar pollen but my wife (in<br>
minneapolis) has been nauseous for 2 days and apparently it's going around?<br>
<br>
<a href="https://twitter.com/catacalypto/status/1365538065184251905" rel="noreferrer" target="_blank" class="cremed">https://twitter.com/catacalypto/status/1365538065184251905</a><br>
<br>
To be honest I expect I'm still limp from the blizzard last week, which I<br>
_shouldn't_ be but 2020 kinda burned through my reserves. Just because the<br>
people who tried to violently overthrow the government last month have<br>
deregulated infrastructure in my state to the point civilization literally<br>
collapsed around here last week, and yesterday a jewish friend broke down<br>
sobbing because the CPAC conference's stage is literally shaped like a nazi<br>
symbol...<br>
<br>
But there's good news: my house _didn't_ get flooded this time (after the first<br>
4 times my wife has outright PTSD about that but she's been getting her<br>
doctorate in another state for years and we haven't been able to see each other<br>
in person since the pandemic started). The boil water advisory here was lifted<br>
all the way back on wednesday, our power company isn't one of the ones charging<br>
5 figure electricity bills (instead they say they'll spread it out over the next<br>
10 years), and as of yesterday the local grocery store has restocked over half<br>
its shelves. Plus the Johnson and Johnson one dose vaccine got approved, which<br>
is the one I've been rooting for since brexit derailed availability of the<br>
oxford one. If everything goes well I'm hoping I can leave the house and go<br>
inside another building (other than the grocery store down the street) by maybe<br>
september. (I still have the phobia of needles resulting from a nurse wheeling<br>
in a tray of 24 needles for allergy testing when I was 7 and tying me up in a<br>
sheet when I freaked out about it until I threw up on her. Ever since I've been<br>
able to make my gums bleed by thinking to hard about needles, but I can usually<br>
psych myself up to get an injection without losing consciousness given about<br>
half an hour. So I have that to look forward to, but this is part of the reason<br>
I really want the one dose version instead of the two dose version...)<br></blockquote><div><br></div><div>Can't say I understand, because I don't live in the states, thus saying that would sound like a hypocrite.</div><div>I'm from a country that doesn't have that big of a pollen issue (I think the trees here are different?), we always boil our water even pre-COVID, no blizzard but typhoons, and we don't know what's a COVID vaccine :)</div><div>I think rest more, work less, and eat balancedly and perhaps headaches would go away? (But who am I to say, I'm working too much already...)</div><div> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Anyway, weekend! In theory this means spare cycles to throw at the toybox pull<br>
request and bug report backlog (instead of working on the shell I've been trying<br>
to finish, but nobody except me uses that and other people use the stuff they're<br>
sending me patches for, so...)<br>
<br>
Rob<br>
<br>
P.S. Sorry I haven't been more focused. And Pascal's Apology about the length of<br>
the email:<br>
<a href="https://www.npr.org/sections/13.7/2014/02/03/270680304/this-could-have-been-shorter" rel="noreferrer" target="_blank" class="cremed">https://www.npr.org/sections/13.7/2014/02/03/270680304/this-could-have-been-shorter</a><br>
</blockquote></div><div><br></div>^^^ I didn't finish this "apology" either, it's still too long!<br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><table width="90%" border="0" cellspacing="0" cellpadding="0" style="margin:0px;padding:0px;font-family:"Times New Roman";max-width:348px"><tbody style="margin:0px;padding:0px"><tr style="margin:0px;padding:0px"><td style="padding:0px"><table border="0" cellspacing="0" cellpadding="0" style="margin:0px;padding:20px 0px 0px"><tbody style="margin:0px;padding:0px"><tr style="margin:0px;padding:0px"><td valign="top" style="padding:0px 20px 0px 0px;vertical-align:top;border-right:1px solid rgb(213,213,213)"><img src="https://i.imgur.com/eGpkLls.png" width="200" height="64"><br></td><td style="padding:0px 0px 0px 20px"><table border="0" cellspacing="0" cellpadding="0" style="margin:0px;padding:0px"><tbody style="margin:0px;padding:0px"><tr style="margin:0px;padding:0px"><td colspan="2" style="font-family:Arial,Helvetica,Verdana,sans-serif;padding:1px 0px 5px;font-size:13px;line-height:13px;color:rgb(56,58,53);font-weight:700">Yi-yo Chiang</td></tr><tr style="margin:0px;padding:0px"><td colspan="2" style="font-family:Arial,Helvetica,Verdana,sans-serif;padding:0px 0px 5px;font-size:11px;line-height:13px;color:rgb(56,58,53)">Software Engineer</td></tr><tr style="margin:0px;padding:0px"><td colspan="2" style="font-family:Arial,Helvetica,Verdana,sans-serif;padding:0px 0px 5px;font-size:11px;line-height:13px;color:rgb(56,58,53)"><a href="mailto:yochiang@google.com" target="_blank" class="cremed">yochiang@google.com</a></td></tr><tr style="margin:0px;padding:0px"><td colspan="2" style="font-family:Arial,Helvetica,Verdana,sans-serif;padding:0px 0px 3px;font-size:11px;line-height:13px;color:rgb(3,112,248)"></td></tr></tbody></table></td></tr></tbody></table></td></tr></tbody></table></div></div></div>