<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 30, 2022 at 1:35 AM Rob Landley <<a href="mailto:rob@landley.net">rob@landley.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 9/28/22 10:41, enh wrote:<br>
> On Tue, Sep 27, 2022 at 11:32 PM Rob Landley <<a href="mailto:rob@landley.net" target="_blank">rob@landley.net</a>> wrote:<br>
> No, "flags=" is not how sed command syntax normally works, thanks for asking.<br>
<br>
Sigh.<br>
<br>
> So now I'm staring at this going:<br>
> <br>
> 1) Is rsh only the default when there are no scope flags? As in if I tell it<br>
> scope s should that switch off r and h?<br>
> <br>
> 2) Does flags= interact with the block logic? If I do<br>
> <br>
> --transform '{flags=S;s/potato/blah/;};s/a/b/'<br>
> <br>
> should the second transform apply to symlink type?<br>
> <br>
> 3) How about jumps? If I b to a :label past a flags= statement does that<br>
> prevent the flag change from taking effect? (Is this a parse time attribute or a<br>
> runtime attribute?)<br>
<br>
I'm defining that as a runtime attribute, because resolving it at parse time<br>
just feels wrong: each individual s/// command's RSH gets masked against the<br>
default RSH, which is "flags=rsh" when nobody said flags=.<br>
<br>
> remember that GNU tar explicitly doesn't support _sed_ --- it supports something<br>
> that looks kind of like the sed 's' command.<br>
<br>
And that something is... really stupid.<br>
<br>
First of all, I verified that "rsh" remain the default even when one or more is<br>
specified, so the lower case ones only switch things back on when they've been<br>
disabled by an upper case letter. (So why even have the lower case ones? No<br>
idea! An individual pattern doesn't need to toggle them multiple times, and if<br>
the flags= one resets them each time you could do a blank flags=; to reset it<br>
back to normal by simply not SPECIFYING any of the upper case letters. Why did<br>
they do it the way they did it?)<br>
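<br>
A minimal sketch of those scope semantics as I understand them from the tests<br>
above, in Python rather than toybox's C. The names (DEFAULT_SCOPE,<br>
apply_scope_flags, effective_scope) are made up for illustration, and the<br>
"apply the command's letters on top of whatever default is in force at runtime"<br>
reading is my assumption, not something taken from GNU tar's source:<br>
<pre>
```python
# Hypothetical sketch: these names are illustrative, not from toybox or GNU tar.
DEFAULT_SCOPE = frozenset("rsh")  # regular names, symlink targets, hardlink targets


def apply_scope_flags(flags, start=DEFAULT_SCOPE):
    """Read scope letters left to right and return the enabled scopes."""
    scope = set(start)
    for ch in flags:
        if ch in "rsh":
            scope.add(ch)              # lower case switches a scope back on
        elif ch in "RSH":
            scope.discard(ch.lower())  # upper case switches that scope off
        # any other s/// flag (g, i, a number...) leaves the scope alone
    return frozenset(scope)


def effective_scope(command_flags, current_default=DEFAULT_SCOPE):
    """Assumed runtime masking: the command's letters modify the default
    that is in force when the command actually runs."""
    return apply_scope_flags(command_flags, current_default)


if __name__ == "__main__":
    print(sorted(effective_scope("S")))   # symlink targets switched off
    print(sorted(effective_scope("s")))   # no visible effect on the default
```
</pre>
Under this reading a lone lower case letter never changes anything unless an<br>
earlier flags= (or an earlier upper case letter) had switched that scope off.<br>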
<br>
But it's stranger than that. Let's whip up a directory with a regular file, a<br>
symlink, and a hard link:<br>
<br>
$ mkdir uno<br>
$ ln -s tres uno/dos<br>
$ touch uno/tres<br>
$ ln uno/tres uno/quatro<br>
$ ls -l uno<br>
total 0<br>
lrwxrwxrwx 1 landley landley 4 Sep 30 02:34 dos -> tres<br>
-rw-r--r-- 2 landley landley 0 Sep 30 02:34 quatro<br>
-rw-r--r-- 2 landley landley 0 Sep 30 02:34 tres<br>
<br>
So what does "S" actually suppress?<br>
<br>
$ tar c uno --xform 's/uno/one/S;s/dos/two/S;s/tres/three/S;s/quatro/four/S' |<br>
tar tv<br>
drwxr-xr-x landley/landley 0 2022-09-30 02:34 one/<br>
-rw-r--r-- landley/landley 0 2022-09-30 02:34 one/four<br>
lrwxrwxrwx landley/landley 0 2022-09-30 02:34 one/two -> tres<br>
hrw-r--r-- landley/landley 0 2022-09-30 02:34 one/three link to one/four<br>
<br>
Answer: the symlink target, only. How about H?<br>
<br>
$ tar c uno --xform 's/uno/one/H;s/dos/two/H;s/tres/three/H;s/quatro/four/H' |<br>
tar tv<br>
drwxr-xr-x landley/landley 0 2022-09-30 02:34 one/<br>
-rw-r--r-- landley/landley 0 2022-09-30 02:34 one/four<br>
lrwxrwxrwx landley/landley 0 2022-09-30 02:34 one/two -> three<br>
hrw-r--r-- landley/landley 0 2022-09-30 02:34 one/three link to uno/quatro<br>
<br>
The hardlink target. How about R?<br>
<br>
$ tar c uno --xform 's/uno/one/R;s/dos/two/R;s/tres/three/R;s/quatro/four/R' |<br>
tar tv<br>
drwxr-xr-x landley/landley 0 2022-09-30 02:34 uno/<br>
-rw-r--r-- landley/landley 0 2022-09-30 02:34 uno/quatro<br>
lrwxrwxrwx landley/landley 0 2022-09-30 02:34 uno/dos -> three<br>
hrw-r--r-- landley/landley 0 2022-09-30 02:34 uno/tres link to one/four<br>
<br>
Now here's the insane part: we have just created a BROKEN HARDLINK. I didn't<br>
think that was an option, AND YET.<br>
<br>
Even though tres and quatro are equivalent hardlinks to each other (both<br>
dentries point to the same inode, you can't even _track_ which was created<br>
"first" because atime, ctime, and mtime are stored in the _inode_ that they<br>
_share_), the first one encountered is stored as a regular file and the<br>
subsequent ones are stored as a reference to that first regular file. And if<br>
you're doing a directory traversal, "first" is filesystem hash order. (You'll<br>
note I created tres first, but quatro was stored as the file and tres as the<br>
link, because ext4. Yes I could micromanage it by supplying filenames on the<br>
command line, and will probably have to if I want consistent tests.)<br>
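<br>
The "first one encountered wins" behavior falls out of the usual way an<br>
archiver spots hard links: remember each (device, inode) pair the first time it<br>
shows up, and emit later names as links to that first name. A rough Python<br>
sketch of that bookkeeping, not GNU tar's or toybox's actual code<br>
(classify_members and demo are made-up names, and it only looks at what lstat<br>
reports):<br>
<pre>
```python
import os
import tempfile


def classify_members(paths):
    """Return (path, link_target) pairs in the order given.

    link_target is None for the first name seen for an inode, and that
    first name for every later path sharing the same inode.
    """
    seen = {}      # maps (st_dev, st_ino) to the first path stored for it
    members = []
    for path in paths:
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            members.append((path, seen[key]))  # stored as "link to first"
        else:
            seen[key] = path
            members.append((path, None))       # stored as the real file
    return members


def demo():
    """Build tres plus a hard link quatro, then classify in both orders."""
    with tempfile.TemporaryDirectory() as top:
        tres = os.path.join(top, "tres")
        quatro = os.path.join(top, "quatro")
        open(tres, "w").close()
        os.link(tres, quatro)

        def names(members):
            return [(os.path.basename(p), t and os.path.basename(t))
                    for p, t in members]

        return (names(classify_members([quatro, tres])),
                names(classify_members([tres, quatro])))


if __name__ == "__main__":
    for order in demo():
        print(order)
```
</pre>
Whichever name is listed first becomes the stored file, which is why spelling<br>
out the filenames in a fixed order is one way to get repeatable test output.<br>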
<br>
Note: this isn't my implementation! This is all using the devuan host tar!<br>
<br>
What... what were they TRYING to do? What would happen if you tried to extract<br>
that tarball...<br>
<br>
tar: one/three: Cannot hard link to ‘uno/quatro’: No such file or directory<br>
tar: Exiting with failure status due to previous errors<br>
<br>
An error is what would happen. Which sort of implies that tar can arbitrarily<br>
hardlink to an existing host file? Which is just a WEALTH of warm fuzzies from a<br>
security standpoint...<br>
<br>
> The sad part is I strongly suspect I'm putting more thought into this than the<br>
> tar developers did, but I kind of need to know the answers to do it RIGHT.<br>
> <br>
> "any similarity to actual sed, living or dead, is entirely coincidental".<br>
> <br>
> tbh, i think all the choices were bad here:<br>
<br>
When the FSF and/or the gnu project are involved, definitely.<br>
<br>
> 1. invent a new format. now you have two problems.<br>
<br>
Can't vi do search and replace without explicitly being sed syntax?<br></blockquote><div><br></div><div>not that i know of? 1,$s///g is the vi syntax, no? call it "ed" rather than "sed" if you prefer, but it's basically the same, no?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
> 2. reuse sed entirely. now you have the problems you've been dealing with. (plus<br>
> a lot of curious "what does <thing> even mean?" questions, because sed is too<br>
> general for this use case.)<br>
<br>
If you're regexing so your extract tries to overwrite /etc/passwd or something<br>
then allowing arbitrary inputs into the regex pattern is already verboten-ish.<br>
<br>
The gnu/dammit sed has a --sandbox mode that disables the r/w/e commands. What's<br>
"e"? Execute the pattern space as a shell command! Every invocation of sed can<br>
rm -rf your home directory! Because gnu!<br>
<br>
(I did not implement 'e' in toybox's sed, or in busybox's sed. I checked to see<br>
if they'd added it and as of August the answer is no... and I'm still listed as<br>
the maintainer at the top of that file. The last commit attributed to me was<br>
2009. Backing away slowly...)<br>
<br>
> luckily i doubt we'll ever have to answer those<br>
> questions, because i doubt anything esoteric is likely to be used.<br>
<br>
Famous. Last. Words.<br></blockquote><div><br></div><div>well, "you're doing this to yourself". GNU tar doesn't support any of this, busybox tar even less. so it's only toybox tar where someone _could_ get themselves into a mess with this in the first place. (and tbh, i still haven't seen an actual motivation beyond "orthogonality". which is a fine goal all other things being equal, but "massive added complexity", "ability to construct tar commands that no-one can read [because hardly anyone knows sed beyond s/// any more]", "interoperability issues with gnu/busybox", and "possible unintended consequences" all sound like reasons to believe all other things are _not_ equal here :-) )</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
> despite<br>
> hyrum's law, i don't really know anyone except you who writes sed more complex<br>
> than a single s command, and issues with trying to run your sed programs on<br>
> macOS suggest that nothing beyond a single s command is portable to different<br>
> sed implementations anyway!)<br>
<br>
The mac sed implementation is crap, but then the mac kernel is kind of crap-ish.<br></blockquote><div><br></div><div>mac sed is just BSD sed. it's good enough for anyone who's not as good at sed as you, which is roughly "everyone" :-)</div><div><br></div><div>i see a lot more sophisticated use of od/hexdump/xxd from people than i do sed!</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Rather a lot of mac tech is basically what would happen if Microsoft's design<br>
aesthetic could be competently implemented, and if you scratch the surface you<br>
get techies complaining about the backlight power pin being adjacent to the<br>
video signal pin so when you open and close the lid enough times to crease the<br>
ribbon cable your GPU fries instantly and can't be fixed without replacing the<br>
board. (I did not make that up:<br>
<a href="https://boards.rossmanngroup.com/threads/820-00875-no-backlight-no-display.60224/" rel="noreferrer" target="_blank">https://boards.rossmanngroup.com/threads/820-00875-no-backlight-no-display.60224/</a>)<br>
<br>
> 3. rewrite that one command from sed. now you have duplication, and potential<br>
> skew between "actual s command" and "fake s command".<br>
> <br>
> option 3 is not obviously a bad choice given the issues with the alternatives,<br>
> but it's a bad fit for anyone trying to do option 2 instead.<br>
<br>
Implementing behavior is easy. Figuring out what the behavior should BE is hard.<br></blockquote><div><br></div><div>tbh, this is where i like the (usual) rob landley toybox philosophy of "i'll implement it when we have a motivating example of someone trying to get something done with it, not just because it's mentioned in the docs". i think that's a great pragmatic philosophy (in a world dominated by dogmatic philosophies, to which group "orthogonality" -- for all its merits at times -- tends to belong). it also has the nice side-effect of letting reality guide where to spend your time, because it means you're focusing on things that users demonstrably need rather than stuff that someone might want someday.</div><div><br></div><div>with this tar sed stuff i feel like i'm watching a man drill holes in his own head, all the time crying that it hurts :-)</div><div><br></div><div>(though you are, i think, collecting a hell of a lot of circumstantial evidence that the original implementors didn't think this through. but to me that says "so neither should you" --- just do the minimum, assume the weird shit is as useless as it appears, move on with your life until/unless someone comes along who actually does need more. a motivating example often makes things clearer. the lack of one is often a sign you were right to ignore the whole mess :-) )</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
And turning it into proper test cases... (If I don't say "v" then I just get<br>
one/three without "link to uno/quatro", but if I add the v it gives me irrelevant<br>
user and timestamp info, although I guess I've already got test invocations that<br>
regularize all that....)<br></blockquote><div><br></div><div>/me wonders how much of this gnu behavior is even deliberate versus accidental, and thus likely to be the kind of test that suffers from debian version skew if/when anyone actually tries to use the gnu version.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Rob<br>
</blockquote></div></div>