[Toybox] I hate the GNU design aesthetic.
enh
enh at google.com
Fri Sep 30 08:10:22 PDT 2022
On Fri, Sep 30, 2022 at 1:35 AM Rob Landley <rob at landley.net> wrote:
> On 9/28/22 10:41, enh wrote:
> > On Tue, Sep 27, 2022 at 11:32 PM Rob Landley <rob at landley.net> wrote:
> > No, "flags=" is not how sed command syntax normally works, thanks for asking.
>
> Sigh.
>
> > So now I'm staring at this going:
> >
> > 1) Is rsh only the default when there are no scope flags? As in if I tell it
> > scope s should that switch off r and h?
> >
> > 2) Does flags= interact with the block logic? If I do
> >
> > --transform '{flags=S;s/potato/blah/;};s/a/b/'
> >
> > should the second transform apply to symlink type?
> >
> > 3) How about jumps? If I b to a :label past a flags= statement does that
> > prevent the flag change from taking effect? (Is this a parse time attribute or a
> > runtime attribute?)
>
> I'm defining that as a runtime attribute, so the individual s/// command's RSH
> get masked against the default RSH which is "flags=rsh" when nobody said flags=
> because resolving that at parse time just feels wrong.
>
> > remember that GNU tar explicitly doesn't support _sed_ --- it supports something
> > that looks kind of like the sed 's' command.
>
> And that something is... really stupid.
>
> First of all, I verified that "rsh" remain the default even when one or more is
> specified, so the lower case ones only switch things back on when they've been
> disabled by an upper case letter. (So why even have the lower case ones? No
> idea! An individual pattern doesn't need to toggle them multiple times, and if
> the flags= one resets them each time you could do a blank flags=; to reset it
> back to normal by simply not SPECIFYING any of the upper case letters. Why did
> they do it the way they did it?)
>
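The scope-flag behaviour described above (r, s and h all default to on, upper case switches one off, lower case only switches it back on) can be sketched roughly as below. This is a hypothetical Python model, not GNU tar's or toybox's code: the name scope_flags and the left-to-right processing order are assumptions made for illustration.

```python
DEFAULT_SCOPE = frozenset("rsh")  # member names, symlink targets, hardlink targets


def scope_flags(flags):
    """Return the scopes (subset of 'r', 's', 'h') left enabled by a flag string."""
    enabled = set(DEFAULT_SCOPE)          # everything starts switched on
    for ch in flags:                      # assumption: flags apply left to right
        if ch in "RSH":
            enabled.discard(ch.lower())   # upper case switches a scope off
        elif ch in "rsh":
            enabled.add(ch)               # lower case only switches it back on
        # any other character (e.g. g) is not a scope flag and is ignored here
    return enabled
```

Under this model a lone lower case letter is a no-op, which matches the "why even have the lower case ones?" observation.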
> But it's stranger than that. Let's whip up a directory with a regular file, a
> symlink, and a hard link:
>
> $ mkdir uno
> $ ln -s tres uno/dos
> $ touch uno/tres
> $ ln uno/tres uno/quatro
> $ ls -l uno
> total 0
> lrwxrwxrwx 1 landley landley 4 Sep 30 02:34 dos -> tres
> -rw-r--r-- 2 landley landley 0 Sep 30 02:34 quatro
> -rw-r--r-- 2 landley landley 0 Sep 30 02:34 tres
>
> So what does "S" actually suppress?
>
> $ tar c uno --xform 's/uno/one/S;s/dos/two/S;s/tres/three/S;s/quatro/four/S' | tar tv
> drwxr-xr-x landley/landley 0 2022-09-30 02:34 one/
> -rw-r--r-- landley/landley 0 2022-09-30 02:34 one/four
> lrwxrwxrwx landley/landley 0 2022-09-30 02:34 one/two -> tres
> hrw-r--r-- landley/landley 0 2022-09-30 02:34 one/three link to one/four
>
> Answer: the symlink target, only. How about H?
>
> $ tar c uno --xform 's/uno/one/H;s/dos/two/H;s/tres/three/H;s/quatro/four/H' | tar tv
> drwxr-xr-x landley/landley 0 2022-09-30 02:34 one/
> -rw-r--r-- landley/landley 0 2022-09-30 02:34 one/four
> lrwxrwxrwx landley/landley 0 2022-09-30 02:34 one/two -> three
> hrw-r--r-- landley/landley 0 2022-09-30 02:34 one/three link to uno/quatro
>
> The hardlink target. How about R?
>
> $ tar c uno --xform 's/uno/one/R;s/dos/two/R;s/tres/three/R;s/quatro/four/R' | tar tv
> drwxr-xr-x landley/landley 0 2022-09-30 02:34 uno/
> -rw-r--r-- landley/landley 0 2022-09-30 02:34 uno/quatro
> lrwxrwxrwx landley/landley 0 2022-09-30 02:34 uno/dos -> three
> hrw-r--r-- landley/landley 0 2022-09-30 02:34 uno/tres link to one/four
>
> Now here's the insane part: we have just created a BROKEN HARDLINK. I didn't
> think that was an option, AND YET.
>
> Even though tres and quatro are equivalent hardlinks to each other (both
> dentries point to the same inode, you can't even _track_ which was created
> "first" because atime, ctime, and mtime are stored in the _inode_ that they
> _share_), the first one encountered is stored as a regular file and the
> subsequent ones are stored as a reference to that first regular file. And if
> you're doing a directory traversal, "first" is filesystem hash order. (You'll
> note I created tres first, but quatro was stored as the file and tres as the
> link, because ext4. Yes I could micromanage it by supplying filenames on the
> command line, and will probably have to if I want consistent tests.)
>
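The "first name seen becomes the file, later names become links" behaviour can be reproduced with an explicit ordering. The sketch below uses Python's standard tarfile module rather than GNU tar, so it only illustrates the same archive convention; the helper name hardlink_roles is made up for this example, and it assumes the temporary directory's filesystem supports hard links.

```python
import io
import os
import tarfile
import tempfile


def hardlink_roles(order):
    """Archive two hard-linked names in the given order; report how each is stored."""
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "uno"))
        tres = os.path.join(top, "uno", "tres")
        quatro = os.path.join(top, "uno", "quatro")
        open(tres, "w").close()
        os.link(tres, quatro)  # second directory entry for the same inode
        assert os.stat(tres).st_ino == os.stat(quatro).st_ino

        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tar:
            for name in order:  # explicit order instead of directory traversal order
                tar.add(os.path.join(top, "uno", name), arcname="uno/" + name)

        buf.seek(0)
        with tarfile.open(fileobj=buf, mode="r") as tar:
            return [(m.name, "link to " + m.linkname if m.islnk() else "file")
                    for m in tar.getmembers()]
```

Calling hardlink_roles(["tres", "quatro"]) and then hardlink_roles(["quatro", "tres"]) swaps which name is stored as the regular file, which is the same lever as listing filenames explicitly on the command line to get stable tests.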
> Note: this isn't my implementation! This is all using the devuan host tar!
>
> What... what were they TRYING to do? What would happen if you tried to extract
> that tarball...
>
> tar: one/three: Cannot hard link to ‘uno/quatro’: No such file or directory
> tar: Exiting with failure status due to previous errors
>
> An error is what would happen. Which sort of implies that tar can arbitrarily
> hardlink to an existing host file? Which is just a WEALTH of warm fuzzies from a
> security standpoint...
>
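One way to catch this kind of archive before extracting it is to check that every hard link entry points at a member stored earlier in the same archive. The following is a minimal sketch using Python's tarfile module; dangling_hardlinks and build_example are hypothetical helpers, and the in-memory archive only mimics the member names from the H listing above.

```python
import io
import tarfile


def dangling_hardlinks(tar):
    """List (name, linkname) for hard links whose target is not an earlier member."""
    seen = set()
    broken = []
    for member in tar:
        if member.islnk() and member.linkname not in seen:
            broken.append((member.name, member.linkname))
        seen.add(member.name)
    return broken


def build_example():
    """Build an in-memory archive with the same members as the H listing."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.addfile(tarfile.TarInfo("one/four"))   # empty regular file
        link = tarfile.TarInfo("one/three")
        link.type = tarfile.LNKTYPE
        link.linkname = "uno/quatro"               # target name left untransformed
        tar.addfile(link)
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")
```

Running dangling_hardlinks(build_example()) reports one/three as pointing at uno/quatro, a name that exists nowhere in the archive, so resolving it at extract time would have to reach outside the archive's own contents.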
> > The sad part is I strongly suspect I'm putting more thought into this than the
> > tar developers did, but I kind of need to know the answers to do it RIGHT.
> >
> > "any similarity to actual sed, living or dead, is entirely coincidental".
> >
> > tbh, i think all the choices were bad here:
>
> When the FSF and/or the gnu project are involved, definitely.
>
> > 1. invent a new format. now you have two problems.
>
> Can't vi do search and replace without explicitly being sed syntax?
>
not that i know of? 1,$s///g is the vi syntax, no? call it "ed" rather than
"sed" if you prefer, but it's basically the same, no?
> > 2. reuse sed entirely. now you have the problems you've been dealing with. (plus
> > a lot of curious "what does <thing> even mean?" questions, because sed is too
> > general for this use case.
>
> If you're regexing so your extract tries to overwrite /etc/passwd or something
> then allowing arbitrary inputs into the regex pattern is already verboten-ish.
>
> The gnu/dammit sed has a --sandbox mode that disables the r/w/e commands. What's
> "e"? Execute the pattern space as a shell command! Every invocation of sed can
> rm -rf your home directory! Because gnu!
>
> (I did not implement 'e' in toybox's sed, or in busybox's sed. I checked to see
> if they'd added it and as of August the answer is no... and I'm still listed as
> the maintainer at the top of that file. The last commit attributed to me was
> 2009. Backing away slowly...)
>
> > luckily i doubt we'll ever have to answer those
> > questions, because i doubt anything esoteric is likely to be used.
>
> Famous. Last. Words.
>
well, "you're doing this to yourself". GNU tar doesn't support any of this,
busybox tar even less. so it's only toybox tar where someone _could_ get
themselves into a mess with this in the first place. (and tbh, i still
haven't seen an actual motivation beyond "orthogonality". which is a fine
goal all other things being equal, but "massive added complexity", "ability
to construct tar commands that no-one can read [because hardly anyone knows
sed beyond s/// any more]", "interoperability issues with gnu/busybox", and
"possible unintended consequences" all sound like reasons to believe all
other things are _not_ equal here :-) )
> > despite
> > hyrum's law, i don't really know anyone except you who writes sed more complex
> > than a single s command, and issues with trying to run your sed programs on
> > macOS suggest that nothing beyond a single s command is portable to different
> > sed implementations anyway!)
>
> The mac sed implementation is crap, but then the mac kernel is kind of crap-ish.
>
mac sed is just BSD sed. it's good enough for anyone who's not as good at
sed as you, which is roughly "everyone" :-)
i see a lot more sophisticated use of od/hexdump/xxd from people than i do
sed!
> Rather a lot of mac tech is basically what would happen if Microsoft's design
> aesthetic could be competently implemented, and if you scratch the surface you
> get techies complaining about the backlight power pin being adjacent to the
> video signal pin so when you open and close the lid enough times to crease the
> ribbon cable your GPU fries instantly and can't be fixed without replacing the
> board. (I did not make that up:
> https://boards.rossmanngroup.com/threads/820-00875-no-backlight-no-display.60224/ )
>
> > 3. rewrite that one command from sed. now you have duplication, and potential
> > skew between "actual s command" and "fake s command".
> >
> > option 3 is not obviously a bad choice given the issues with the alternatives,
> > but it's a bad fit for anyone trying to do option 2 instead.
>
> Implementing behavior is easy. Figuring out what the behavior should BE is hard.
>
tbh, this is where i like the (usual) rob landley toybox philosophy of
"i'll implement it when we have a motivating example of someone trying to
get something done with it, not just because it's mentioned in the docs". i
think that's a great pragmatic philosophy (in a world dominated by dogmatic
philosophies, to which group "orthogonality" -- for all its merits at times
-- tends to belong). it also has the nice side-effect of letting reality
guide where to spend your time, because it means you're focusing on things
that users demonstrably need rather than stuff that someone might want
someday.
with this tar sed stuff i feel like i'm watching a man drill holes in his
own head, all the time crying that it hurts :-)
(though you are, i think, collecting a hell of a lot of circumstantial
evidence that the original implementors didn't think this through. but to
me that says "so neither should you" --- just do the minimum, assume the
weird shit is as useless as it appears, move on with your life until/unless
someone comes along who actually does need more. a motivating example often
makes things clearer. the lack of one is often a sign you were right to
ignore the whole mess :-) )
> And turning it into proper test cases... (If I don't say "v" then I just get
> one/three without "link to uno/quatro", but if I add the v it gives me irrelevant
> user and timestamp info, although I guess I've already got test invocations that
> regularize all that....)
>
/me wonders how much of this gnu behavior is even deliberate versus
accidental, and thus likely to be the kind of test that suffers from debian
version skew if/when anyone actually tries to use the gnu version.
> Rob
>