[Toybox] I hate the GNU design aesthetic.

Rob Landley rob at landley.net
Fri Sep 30 01:44:03 PDT 2022


On 9/28/22 10:41, enh wrote:
> On Tue, Sep 27, 2022 at 11:32 PM Rob Landley <rob at landley.net> wrote:
>     No, "flags=" is not how sed command syntax normally works, thanks for asking.

Sigh.

>     So now I'm staring at this going:
> 
>     1) Is rsh only the default when there are no scope flags? As in if I tell it
>     scope s should that switch off r and h?
> 
>     2) Does flags= interact with the block logic? If I do
> 
>       --transform '{flags=S;s/potato/blah/;};s/a/b/'
> 
>     should the second transform apply to symlink type?
> 
>     3) How about jumps? If I b to a :label past a flags= statement does that
>     prevent the flag change from taking effect? (Is this a parse time attribute or a
>     runtime attribute?)

I'm defining that as a runtime attribute: each individual s/// command's RSH gets
masked against the default RSH (which is "flags=rsh" when nobody said flags=),
because resolving that at parse time just feels wrong.

> remember that GNU tar explicitly doesn't support _sed_ --- it supports something
> that looks kind of like the sed 's' command.

And that something is... really stupid.

First of all, I verified that "rsh" remains the default even when one or more
flags are specified, so the lower case ones only switch things back on when
they've been disabled by an upper case letter. (So why even have the lower case
ones? No idea! An individual pattern doesn't need to toggle them multiple times,
and if the flags= one resets them each time you could do a blank flags=; to
reset it back to normal by simply not SPECIFYING any of the upper case letters.
Why did they do it the way they did it?)
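
A minimal shell sketch of the rule just described, for reference. The
resolve_scope name is made up for illustration, and the logic is only what the
paragraph above states (everything starts enabled, upper case disables, lower
case re-enables); it is not taken from GNU tar's source and says nothing about
how a flags= statement combines with per-command flags.

```shell
# Hypothetical helper: print which scopes (r, s, h) are still enabled after
# applying a string of scope flags left to right.
# Runs in a subshell, the ( ) body, so its variables do not leak to the caller.
resolve_scope() (
  r=1 s=1 h=1                 # "rsh" is the starting point
  flags=$1
  while [ -n "$flags" ]; do
    rest=${flags#?}           # everything after the first character
    c=${flags%"$rest"}        # the first character itself
    case $c in
      r) r=1 ;; R) r=0 ;;
      s) s=1 ;; S) s=0 ;;
      h) h=1 ;; H) h=0 ;;
    esac                      # any other letter is ignored by this sketch
    flags=$rest
  done
  out=
  if [ "$r" = 1 ]; then out=${out}r; fi
  if [ "$s" = 1 ]; then out=${out}s; fi
  if [ "$h" = 1 ]; then out=${out}h; fi
  printf '%s\n' "$out"
)
```

So "resolve_scope S" prints rh, and "resolve_scope Ss" prints rsh again, which
is the only thing the lower case letters seem to be good for.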

But it's stranger than that. Let's whip up a directory with a regular file, a
symlink, and a hard link:

  $ mkdir uno
  $ ln -s tres uno/dos
  $ touch uno/tres
  $ ln uno/tres uno/quatro
  $ ls -l uno
  total 0
  lrwxrwxrwx 1 landley landley 4 Sep 30 02:34 dos -> tres
  -rw-r--r-- 2 landley landley 0 Sep 30 02:34 quatro
  -rw-r--r-- 2 landley landley 0 Sep 30 02:34 tres

So what does "S" actually suppress?

  $ tar c uno --xform 's/uno/one/S;s/dos/two/S;s/tres/three/S;s/quatro/four/S' |
    tar tv
  drwxr-xr-x landley/landley   0 2022-09-30 02:34 one/
  -rw-r--r-- landley/landley   0 2022-09-30 02:34 one/four
  lrwxrwxrwx landley/landley   0 2022-09-30 02:34 one/two -> tres
  hrw-r--r-- landley/landley   0 2022-09-30 02:34 one/three link to one/four

Answer: the symlink target, only. How about H?

  $ tar c uno --xform 's/uno/one/H;s/dos/two/H;s/tres/three/H;s/quatro/four/H' |
    tar tv
  drwxr-xr-x landley/landley   0 2022-09-30 02:34 one/
  -rw-r--r-- landley/landley   0 2022-09-30 02:34 one/four
  lrwxrwxrwx landley/landley   0 2022-09-30 02:34 one/two -> three
  hrw-r--r-- landley/landley   0 2022-09-30 02:34 one/three link to uno/quatro

The hardlink target. How about R?

  $ tar c uno --xform 's/uno/one/R;s/dos/two/R;s/tres/three/R;s/quatro/four/R' |
    tar tv
  drwxr-xr-x landley/landley   0 2022-09-30 02:34 uno/
  -rw-r--r-- landley/landley   0 2022-09-30 02:34 uno/quatro
  lrwxrwxrwx landley/landley   0 2022-09-30 02:34 uno/dos -> three
  hrw-r--r-- landley/landley   0 2022-09-30 02:34 uno/tres link to one/four

Now here's the insane part: we have just created a BROKEN HARDLINK. I didn't
think that was an option, AND YET.

Even though tres and quatro are equivalent hardlinks to each other (both
dentries point to the same inode, you can't even _track_ which was created
"first" because atime, ctime, and mtime are stored in the _inode_ that they
_share_), the first one encountered is stored as a regular file and the
subsequent ones are stored as a reference to that first regular file. And if
you're doing a directory traversal, "first" is filesystem hash order. (You'll
note I created tres first, but quatro was stored as the file and tres as the
link, because ext4. Yes I could micromanage it by supplying filenames on the
command line, and will probably have to if I want consistent tests.)
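
To make the "first one encountered wins" part concrete, here is a rough shell
sketch that classifies paths in the order they are given, the way an archiver
tracking inodes would. The classify_links name is invented for illustration.
It assumes paths without whitespace and a single filesystem (it ignores device
numbers), and it is not how any particular tar implements this.

```shell
# Hypothetical sketch: the first path seen for an inode is reported as a plain
# entry, and every later path with the same inode as a link to that first one.
# Runs in a subshell, the ( ) body, so its variables do not leak to the caller.
classify_links() (
  seen=""                     # newline-separated "inode path" records
  for f in "$@"; do
    inode=$(ls -di "$f" | awk '{print $1}')
    first=$(printf '%s' "$seen" | awk -v i="$inode" '$1 == i {print $2; exit}')
    if [ -n "$first" ]; then
      printf '%s link to %s\n' "$f" "$first"
    else
      printf '%s\n' "$f"
      seen="$seen$inode $f
"
    fi
  done
)
```

Calling "classify_links uno/tres uno/quatro" reports quatro as the link, and
swapping the two arguments swaps the roles, which is why the order of names on
the command line matters for repeatable output.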

Note: this isn't my implementation! This is all using the devuan host tar!

What... what were they TRYING to do? What would happen if you tried to extract
that tarball...

  tar: one/three: Cannot hard link to ‘uno/quatro’: No such file or directory
  tar: Exiting with failure status due to previous errors

An error is what would happen. Which sort of implies that tar can arbitrarily
hardlink to an existing host file? Which is just a WEALTH of warm fuzzies from a
security standpoint...

>     The sad part is I strongly suspect I'm putting more thought into this than the
>     tar developers did, but I kind of need to know the answers to do it RIGHT.
> 
> "any similarity to actual sed, living or dead, is entirely coincidental".
> 
> tbh, i think all the choices were bad here:

When the FSF and/or the gnu project are involved, definitely.

> 1. invent a new format. now you have two problems.

Can't vi do search and replace without explicitly being sed syntax?

> 2. reuse sed entirely. now you have the problems you've been dealing with. (plus
> a lot of curious "what does <thing> even mean?" questions, because sed is too
> general for this use case.

If you're regexing so your extract tries to overwrite /etc/passwd or something
then allowing arbitrary inputs into the regex pattern is already verboten-ish.

The gnu/dammit sed has a --sandbox mode that disables the r/w/e commands. What's
"e"? Execute the pattern space as a shell command! Every invocation of sed can
rm -rf your home directory! Because gnu!

(I did not implement 'e' in toybox's sed, or in busybox's sed. I checked to see
if they'd added it and as of August the answer is no... and I'm still listed as
the maintainer at the top of that file. The last commit attributed to me was
2009. Backing away slowly...)

> luckily i doubt we'll ever have to answer those
> questions, because i doubt anything esoteric is likely to be used.

Famous. Last. Words.

> despite
> hyrum's law, i don't really know anyone except you who writes sed more complex
> than a single s command, and issues with trying to run your sed programs on
> macOS suggest that nothing beyond a single s command is portable to different
> sed implementations anyway!)

The mac sed implementation is crap, but then the mac kernel is kind of crap-ish.
Rather a lot of mac tech is basically what would happen if Microsoft's design
aesthetic could be competently implemented, and if you scratch the surface you
get techies complaining about the backlight power pin being adjacent to the
video signal pin so when you open and close the lid enough times to crease the
ribbon cable your GPU fries instantly and can't be fixed without replacing the
board. (I did not make that up:
https://boards.rossmanngroup.com/threads/820-00875-no-backlight-no-display.60224/)

> 3. rewrite that one command from sed. now you have duplication, and potential
> skew between "actual s command" and "fake s command".
> 
> option 3 is not obviously a bad choice given the issues with the alternatives,
> but it's a bad fit for anyone trying to do option 2 instead.

Implementing behavior is easy. Figuring out what the behavior should BE is hard.

And turning it into proper test cases... (If I don't say "v" then I just get
one/three without "link to uno/quatro", but if I add the v it gives me
irrelevant user and timestamp info, although I guess I've already got test
invocations that regularize all that....)
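
For illustration only, one way such a regularizing filter could look in shell.
The regularize_tv name is made up, this is not the filter the existing toybox
tests use, and it assumes listing lines shaped like the "tar tv" output quoted
earlier in this message (permissions, owner/group, numeric size, date, time,
name). Device nodes, whose size column is "major,minor", would pass through
unchanged.

```shell
# Hypothetical filter: drop owner/group, size, date, and time from "tar tv"
# style lines, keeping the permission string and the name (plus any
# "-> target" or "link to target" suffix).
regularize_tv() {
  sed -E 's/^([^ ]+) +[^ ]+ +[0-9]+ +[0-9-]+ +[0-9:]+ +/\1 /'
}
```

Piping the listing through regularize_tv would turn the hard link line above
into "hrw-r--r-- one/three link to uno/quatro", which is stable across users
and timestamps.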

Rob

