[Toybox] Tar --transform is weird.

Rob Landley rob at landley.net
Tue May 24 05:22:58 PDT 2022


On 5/23/22 19:45, enh wrote:
> https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/scripts/Makefile.package#49

That first one is using S (do not apply to symlinks), which is weird because it
LOOKS like it's doing git archive --prefix= and why wouldn't you do that to
symlinks too? (This never affected symlink targets, only the name of the symlink
itself...)

$ ln -s nine one/two/three/four/five/seven
$ man tar
$ tar -c one --transform 's:^:prefix/:S' | tar t
prefix/one/
prefix/one/two/
prefix/one/two/three/
prefix/one/two/three/four/
prefix/one/two/three/four/five/
prefix/one/two/three/four/five/seven
prefix/one/two/three/four/five/six

And the S appears to be a NOP on debian? Ah, wait:

$ tar -c one --transform 's:^:prefix/:' | tar tv
drwxr-xr-x landley/landley   0 2022-05-20 13:53 prefix/one/
drwxr-xr-x landley/landley   0 2022-05-20 13:53 prefix/one/two/
drwxr-xr-x landley/landley   0 2022-05-20 13:53 prefix/one/two/three/
drwxr-xr-x landley/landley   0 2022-05-20 13:53 prefix/one/two/three/four/
drwxr-xr-x landley/landley   0 2022-05-24 05:32 prefix/one/two/three/four/five/
lrwxrwxrwx landley/landley   0 2022-05-24 05:32 prefix/one/two/three/four/five/seven -> prefix/nine
-rw-r--r-- landley/landley   0 2022-05-20 13:53 prefix/one/two/three/four/five/six
$ tar -c one --transform 's:^:prefix/:S' | tar tv
drwxr-xr-x landley/landley   0 2022-05-20 13:53 prefix/one/
drwxr-xr-x landley/landley   0 2022-05-20 13:53 prefix/one/two/
drwxr-xr-x landley/landley   0 2022-05-20 13:53 prefix/one/two/three/
drwxr-xr-x landley/landley   0 2022-05-20 13:53 prefix/one/two/three/four/
drwxr-xr-x landley/landley   0 2022-05-24 05:32 prefix/one/two/three/four/five/
lrwxrwxrwx landley/landley   0 2022-05-24 05:32 prefix/one/two/three/four/five/seven -> nine
-rw-r--r-- landley/landley   0 2022-05-20 13:53 prefix/one/two/three/four/five/six

Ok, that's just horrible. And terribly documented. The S tells it not to affect
symlink TARGETS. Without which, it affects symlink targets. (Adding the tarball
prefix to a RELATIVE SYMLINK.)

Sigh. I have a bad cold right now and am not focusing well, but I need to pace
and stare at the ceiling a bit to figure out what to do about this once I've
recovered a bit.

> and https://android.googlesource.com/kernel/build/+/d68a8336a396a98820de2b3432ce5206fe70c854/build.sh#668

Looks like that one should already work fine?

> are the only two usages i've ever seen (and internal code search didn't have any
> others when i asked it).
> 
> sadly i've forgotten how to search debian packages again... but apparently i did
> remember last time this came
> up: http://lists.landley.net/htdig.cgi/toybox-landley.net/2020-October/012074.html

The link to the tar documentation page is 404 because of course it is.
archive.org to the rescue.

None of which are using the magic s///X tar flags, although they are using
nontrivial sed syntax:

libstatgen:
--transform 's,^,$(DIR_NAME)_$(VERSION)/,;s,$(WHOLEPACKAGE_MAKE),Makefile,'
pam-python:
--transform "s;^$${sd}\(/\|\$$\);$(RELEASE_ME)\1;"
conspy:
--transform "s;^$${sd}\(/\|\$$\);$(RELEASE_ME)\1;"

But three of them ARE using "flags=r;" at the start (your second search), which
IS THE DEFAULT BEHAVIOR. They are explicitly specifying the default behavior.

Alright, we already went in and added the s///x flag (commit 50d8ed89b1e0) which
assumes toybox's tar --transform requires toybox's sed, so if I double down on
that then I can pass -e "#TARHACK thistype" or something to sed so it has the
information to handle S flags? (Do --strip and --exclude ALSO apply here?)

I'm strongly tempted to make "s" _not_ be the default and make people select
"s" explicitly, because even the gnu/manual goes:

> Using the expression `s,^,/usr/local/,' would mean adding `/usr/local' to both
> regular archive members and to link targets. In this case, `/lib/libc.so.6'
> would become:
> 
>   /usr/local/lib/libc.so.6 -> /usr/local/libc-2.3.2.so
>
> This is definitely not desired. To avoid this, the `S' flag is used, which
> excludes symbolic link targets from filename transformations. 

Most likely you want NOT applying to symlink targets to BE THE DEFAULT, and then
use the "s" flag to say "and do this to symlink targets". (Notice the subtlety:
you can have multiple s/// transforms semicolon separated, or just have multiple
--transform arguments, and the trailing flag is per-transform, so one of them
can apply to symlink targets only and the rest apply to filenames. Which says s
implies R but...)
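
A rough sketch of that per-transform flag behavior, assuming GNU tar (where
capital letters switch a class off). The directory layout and the nine -> ten
rename are made up for illustration: the first expression prefixes member names
only, the second touches symlink targets only.

```shell
# Sketch only: assumes GNU tar; paths and names here are illustrative.
tmp=$(mktemp -d)
mkdir -p "$tmp/one"
touch "$tmp/one/six"
ln -s nine "$tmp/one/seven"

# First expression: prefix member names, leave symlink targets alone (S).
# Second expression: rewrite symlink targets only (R and H switch off
# member names and hardlink targets for this one expression).
listing=$(tar -C "$tmp" \
    --transform 's:^:prefix/:S' \
    --transform 's:^nine$:ten:RH' \
    -cf - one/six one/seven | LC_ALL=C tar -tvf -)

# Pull the symlink entry out of the listing and split off its target.
link_line=$(printf '%s\n' "$listing" | grep -- ' -> ')
link_target=${link_line##* -> }
echo "$link_line"
rm -rf "$tmp"
```

With GNU tar the symlink should come out as prefix/one/seven pointing at ten,
while one/six only picks up the prefix.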

Sigh, I'm coming up with a sane API and what I need is compatibility with the
existing stupid, even though almost none of those debian transforms are
excluding symlink targets from their tarball transforms meaning subtle breakage
is likely...

This is not a well designed feature, but that's gnu for you.

Ok, what does the hardlink flag MEAN? The hardlink headers specify the previous
file in this tarball which this is a hardlink to. (Although technically there's
no requirement it be in this tarball, just "path to existing file, like a
symlink but type 1 instead of type 2".) So assuming this is once again "apply
transform to target path"...
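
To check that reading, here is a small experiment, again assuming GNU tar. The
file names are invented; both paths are named explicitly so the second one is
the entry stored as a hardlink.

```shell
# Sketch only: assumes GNU tar; file names are illustrative.
tmp=$(mktemp -d)
mkdir -p "$tmp/one"
touch "$tmp/one/six"
ln "$tmp/one/six" "$tmp/one/eight"   # second name for the same inode

# Naming both paths fixes the archive order, so one/eight is written
# as a type 1 (hardlink) entry pointing back at one/six.
default_list=$(tar -C "$tmp" --transform 's:^:prefix/:' \
    -cf - one/six one/eight | LC_ALL=C tar -tvf -)
h_list=$(tar -C "$tmp" --transform 's:^:prefix/:H' \
    -cf - one/six one/eight | LC_ALL=C tar -tvf -)

# "tar tv" prints hardlink entries as "name link to target".
target_default=$(printf '%s\n' "$default_list" | sed -n 's/.* link to //p')
target_H=$(printf '%s\n' "$h_list" | sed -n 's/.* link to //p')
echo "default: $target_default"
echo "with H:  $target_H"
rm -rf "$tmp"
```

If the guess is right, the default run rewrites the link target along with the
member name and the H run leaves the target as one/six, which would no longer
match the renamed member in the archive.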

Sigh, I can implement that. I do not remotely understand the intended use case,
but sure. (git archive added --prefix which was MUCH CLEANER than this. You more
or less have to special case the ^ match anyway. If path-to-symlink has a / in
it and then the target doesn't start with / you probably don't want to prefix it
because it's relative to wherever it is?)
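
That heuristic could be sketched roughly like this. skip_target_prefix is a
hypothetical helper, not anything in toybox or GNU tar, and it only encodes the
one case described above; every other case is left to whatever the default is.

```shell
# Hypothetical helper: succeed (return 0) when a "^" prefix transform
# should NOT be applied to a symlink's target.
#   $1 = path of the symlink inside the archive
#   $2 = the symlink's target
skip_target_prefix() {
  case "$1" in
    */*) ;;             # link lives in a subdirectory: keep checking
    *)   return 1 ;;    # top-level link: this sketch makes no call
  esac
  case "$2" in
    /*) return 1 ;;     # absolute target: not the relative case
    *)  return 0 ;;     # relative target: resolved from the link's own dir
  esac
}

# Try the case from the transcript above plus an absolute target.
for pair in "one/two/three/four/five/seven nine" "one/seven /lib/libc.so.6"; do
  set -- $pair
  if skip_target_prefix "$1" "$2"; then
    echo "$1 -> $2: leave target alone"
  else
    echo "$1 -> $2: not the relative-target case"
  fi
done
```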

Rob


