[Toybox] [PATCH] tar: add -I (--use-compress-program) support.

Rob Landley rob at landley.net
Tue Oct 20 17:08:25 PDT 2020



On 10/20/20 3:04 PM, enh wrote:
> On Tue, Aug 25, 2020 at 1:20 AM Rob Landley <rob at landley.net> wrote:
>>
>> On 8/24/20 12:15 PM, enh via Toybox wrote:
>>> unfortunately, -I was just the easy request... they also want --transform.
>>>
>>> this is the specific command:
>>>
>>> tar -I $(KGZIP) -c $(RCS_TAR_IGNORE) -f $(2).tar.gz \
>>>     --transform 's:^:$(2)/:S' $(TAR_CONTENT) $(3); \
>>>
>>> https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/scripts/Makefile.package#49
>>
>> Eh, this is just some variant of:
>>
>> int fds[2] = {-1, -1}, len;
>> pid_t pid = xpopen_both((char *[]){"sed", "-e", arg, 0}, fds);
>>
>> if (pid<0) sadness();
>> for (;;) {
>>   char *line = 0;
>>
>>   len = 0
>>   dprintf(*fds, "%s\n", filename);
>>   len = readline(&line, *len);
>> }
>>
>>> it's not clear to me just how complete that "sed expression" is. oh, looks like
>>> it's just 's': https://www.gnu.org/software/tar/manual/tar.html#SEC115 (but with
>>> some weird extra flags rRsShH)
>>
>> Except for that bit, where it's using flags not explained in the manual. I don't
>> have "info" installed, is this available on the web somewhere?
>>
>> https://www.gnu.org/software/sed/manual/sed.html#The-_0022s_0022-Command
>>
>> Does not mention s///S because of course not.
>>
>> sed s///H is multi-line mode? Is there any way I can continue to _not_ care
>> about that for a little while longer? Hmmm... \S is a regex extension to match
>> non-whitespace chars but there's no \ here...
>>
>>   $ echo hello | sed -e 's/x/y/S'
>>   sed: -e expression #1, char 7: unknown option to `s'
>>
>> And the gnu/dammit sed in debian doesn't understand trailing S either. Huh. Is
>> it a tar --transform thing? Nothing in the man page about that...
>>
>> Ok, it IS a tar thing (not a sed thing) which is NOT described in the tar man
>> page but IS described in the tar online manual but NOT in the part about
>> --transform but instead in the "modifying file and member names" part and then
>> hit page down 4 times.
> 
> yeah, for those following along at home:
> https://www.gnu.org/software/tar/manual/html_section/tar_51.html
> 
>> Wheee.
>>
>> Do I police that the sed pattern is s/x/x/ or just let them pass in y/x/z and
>> friends if they really want to? (Hmmm, probably police it given they can w to
>> create new files otherwise. So start with s%c and then require the %c twice more
>> and no ; after the last %c. No ispunct at all, really...
>>
>> ok, gix#rRsShH are the supported flags. Currently gi# are normal sed commands
>> (along with iI). We don't yet support x and that seems like basic sed should
>> have that but it's not in the sed manual. (I can feed -E to sed when we see x
>> but controlling that for a given s/// search pattern seems kinda useful. Oh well...
>>
>> rRsShH are just "apply to regular, symlink, or hardlink" target filters. Easy
>> enough to put in the
> 
> i don't think this works given the fact that you can have multiple s
> commands (though that's almost unused in debian:

Collate them by type, launch multiple background processes in parallel and feed
the name in question through the right sed instance(s) for the type?

(I can't teach toybox sed about the new letters because sed is operating on
filename strings, not files. So it needs to filter up front.)
> i personally only need 's:^:prefix/:S' but the vast majority of debian
> uses also seem to be just one s/pattern/replacement/flags (
> https://codesearch.debian.net/search?q=tar.*--transform.*&literal=0 ),
> i assume that's a good enough place for me to start? (it wasn't clear
> to me whether when you said "no ; after the last %c" you meant
> "because i don't think we should support multiple s commands unless
> proven necessary" or "because i'm not aware that GNU tar supports
> multiple s commands".)

Wasn't aware it supports multiple.

> btw, strace shows that GNU tar doesn't spawn a sed process for this.
> but if we're only doing the simplest case, i think we can get by with
> the pipe.

This sort of thing is why xpopen_both() exists, and sed has a -z option:

  int fd[2] = {-1, -1};
  FILE *fp;
  char *line = 0;
  ssize_t len;

  if (1>(xpopen_both((char *[]){"sed", "-ze", pattern, 0}, fd))
    perror_exit("sed");
  fp = fdopen(fd[1], "r");
  dprintf(fd[0], "potato", 7);
  len = getdelim(&line, (void *)&len, 0, fp);
  printf("line=%s\n", line);
  free(line);
  fclose(fp);
  close(fd[0]);

And while we're at it, there's no obvious reason to restrict the sed pattern
other than possible security implications? I can see y// being useful, for
example, especially if I go back and make it utf8 aware like I did for toysh
[x-y] wildcards and so on. From a security standpoint, --transform with extract
already lets you write to /etc/password because the resulting filename can be
anything, so it's only COMPRESS side that has to be checked for "w" commands...?
(Or is the security exploit not wanting to let you r arbitrary files and include
them in the resulting _filename_? Either way, arbitrary input into --transform
already seems a bit inadvisable...)

Ah, because "n" would hang. Lots of things in sed aren't guaranteed to produce a
line of output, but s// by itself is. (And y//.)

Hmmm, ok now I'm rethinking whether I should just implement a new s// in tar,
because it's... mostly just a regex match? Except for & and \1 and such. Ok,
maybe I should factor out some of the sed plumbing into lib and have tar share
it? Hmmm...

Right, either way: out of time to work on this today.

Rob



More information about the Toybox mailing list