[Toybox] "bad xform" not very helpful

enh enh at google.com
Tue Sep 6 16:16:44 PDT 2022


and to the surprise of absolutely no-one ...

they got back to me with a test script to repro this with (which i assume
will work in any kernel tree, and isn't specific to ACK)...
```
#!/bin/bash

set -x

run () {
  start_time=$(date +%s)

  find /usr/include/linux/ -name '*.h' -print0 | \
      /usr/bin/perf record -g --               \
      tar czf target.$1.tar.gz                 \
      --absolute-names                         \
      --dereference                            \
      --transform "s,/usr/include/,,"          \
      --null -T -

  mv perf.data perf.$1.data

  end_time=$(date +%s)
  elapsed=$(( end_time - start_time ))
  echo $elapsed s
}
```

...and reported that all the time is going into tar fork()ing and exec()ing
sed :-(

so although NORECURSE was a clever hack to work around the _first_ tar+sed
problem, it seems to have had the expected result of landing me back with
the original tar+sed problem :-(
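
to look at the per-file cost in isolation, here's a rough sketch (mine, not from their report). the function names `gen_names`, `xform_per_name` and `xform_batched` are made up for illustration, it doesn't touch tar at all, and the per-name loop only approximates what happens when one sed is forked for each entry:

```shell
# Hypothetical helpers, not part of the original repro script.

# Print N fake absolute paths; the files do not need to exist.
gen_names () {
  i=0
  while [ "$i" -lt "$1" ]; do
    printf '/usr/include/linux/file%d.h\n' "$i"
    i=$(( i + 1 ))
  done
}

# One fork()+exec() of sed per name read from stdin.
xform_per_name () {
  while IFS= read -r name; do
    printf '%s\n' "$name" | sed -e 's,/usr/include/,,'
  done
}

# A single sed process rewrites every name read from stdin.
xform_batched () {
  sed -e 's,/usr/include/,,'
}
```

under bash, comparing `time gen_names 2000 | xform_per_name > /dev/null` against the same pipeline with `xform_batched` should show the gap. i'm not quoting numbers because they'll vary by machine.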

On Wed, Aug 31, 2022 at 2:55 PM enh <enh at google.com> wrote:

>
>
> On Tue, Aug 16, 2022 at 12:22 PM enh <enh at google.com> wrote:
>
>>
>>
>> On Tue, Aug 16, 2022 at 10:28 AM enh <enh at google.com> wrote:
>>
>>>
>>>
>>> On Tue, Aug 16, 2022 at 1:43 AM Rob Landley <rob at landley.net> wrote:
>>>
>>>> On 8/15/22 18:50, enh via Toybox wrote:
>>>> > and here's their minimized repro case:
>>>> >
>>>> > echo > /tmp/foo.txt; echo /tmp/foo.txt > /tmp/find.txt
>>>> >
>>>> > cat /tmp/find.txt | prebuilts/build-tools/path/linux-x86/tar czf /tmp/out.tar.gz \
>>>> >   --absolute-names \
>>>> >   --transform 's,^/,,' -T -
>>>> >
>>>> > This fails with
>>>> >
>>>> > tar: bad xform
>>>>
>>>> Hmmm...
>>>>
>>>> $ echo > /tmp/foo.txt; echo /tmp/foo.txt > /tmp/find.txt
>>>> $ cat /tmp/find.txt | PATH=$PWD/sub9:$PATH ./tar czf out.tar.gz \
>>>>   --absolute-names --transform 's,^/,,' -T -
>>>> $ tar tvf /tmp/out.tar.gz
>>>> -rw-r--r-- landley/landley   1 2022-08-16 01:53 tmp/foo.txt
>>>>
>>>> Working for me? (The sub9 bit was because I stuck toybox sed in the $PATH
>>>> to make sure that wasn't it...)
>>>>
>>>
>>> repros for me, both with their prebuilt but also with a fresh clone (on
>>> either macos or linux):
>>>
>>> /tmp/toybox$ cat /tmp/find.txt | ./toybox tar czf /tmp/out.tar.gz \
>>>   --absolute-names --transform 's,^/,,' -T -
>>> tar: bad xform
>>> /tmp/toybox$
>>>
>>> a bit of printf debugging shows we're reading nothing back:
>>>
>>> /tmp/toybox$ cat /tmp/find.txt | strace -f ./toybox tar czf \
>>>   /tmp/out.tar.gz --absolute-names --transform 's,^/,,' -T - 2> /tmp/out
>>> argv[0]="sed"
>>> argv[1]="-e"
>>> argv[2]="s,^/,,"
>>> pid=1779946
>>> stdin="/tmp/foo.txt"
>>>   len=0 Success
>>> total=0 result="(null)"
>>>
>>>
>>> but strace implies we're not actually exec()ing sed at all?
>>>
>>
>> and if i `CONFIG_TOYBOX_NORECURSE=y`, it calls sed and works...
>>
>
> ...though this might be about to come back and bite me. i'm hearing as-yet
> unconfirmed reports that toybox `tar czf` is a lot slower than gnu tar.
> they're assuming it's tar or gzip, but given that they're using
> `--transform`, i'm wondering whether it's actually the fact that we're
> forking out to sed for every file?
>
> i've asked for repro steps or a `perf record` i can look at...
>
>
>>
>>
>>> let me know if you've already fixed this on your branch and that's why you
>>> can't repro, otherwise i'll keep looking after my meeting...
>>>
>>>> > However, if the file names are fed via the -T /tmp/find.txt, it works:
>>>>
>>>> Hmmm... the child process shouldn't have access to the parent's stdin, we
>>>> replaced it with a pipe? There was a potential bug in that area, but commit
>>>> dc8b46d5ddab should have fixed it last month and I don't _think_ it would
>>>> have applied here anyway...
>>>>
>>>> > echo > /tmp/foo.txt; echo /tmp/foo.txt > /tmp/find.txt
>>>> >
>>>> > prebuilts/build-tools/path/linux-x86/tar czf /tmp/out.tar.gz \
>>>> >   --absolute-names \
>>>> >   --transform 's,^/,,' -T /tmp/find.txt
>>>> >
>>>> > (the "prebuilts/build-tools/path/linux-x86/" stuff is just a directory
>>>> > full of symlinks to toybox.)
>>>>
>>>> Multiplexer instead of standalone build shouldn't make a difference if
>>>> you've disabled command recursion. (Modulo you're calling tar out of a
>>>> specific path but it then grabs sed out of the $PATH, but I haven't yet
>>>> implemented the extra argument processing that would specifically require
>>>> toybox sed...)
>>>>
>>>> (The extra error message is a little trickier than my first guess because
>>>> you can have multiple --xform things which turn into a list of -e entries
>>>> to sed... Possibly instead of error_exit() I should error_msg(), dump the
>>>> sed command line on a second line, and then xexit()...)
>>>>
>>>> Rob
>>>>
>>>
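
for reference, rob's point about multiple --xform options turning into a list of -e entries for sed can be sketched in shell. this is an illustration only, not toybox's actual code: `xform_name` is a hypothetical helper, and treating an empty result as "bad xform" with the sed command line printed on a second line is just my reading of the suggestion quoted above:

```shell
# Hypothetical helper: xform_name NAME EXPR [EXPR...]
# Applies each tar-style transform expression to NAME with a single sed
# call by turning the expressions into "-e EXPR -e EXPR ...".
xform_name () {
  name=$1
  shift

  # No expressions: the name passes through unchanged.
  if [ "$#" -eq 0 ]; then
    printf '%s\n' "$name"
    return 0
  fi

  # Rebuild the positional parameters as: -e expr1 -e expr2 ...
  # The for loop iterates over the original list while we append
  # "-e EXPR" and shift the matching bare expression off the front.
  for expr in "$@"; do
    set -- "$@" -e "$expr"
    shift
  done

  result=$(printf '%s\n' "$name" | sed "$@") || result=

  # Assumption: reading nothing back is reported as an error, with the
  # sed command line on a second line so the user can see what was run.
  if [ -z "$result" ]; then
    echo "bad xform" >&2
    echo "sed $*" >&2
    return 1
  fi
  printf '%s\n' "$result"
}
```

with the repro's expression, `xform_name /tmp/foo.txt 's,^/,,'` prints `tmp/foo.txt`. adding a second expression such as `'s,^tmp/,out/,'` applies both in order within the same sed process.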