[Toybox] tar xf doesn't work on compressed files if the *cat tools aren't available

Rob Landley rob at landley.net
Fri Aug 26 23:35:49 PDT 2022


On 8/25/22 19:49, enh via Toybox wrote:
> the kernel folks build in an even more restricted environment than AOSP. so
> although they have bzip2 and gzip, they don't have bzcat and gzcat.
> 
> from a bad run (with no bzcat/gzcat):

It's supposed to fall back to bzip -d and gzip -d if zcat etc fail?

void tar_main(void)
{
  char *s, **args = toys.optargs,
    *archiver = FLAG(I) ? TT.I : (FLAG(z) ? "gzip" : (FLAG(J) ? "xz":"bzip2"));
...
      struct string_list *zcat = FLAG(I) ? 0 : find_in_path(getenv("PATH"),
        FLAG(j) ? "bzcat" : FLAG(J) ? "xzcat" : "zcat");

      // Toybox provides more decompressors than compressors, so try them first
      TT.pid = xpopen_both(zcat ? (char *[]){zcat->str, 0} :
        (char *[]){archiver, "-d", 0}, pipefd);

It searches the path for bzcat/xzcat/zcat and uses that if it can find it,
otherwise falls back to bzip2/xz/gzip -d if find_in_path() returned null...

> ~/aosp-master-with-phones$ /usr/bin/strace -f -e execve
> ./prebuilts/build-tools/linux-x86/bin/toybox tar tvf x.tar.gz
> execve("./prebuilts/build-tools/linux-x86/bin/toybox",
> ["./prebuilts/build-tools/linux-x8"..., "tar", "tvf", "x.tar.gz"],
> 0x7ffe8acad930 /* 36 vars */) = 0
> /usr/bin/strace: Process 640688 attached
> [pid 640688]
> execve("/usr/local/google/home/enh/aosp-master-with-phones/prebuilts/build-tools/path/linux-x86/bzip2",
> ["bzip2", "-d"], 0x7ffd675c2a80 /* 36 vars */) = 0
> bzip2: (stdin) is not a bzip2 file.

That's misidentifying the file type: x.tar.gz is gzip -d not bzip2 -d.

Ah, I see! I added automatic file type detection AFTER implementing the fallback
logic, and never went back to test that detection and fallback work together.
(Because it was fiddly to make the test suite do that: _removing_ stuff from the
$PATH for a test is awkward. Although in this case the detection logic prevents
toybox from recursing for the builtin, so it's not that hard here. Test added.)

Anyway, the problem is that char *archiver is set based on the command line
flags (with bzip2 being the "should never be used" default if none are set) and
then detection happens later it doesn't get reset.

Which is awkward because archiver is used for both reading and writing, and
autodetection can only happen on reading, so if I move it I have to have it
happen twice? Eh, put it in a function called directly from both places and yank
the variable...

I only added a test for gzip -d and not the other two. Someday, I need to
properly fluff out the test suite to cover ALL the corners, but not going down
that rabbit hole now...

> there is code to try to use $TOOL -d if the *cat tool doesn't exist (that's why
> we're running bzip2 at all), but it's obviously not working here :-(

Worked for flags set from command line, but not for autodetect.

> it also works fine if you explicitly say 'z'.

No "also", that was the difference. :)

> i didn't get time today to look at this in more detail, but i do notice that
> there appears to be an inconsistency between the two ways we choose the
> appropriate tool:
> 
>     *archiver = FLAG(I) ? TT.I : (FLAG(z) ? "gzip" : (FLAG(J) ? "xz":"bzip2"));
> 
>       struct string_list *zcat = FLAG(I) ? 0 : find_in_path(getenv("PATH"),
>         FLAG(j) ? "bzcat" : FLAG(J) ? "xzcat" : "zcat");
> 
> is that intentional or a bug?

The ordering is arbitrary, but both should work?

I added support for the archivers in the same order support was historically
added to tar, and the first test sort of reflects that (although when I added xz
support I tested the new -J flag instead of -j so that's flipped). The signature
checks for autodetection are also in "historical" order. The string_list checks
are alphabetical by result string. Both orders should provide the same results
when at least one flag is set, and that second set of tests is inside:

    if (FLAG(j)||FLAG(z)||FLAG(I)||FLAG(J)) {

So ONE of them has to be set. FLAG(I) forces it to use TT.I (hence FLAG(I) ? 0
in the second test to force the archiver path, because -I NAME gets "-d" when
decompressing), and then we test 2 of the 3 remaining flags and fall back to the
third. The order's different, but both test that J ? xz, then one does z ? gz :
bz, the other does j ? bz : gz.

The first test never left it blank so always sets _something_ (hence bzip2 if no
flags set because I didn't have an extra test at the end and... null I guess?),
but both compression and decompression consumers are inside if (stuff) tests
that require one of them to be set to use the result. (Albeit "set by
autodetect" counts.)

That said, I commited a patch to put them all consistently in historical order
so nobody ELSE has to ask this in future. (Compressors and decompressors sort
differently in alphabetical order, so that would still be either arbitrary or
inconsistent.)

> (i'll try to have another look tomorrow, but since you seem to be nocturnal atm
> i thought i'd send this out now anyway :-) )

I plead the third. (It's the one about quartering troops.)

Rob

P.S. I should have a TODO annotation to check that the compiler is smart enough
to figure out that if (FLAG(A)||FLAG(B)||FLAG(C)...) is the same as if
(toys.optflags&(FLAG_A|FLAG_B|FLAG_C)) and just do the one test against a
consolidated mask. I'm ASSUMING it's smart enough since the optimizer devs who
keep <strike>inventing crimes</a> adding undefined behavior have been working on
this for 50 years now, but I have a nagging doubt that "boolean vs logical
comparison" might derail them...)


More information about the Toybox mailing list