[Toybox] Going through Oliver's xz commit list...
Rob Landley
rob at landley.net
Wed Jun 25 15:12:14 PDT 2025
On 6/24/25 17:58, Oliver Webb wrote:
> On Tuesday, June 24th, 2025 at 17:37, Rob Landley <rob at landley.net> wrote:
>> On Feb 23, 2024, Oliver Webb triaged the xz-embedded commit list since
>> our version forked off from the upstream public domain one, and highlighted:
>>
>>> commit 89094f05f02b Adds the ARM64 BCJ encoder
>>> commit 8f3ed8b1759a Adds a MicroLZMA decoder
>>> commit 6f0e0c41e368 Add xz_dec_catrun() to support concatenated files
>>
>>
>> I applied the first one (but don't have a test file for it, maybe
>> compressing an arm64 binary with debian's xz would exercise that
>> codepath), and am applying the third now,
>
> Assuming you've already applied the 3rd one,
I got distracted reading through the patch and trying to figure out why
they were doing it that way.
Locally when I read the patch description, I tested:
$ (xz <<<first; xz<<<second) | xzcat
first
second
And confirmed that worked on debian and didn't work with toybox xz, and
the obvious thing to do is just loop the extractor. (Finish means it got
a complete sequence, not that it ran out of data. Even the current patch
is running their new cat thing after running the existing unmodified
extractor, so an enclosing for loop at the caller seems like the way to
go? With an EOF test or a tweak to say "zero bytes of input is a unique
error return code"...)
So why did they make a function with a special kind of padding? I mean,
what's the POINT? And if they DO need padding (my tests didn't) then the
logical thing to do would be allow the extractor to skip an arbitrary
run of NUL bytes preceding the initial magic sequence... so why didn't
they just do that?
> did you experience any breaks in testing?
Didn't get that far. I'm disinclined to apply the patch as is until I
figure out what they didn't just do the obvious thing.
I added support in TAR (which appends runs of 512-byte NUL blocks, for
some reason the ancient posix spec recommended two of them and then gnu
adds a half-dozen, and I do not get why, but fine. So you skip runs of
NUL blocks when dealing with concatenented tarballs because there will
BE runs of NUL blocks...
And then when adding support for concatenated cpio files (because the
kernel cpio.gz extractor in initramfs supported it, and yeah I see the
use case), padding came up again (for reasons I don't remember) and that
was also runs of NUL bytes (I don't remember if there were alignment
constraints but I could check the code).
But this ISN'T runs of NUL bytes...? Or maybe it is and they're just
being clever about expressing it? Didn't read that far, got distracted...
> There doesn't seem to be any decompression errors, and test_xzcat passes, but I remember trying
> to add these upstream commits some time ago and running into errors:
>
> "I tried on 6f0ec4 (Add xz_dec_catrun), got it to compile, but not to
> pass the test suite or decode linux-kernel tarballs.|
Ooh, there IS a tests/xzcat.test. I should add stuff to that...
Thanks,
Rob
More information about the Toybox
mailing list