[Toybox] Going through Oliver's xz commit list...

Wed Jun 25 15:12:14 PDT 2025

On 6/24/25 17:58, Oliver Webb wrote:
> On Tuesday, June 24th, 2025 at 17:37, Rob Landley <rob at landley.net> wrote:
>> On Feb 23, 2024, Oliver Webb triaged the xz-embedded commit list since
>> our version forked off from the upstream public domain one, and highlighted:
>>
>>> commit 89094f05f02b Adds the ARM64 BCJ encoder
>>> commit 8f3ed8b1759a Adds a MicroLZMA decoder
>>> commit 6f0e0c41e368 Add xz_dec_catrun() to support concatenated files
>>
>>
>> I applied the first one (but don't have a test file for it, maybe
>> compressing an arm64 binary with debian's xz would exercise that
>> codepath), and am applying the third now,
> 
> Assuming you've already applied the 3rd one,

I got distracted reading through the patch and trying to figure out why 
they were doing it that way.

Locally when I read the patch description, I tested:

$ (xz <<<first; xz<<<second) | xzcat
first
second

And confirmed that worked on debian and didn't work with toybox xz, and 
the obvious thing to do is just loop the extractor. (Finish means it got 
a complete sequence, not that it ran out of data. Even the current patch 
is running their new cat thing after running the existing unmodified 
extractor, so an enclosing for loop at the caller seems like the way to 
go? With an EOF test or a tweak to say "zero bytes of input is a unique 
error return code"...)

So why did they make a function with a special kind of padding? I mean, 
what's the POINT? And if they DO need padding (my tests didn't) then the 
logical thing to do would be allow the extractor to skip an arbitrary 
run of NUL bytes preceding the initial magic sequence... so why didn't 
they just do that?

> did you experience any breaks in testing?

Didn't get that far. I'm disinclined to apply the patch as is until I 
figure out what they didn't just do the obvious thing.

I added support in TAR (which appends runs of 512-byte NUL blocks, for 
some reason the ancient posix spec recommended two of them and then gnu 
adds a half-dozen, and I do not get why, but fine. So you skip runs of 
NUL blocks when dealing with concatenented tarballs because there will 
BE runs of NUL blocks...

And then when adding support for concatenated cpio files (because the 
kernel cpio.gz extractor in initramfs supported it, and yeah I see the 
use case), padding came up again (for reasons I don't remember) and that 
was also runs of NUL bytes (I don't remember if there were alignment 
constraints but I could check the code).

But this ISN'T runs of NUL bytes...? Or maybe it is and they're just 
being clever about expressing it? Didn't read that far, got distracted...

> There doesn't seem to be any decompression errors, and test_xzcat passes, but I remember trying
> to add these upstream commits some time ago and running into errors:
> 
> "I tried on 6f0ec4 (Add xz_dec_catrun), got it to compile, but not to
> pass the test suite or decode linux-kernel tarballs.|

Ooh, there IS a tests/xzcat.test. I should add stuff to that...

Thanks,

Rob