[Toybox] [PATCH] Clean up xz a good amount

Tue Feb 27 18:51:39 PST 2024

On Tuesday, February 27th, 2024 at 20:01, Rob Landley <rob at landley.net> wrote:
> On 2/27/24 19:23, Oliver Webb wrote:
>
> > The below patch does a bunch cleanup on xzcat.c.
> > It resolves the ifdefs that were in the code (But everything they were checking was defined so they didn't do anything)
> > Adds Tests for errors (And while we are changing the test suite, converts testing -> testcmd)
> > Rearange/Rewrite large comments to be smaller (and sometimes C99!).
> > It puts main at the bottom and do_xzcat above it, the way code is usually arranged..
> > Removes some function prototypes and the inline keyword (useless in modern C since it doesn't force anything)
> > Removes "!= 0" (And turns "x == 0" to "!x")
> > Some types replaced with their LP64 counterparts (uint8_t to char, etc)
> >
> > Yes, I know that this is large for a patch, but a lot of that is shuffling code/comments. None of the logic has been
> > changed. When all is said and done, this patch shrinks xzcat.c by over 10%
>
>
> Applied (out of the list's spam filter, "message body too big"), but you've
> introduced a glitch (missing flush maybe). When I tested it against the xz files
> I had lying around via "xzcat $FILE | tar tv", one file failed with your patch
> and succeeded before it. I copied it to my web server so you can take a look:
>
> wget https://landley.net/musl-cross-make.tar.xz
>
> (Yeah, it's 200 megs. Sorry about that. That was the first one that went "boing".)

$ ./xzcat musl-cross-make.tar.xz | ./count -l > /dev/null
229834752 bytes, 219Mb, 153Mb/s, 0m01s
$ xzcat musl-cross-make.tar.xz | ./count -l > /dev/null
229836800 bytes, 219Mb, 509Mb/s, 0m00s

Hmmm, there seems to be missing bytes at the end

$ ./xzcat musl-cross-make.tar.xz | xxd > fi
229834752 bytes, 219Mb, 153Mb/s, 0m01s
$ xzcat musl-cross-make.tar.xz | xxd > fi2
229836800 bytes, 219Mb, 509Mb/s, 0m00s14364673,14364800d14364672
$ diff fi fi2
< 0db30000: 2022 2469 220a 2020 646f 6e65 0a66 690a   "$i".  done.fi.
< 0db30010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
[zeros until end of file]

putting a fflush at the end of reading a file doesn't fix anything, it's losing
exactly half a page so not flushing would also be my guess

> Also, I note that there is an "upstream" for this code, which may have changed
> since I merged this (many moons ago), and any upstream changes are probably
> worth a look:
>
> https://git.tukaani.org/xz-embedded.git
>
> The one in pending is an old version of that glued together into one file with a
> config header, NEWTOY() and command_main() wrapper. (Yes, the upstream is public
> domain! Extractor only, not compressor. I haven't found a public domain xz
> compressor yet.)
>
> I'm also unclear on the difference between xz and 7zip, and the linux kernel's
> "general setup menu" also offers lzma, lzo, lz4, and zstd as initramfs
> compression options. I just know kernel.org has tar.xz files, and
> https://linuxfromscratch.org/lfs/view/12.0/chapter03/packages.html has .gz,
> .bz2, and .xz tarballs, hence those are the three formats toybox can decompress.
>
> It should be able to create gz and zip, which is also what
> https://github.com/landley/toybox/tags offers to create. (Both of those are
> deflate, I know how and wrote one in java once long ago. I just got sidetracked
> on when to do dictionary resets to match the hashes of existing tarballs. The
> answer is probably "nuts to your white mice" and just do it every 250k or
> something, and oh well the hashes don't match but you get the data back and
> that's the important thing. But first I wanted to grab a bunch of existing .gz
> tarballs and CHECK when they do dictionary resets, because maybe it IS already
> something simple like that.)
>
> Rob
>
> P.S. Alas, I don't make a habit of extensively reading incompatibly licensed
> code before writing a 0BSD version, so this kind of question can linger a while
> on the todo list when I could theoretically just go track it down in the source...

-   Oliver Webb <aquahobbyist at proton.me>